Text to Speech

Generate spoken audio from text with the `tts` and `tts-hd` models. The endpoint is OpenAI-compatible, so OpenAI client libraries work once they are pointed at this base URL.

Endpoint

POST https://api.agilecloud.ai/v1/audio/speech

Request Parameters

Send the request body as `application/json`:

Parameter	Type	Required	Description
model	string	No	`tts` or `tts-hd`. Default: `tts`
input	string	Yes	Text to synthesize (max 4096 characters)
voice	string	No	`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`. Default: `alloy`
response_format	string	No	`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`. Default: `mp3`
speed	number	No	Speed multiplier (0.25–4.0). Default: 1.0
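
The constraints in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name `build_speech_payload` is hypothetical, the allowed values are copied from the table above, and it assumes the 4096-character limit counts Python `str` characters, which the table does not specify.

```python
import json

# Allowed values and limits, copied from the parameter table.
MODELS = {"tts", "tts-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096  # assumption: counted as Python str characters
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(text, model="tts", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Hypothetical helper: validate arguments and return the JSON body as a dict."""
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


if __name__ == "__main__":
    payload = build_speech_payload("Hello from the speech endpoint.",
                                   voice="nova", speed=1.25)
    # This JSON string is what would be sent as the request body.
    print(json.dumps(payload, indent=2))
```

Validating locally only saves a round trip; the server remains the authority on what it accepts.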

cURL Example

curl https://api.agilecloud.ai/v1/audio/speech \
  -H "Authorization: Bearer $DIRECTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts",
    "input": "ACAI gives you AI inference with built-in compliance.",
    "voice": "alloy"
  }' \
  --output speech.mp3

Python SDK Example

from openai import OpenAI
from pathlib import Path

# Point the OpenAI client at this API's base URL.
client = OpenAI(
    base_url="https://api.agilecloud.ai/v1",
    api_key="YOUR_API_KEY",
)

# Request the audio; response_format defaults to mp3.
response = client.audio.speech.create(
    model="tts",
    voice="alloy",
    input="ACAI gives you AI inference with built-in compliance.",
)

# Write the returned audio bytes to disk.
Path("speech.mp3").write_bytes(response.content)
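
The optional `response_format` and `speed` parameters can be passed to the same SDK call. The sketch below is illustrative only: the helper name `synthesize_to_file` is hypothetical, `client` is assumed to be an `OpenAI` client configured as in the example above, and naming the file after the requested format is a convention chosen here, not something the API requires.

```python
from pathlib import Path


def synthesize_to_file(client, text, out_dir=".", stem="speech",
                       model="tts", voice="alloy",
                       response_format="mp3", speed=1.0):
    """Hypothetical helper: synthesize `text` and save it as <stem>.<response_format>."""
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
        response_format=response_format,
        speed=speed,
    )
    # Assumption: the file extension simply mirrors the requested format.
    path = Path(out_dir) / f"{stem}.{response_format}"
    path.write_bytes(response.content)
    return path
```

For example, `synthesize_to_file(client, "Hello.", model="tts-hd", response_format="wav", speed=1.25)` would write `speech.wav` to the current directory.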

Models

Model	Quality	Tiers
tts	Standard — fast, low latency	Pro+
tts-hd	High definition — higher quality, slightly slower	Pro+

Notes

The response is streamed as raw audio bytes; write it to a file or pipe it to an audio player.
All six voices are available on both `tts` and `tts-hd`.
Maximum input is 4,096 characters per request.
Compliance guardrails (content safety, audit logging) apply to the input text.
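
Because each request accepts at most 4,096 characters, longer text has to be split across several requests. The following is one possible approach, not a feature of the API: a hypothetical `split_text` helper that breaks on whitespace so words stay intact, assuming the limit counts Python `str` characters. Each chunk would then be sent as its own request, producing one audio file per chunk.

```python
MAX_INPUT_CHARS = 4096  # per-request limit from the notes above


def split_text(text, limit=MAX_INPUT_CHARS):
    """Hypothetical helper: split text into chunks of at most `limit` characters.

    Breaks on whitespace where possible; a single word longer than `limit`
    is cut hard. Runs of whitespace (including newlines) collapse to one space.
    """
    chunks = []
    current = ""
    for word in text.split():
        # Hard-cut words that cannot fit in a chunk on their own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = "word " * 2000  # 10,000 characters, above the limit
    print([len(chunk) for chunk in split_text(long_text)])
```

Splitting on sentence boundaries instead of arbitrary whitespace would likely give more natural pauses between the resulting audio files, at the cost of a slightly more involved splitter.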