Audio Transcription
Transcribe audio files to text using Whisper. The endpoint accepts multipart uploads and is OpenAI-compatible.
Endpoint
POST https://api.agilecloud.ai/v1/audio/transcriptions
Request Parameters
Send as multipart/form-data:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| model | string | Yes | Model name (e.g., whisper-large-v3) |
| language | string | No | ISO-639-1 language code (e.g., en) |
| response_format | string | No | Output format: json, text, or verbose_json. Default: json |
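The parameters in the table can be checked client-side before a request is sent. The following is a minimal sketch, assuming the OpenAI Python SDK is used as the client. `build_transcription_params` and `transcribe` are hypothetical helper names introduced for illustration, and the language check only tests the two-letter shape of the code, not whether the service supports it.

```python
# Values listed for response_format in the parameter table
ALLOWED_RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_transcription_params(model, language=None, response_format="json"):
    """Assemble the non-file form fields (hypothetical helper, not an SDK function)."""
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    params = {"model": model, "response_format": response_format}
    if language is not None:
        # Shape check only: ISO-639-1 codes are two ASCII letters
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
        params["language"] = language.lower()
    return params


def transcribe(path, api_key, **options):
    """Send one file to the endpoint using the validated parameters."""
    # Imported lazily so the validation helper above has no third-party dependency
    from openai import OpenAI

    client = OpenAI(base_url="https://api.agilecloud.ai/v1", api_key=api_key)
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file, **build_transcription_params(**options)
        )
```

A call might look like `transcribe("recording.mp3", api_key, model="whisper-large-v3", language="en", response_format="verbose_json")`.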
Example
curl https://api.agilecloud.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $DIRECTAI_API_KEY" \
  -F file=@recording.mp3 \
  -F model=whisper-large-v3
Response
{
  "text": "Hello, this is a sample transcription from DirectAI."
}
Python SDK Example
from openai import OpenAI

client = OpenAI(
    base_url="https://api.agilecloud.ai/v1",
    api_key="YOUR_API_KEY",
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)
Supported Formats
- MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM
- Maximum file size: 25 MB
- Whisper large-v3 supports 99+ languages
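The format and size limits above can be checked locally before uploading, which avoids sending a file the service would reject. This is a minimal sketch using only the standard library. `check_audio_file` is a hypothetical helper, the check looks at the file extension rather than the actual container, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Extensions taken from the supported formats list
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
# Assumption: the 25 MB limit is interpreted as 25 MiB
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {p.stat().st_size} bytes, over the 25 MB limit")
    return problems
```

Files that exceed the limit would need to be split or re-encoded at a lower bitrate before upload; how best to do that depends on the audio tooling available.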
Billing
Transcription is billed at $0.10 per minute of audio. See Pricing for details.
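As a rough worked example of the rate above, the cost of a recording can be estimated from its duration. This sketch assumes billing is pro-rated by the second and rounded to the nearest cent; the actual rounding and minimum-charge rules are not stated here and may differ. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MINUTE = Decimal("0.10")  # USD, rate from the Billing section


def estimate_cost(duration_seconds):
    """Estimate the USD cost of one recording (assumes pro-rated billing)."""
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(90))    # 1.5 minutes -> 0.15
print(estimate_cost(3600))  # 60 minutes  -> 6.00
```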