Audio Transcription

Transcribe audio files to text using Whisper. The endpoint accepts multipart/form-data uploads and is compatible with the OpenAI audio transcription API.

Endpoint

POST https://api.agilecloud.ai/v1/audio/transcriptions

Request Parameters

Send as multipart/form-data:

Parameter	Type	Required	Description
file	file	Yes	Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm)
model	string	Yes	Model name (e.g., `whisper-large-v3`)
language	string	No	ISO-639-1 language code (e.g., `en`)
response_format	string	No	`json`, `text`, `verbose_json`. Default: `json`
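
As a rough illustration of how the parameters above fit together, the following sketch builds the non-file form fields and rejects values the table does not list. The helper name `build_form_fields` is hypothetical and not part of any SDK, and the language check only verifies the two-letter shape of an ISO-639-1 code, not that the code is actually supported.

```python
from typing import Optional

# Values taken from the parameter table above.
RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_form_fields(
    model: str,
    language: Optional[str] = None,
    response_format: str = "json",
) -> dict:
    """Return the non-file multipart fields for a transcription request.

    Hypothetical helper for illustration; the `file` part is sent separately.
    """
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    fields = {"model": model, "response_format": response_format}
    if language is not None:
        # ISO-639-1 codes are two ASCII letters; this is only a shape check.
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"expected an ISO-639-1 code, got {language!r}")
        fields["language"] = language.lower()
    return fields


print(build_form_fields("whisper-large-v3", language="en", response_format="verbose_json"))
```

A dictionary like this could be passed as the form data of any HTTP client that supports multipart uploads, alongside the audio file itself.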

cURL Example

curl https://api.agilecloud.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $DIRECTAI_API_KEY" \
  -F file=@recording.mp3 \
  -F model=whisper-large-v3

Response

{
  "text": "Hello, this is a sample transcription from ACAI."
}

Python SDK Example

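from openai import OpenAI

# Point the OpenAI client at this API instead of the default OpenAI host.
client = OpenAI(
    base_url="https://api.agilecloud.ai/v1",
    api_key="YOUR_API_KEY",
)

# Open the audio in binary mode; the SDK uploads it as the multipart `file` field.
with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

# With the default `json` response format, the transcription is available as `.text`.
print(transcript.text)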

Supported Formats

MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM
Maximum file size: 25 MB
Whisper large-v3 supports 99+ languages
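
The limits above can be checked locally before uploading. The following is a minimal sketch, assuming that "25 MB" means 25 × 1024 × 1024 bytes (the service may count megabytes differently) and that the file extension is a reasonable proxy for the format. The function name `check_audio_file` is hypothetical, and the check does not inspect the audio content itself.

```python
import os

# Extensions corresponding to the formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

# Assumption: 1 MB = 1024 * 1024 bytes; the API may define the limit differently.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file looks unsuitable for upload.

    Hypothetical pre-flight check based only on extension and size.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file extension: {ext or '(none)'}")

    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is {MAX_FILE_BYTES} bytes")
```

Calling `check_audio_file("recording.mp3")` before the request returns nothing when both checks pass, so oversized or mislabeled files can be caught without a network round trip.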

Billing

Transcription is billed per minute of audio at $0.10/minute. See Pricing for details.
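
For budgeting, the per-minute rate can be turned into a quick estimate. The sketch below assumes billing is proportional to the audio duration and rounds to the nearest cent; the actual rounding rules (for example, whether partial minutes are rounded up) are not stated here, so treat the result as an approximation. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate in USD per minute of audio, as stated in the Billing section.
PRICE_PER_MINUTE = Decimal("0.10")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the transcription cost in USD for audio of the given length.

    Assumption: cost scales linearly with duration; real billing may round differently.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    cost = minutes * PRICE_PER_MINUTE
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(90))    # a 1.5-minute clip
print(estimate_cost(3600))  # a 60-minute recording
```

`Decimal` is used instead of floats so that amounts such as 0.10 are represented exactly.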