Audio Transcription

Transcribe audio files to text using Whisper. The endpoint accepts multipart/form-data uploads and is OpenAI-compatible, so existing OpenAI clients work once the base URL is changed.

Endpoint

POST https://api.agilecloud.ai/v1/audio/transcriptions

Request Parameters

Send as multipart/form-data:

Parameter        Type    Required  Description
file             file    Yes       Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm)
model            string  Yes       Model name (e.g., whisper-large-v3)
language         string  No        ISO-639-1 language code (e.g., en)
response_format  string  No        json, text, verbose_json. Default: json
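
The parameter rules above can be checked on the client before anything is uploaded. The following is a minimal sketch, not part of any official SDK: the helper name build_form_fields and the language check (two ASCII letters, as ISO-639-1 codes are written) are assumptions made for illustration. It returns only the non-file form fields; the file part is attached separately by whichever HTTP client you use.

```python
ALLOWED_RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_form_fields(model, language=None, response_format="json"):
    """Validate the documented parameters and return the non-file form fields.

    Hypothetical helper: the name and the validation rules are illustrative.
    """
    if not model:
        raise ValueError("model is required")
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(
            f"response_format must be one of {sorted(ALLOWED_RESPONSE_FORMATS)}"
        )

    fields = {"model": model, "response_format": response_format}

    if language is not None:
        # ISO-639-1 codes are two letters, e.g. "en" or "de".
        if not (len(language) == 2 and language.isascii() and language.isalpha()):
            raise ValueError("language must be a two-letter ISO-639-1 code")
        fields["language"] = language.lower()

    return fields


print(build_form_fields("whisper-large-v3", language="en"))
```

Omitting language leaves the field out of the request entirely, which matches its optional status in the table.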

Example

curl https://api.agilecloud.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $DIRECTAI_API_KEY" \
  -F file=@recording.mp3 \
  -F model=whisper-large-v3

Response

{
  "text": "Hello, this is a sample transcription from DirectAI."
}
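
How the response body should be read depends on the requested response_format. The sketch below is a hedged illustration using only the standard library: extract_text is a hypothetical helper, and it assumes that verbose_json, like json, carries the transcript in a top-level "text" field (as in the OpenAI API this endpoint is compatible with; this page does not document the verbose_json schema).

```python
import json


def extract_text(body, response_format="json"):
    """Return the transcript from a raw response body.

    Hypothetical helper. Assumes verbose_json also has a top-level "text" key.
    """
    if response_format == "text":
        # Plain-text responses are the transcript itself.
        return body.strip()
    if response_format in ("json", "verbose_json"):
        return json.loads(body)["text"]
    raise ValueError(f"unknown response_format: {response_format}")


sample = '{"text": "Hello, this is a sample transcription from DirectAI."}'
print(extract_text(sample))
```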

Python SDK Example

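from openai import OpenAI

# Point the OpenAI client at this service instead of the default OpenAI host.
client = OpenAI(
    base_url="https://api.agilecloud.ai/v1",
    api_key="YOUR_API_KEY",  # replace with your own key
)

# Open the file in binary mode; the SDK uploads it as multipart/form-data.
with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

# With the default json response format, the transcript is available as .text.
print(transcript.text)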

Supported Formats

  • MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM
  • Maximum file size: 25 MB
  • Whisper large-v3 supports 99+ languages
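
The format and size limits above can be checked locally before an upload is attempted. This is a minimal sketch under stated assumptions: check_audio_file is a hypothetical helper, the check looks only at the file extension (not the actual audio encoding), and 25 MB is taken to mean 25 * 1024 * 1024 bytes, which this page does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_audio_file(path):
    """Return a list of problems; an empty list means the file passes."""
    p = Path(path)
    problems = []

    # Extension check only; the file contents are not inspected.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")

    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_FILE_BYTES:
        problems.append(f"file is larger than 25 MB: {p.stat().st_size} bytes")

    return problems
```

A caller would upload only when check_audio_file("recording.mp3") returns an empty list, and otherwise report the problems or split the recording into smaller files.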

Billing

Transcription is billed at $0.10 per minute of audio. See Pricing for details.
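
As a worked example of the rate, the sketch below estimates the cost of a recording from its duration. It assumes billing is prorated by the second and rounded to the nearest cent; this page does not state the rounding rule, so treat the result as an estimate and check Pricing for the actual behavior. The function name estimate_cost is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("0.10")  # USD per minute of audio, from this page


def estimate_cost(duration_seconds):
    """Estimate the transcription cost in USD.

    Assumption: per-second proration, rounded to the nearest cent.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * RATE_PER_MINUTE).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


print(estimate_cost(90))    # 0.15 (1.5 minutes)
print(estimate_cost(3600))  # 6.00 (one hour)
```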