Realtime API
OpenAI Realtime API–compatible WebSocket endpoint for low-latency streaming audio and text interactions. Supports bidirectional audio, text messages, and tool calls over a single connection.
Connecting
Open a WebSocket connection to the realtime endpoint. Authenticate via query parameter or the Sec-WebSocket-Protocol header.
# Query parameter auth
wss://api.agilecloud.ai/v1/realtime?api_key=YOUR_API_KEY
# Header auth
Sec-WebSocket-Protocol: bearer.YOUR_API_KEY
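As a minimal sketch, the two authentication options can be prepared in JavaScript as follows. The helper names `realtimeUrl` and `authProtocols` are illustrative and not part of the API. The header variant relies on the standard behavior that the second argument of the `WebSocket` constructor is sent as the Sec-WebSocket-Protocol header; whether the server accepts additional subprotocols alongside it is not stated here.

```javascript
// Base endpoint, taken from the connection strings above.
const REALTIME_ENDPOINT = "wss://api.agilecloud.ai/v1/realtime";

// Option 1: query parameter auth. Returns the full URL to connect to.
function realtimeUrl(apiKey) {
  const url = new URL(REALTIME_ENDPOINT);
  url.searchParams.set("api_key", apiKey); // URL-encodes the key if needed
  return url.toString();
}

// Option 2: header auth. Returns the subprotocol list to pass as the
// second argument of the WebSocket constructor.
function authProtocols(apiKey) {
  return [`bearer.${apiKey}`];
}

// Usage (not executed here):
//   const ws = new WebSocket(realtimeUrl("YOUR_API_KEY"));
//   const ws = new WebSocket(REALTIME_ENDPOINT, authProtocols("YOUR_API_KEY"));
```

Header auth keeps the key out of the URL, which may matter where request URLs end up in logs or browser history.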
Session Lifecycle
Connect
→ session.created (server)
→ session.update (client: set model, tools, params)
→ input_audio_buffer.append / conversation.item.create
→ response.create
← response.audio.delta / response.text.delta (streaming)
← response.done
→ Close
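The lifecycle above can be read as a small state machine. The sketch below checks that events arrive in the documented order. The phase names (`connected`, `created`, `configured`, `responding`) are hypothetical, and the sketch assumes that a session can take further turns after response.done, which the lifecycle does not state explicitly.

```javascript
// Allowed transitions per phase. Event names come from this page;
// phase names are made up for illustration.
const TRANSITIONS = {
  connected: { "session.created": "created" },
  created: { "session.update": "configured" },
  configured: {
    "input_audio_buffer.append": "configured",
    "input_audio_buffer.commit": "configured", // from the event table
    "conversation.item.create": "configured",
    "response.create": "responding",
  },
  responding: {
    "response.audio.delta": "responding",
    "response.text.delta": "responding",
    "response.done": "configured", // assumption: ready for another turn
  },
};

// Returns the phase after `eventType`, or throws if the event is out of order.
function nextPhase(phase, eventType) {
  const next = TRANSITIONS[phase]?.[eventType];
  if (!next) {
    throw new Error(`Unexpected ${eventType} in phase ${phase}`);
  }
  return next;
}
```

A client might call `nextPhase` for every event it sends or receives during development to catch ordering mistakes early, then drop the check in production.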
JavaScript Example
const ws = new WebSocket(
  "wss://api.agilecloud.ai/v1/realtime?api_key=YOUR_API_KEY"
);

ws.onopen = () => {
  // Configure session
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      model: "qwen-2.5-3b",
      modalities: ["text"],
      instructions: "You are a helpful assistant.",
    },
  }));

  // Send a message
  ws.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Hello!" }],
    },
  }));

  // Request a response
  ws.send(JSON.stringify({ type: "response.create" }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === "response.text.delta") {
    // process.stdout is Node.js only; in a browser, use console.log or the DOM
    process.stdout.write(data.delta);
  }
};
Event Types
| Direction | Event | Description |
|---|---|---|
| Server | session.created | Connection established, session ID assigned |
| Client | session.update | Configure model, tools, modalities |
| Client | conversation.item.create | Add a message to the conversation |
| Client | input_audio_buffer.append | Stream audio input (base64 PCM16) |
| Client | input_audio_buffer.commit | Finalize audio buffer |
| Client | response.create | Request a model response |
| Server | response.text.delta | Streaming text chunk |
| Server | response.audio.delta | Streaming audio chunk (base64) |
| Server | response.function_call_arguments.delta | Tool call argument delta |
| Server | response.done | Response complete (includes usage) |
| Server | error | Error event |
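One way to handle the server events in this table is a small dispatcher keyed on the event type. The sketch below is illustrative only: `createResponseCollector` is a hypothetical helper, the `delta` field on response.text.delta matches the JavaScript example above, and the shape of the error event is not specified on this page, so the whole event is stored as received.

```javascript
// Collects streamed text and completion state from raw server messages.
function createResponseCollector() {
  const state = { text: "", done: false, errors: [] };

  // A Map avoids accidental lookups of inherited object properties.
  const handlers = new Map([
    ["response.text.delta", (event) => { state.text += event.delta; }],
    ["response.done", () => { state.done = true; }],
    ["error", (event) => { state.errors.push(event); }],
  ]);

  return {
    state,
    // `raw` is the JSON string received in a WebSocket message.
    handle(raw) {
      const event = JSON.parse(raw);
      const handler = handlers.get(event.type);
      if (handler) handler(event); // event types without a handler are ignored
      return state;
    },
  };
}

// Usage (not executed here):
//   const collector = createResponseCollector();
//   ws.onmessage = (msg) => collector.handle(msg.data);
```

Ignoring unhandled event types keeps the client tolerant of events it does not act on, such as session.created.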
Audio Format
Audio is streamed as base64-encoded PCM16 at 24 kHz, mono. Input audio is buffered on the server and transcribed once the client sends input_audio_buffer.commit. Output audio is streamed as response.audio.delta events.
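A minimal Node.js sketch of preparing input audio in this format follows. Two details are assumptions rather than facts from this page: samples are written little-endian, and the append event carries its payload in an `audio` field, as in the OpenAI Realtime API that this endpoint is compatible with. The helper names are illustrative.

```javascript
const SAMPLE_RATE_HZ = 24000; // 24 kHz mono, per the format above
const BYTES_PER_SAMPLE = 2;   // PCM16 = 16-bit signed samples

// Converts float samples in [-1, 1] to PCM16 (little-endian, assumed)
// and returns the result base64-encoded.
function encodePcm16Base64(floatSamples) {
  const buf = Buffer.alloc(floatSamples.length * BYTES_PER_SAMPLE);
  floatSamples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Scale so that -1 maps to -32768 and 1 maps to 32767.
    const int16 = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
    buf.writeInt16LE(int16, i * BYTES_PER_SAMPLE);
  });
  return buf.toString("base64");
}

// Builds the client event; the `audio` field name is an assumption.
function audioAppendEvent(floatSamples) {
  return {
    type: "input_audio_buffer.append",
    audio: encodePcm16Base64(floatSamples),
  };
}

// Duration in seconds of a base64 PCM16 chunk at 24 kHz mono.
function base64DurationSeconds(b64) {
  const byteLength = Buffer.from(b64, "base64").length;
  return byteLength / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE);
}
```

At these settings one second of audio is 48,000 bytes before base64 encoding. In a browser, `Buffer` is not available, so the same conversion would need an `Int16Array` plus `btoa` or similar.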
Rate Limiting
WebSocket connections are rate-limited per API key. Connection limits and message throughput vary by tier. Exceeding limits results in a close frame with code 1008 (Policy Violation).
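A client can react to a 1008 close by waiting longer before reconnecting. The sketch below shows one possible policy, not a documented requirement: the helper name, the default delays, the retry cap, and the choice to back off four times harder on 1008 are all assumptions.

```javascript
const NORMAL_CLOSURE = 1000;   // standard WebSocket close code
const POLICY_VIOLATION = 1008; // close code named above for exceeded limits

// Returns the number of milliseconds to wait before reconnecting,
// or null if the client should not reconnect.
function reconnectDelayMs(
  closeCode,
  attempt,
  { baseMs = 1000, maxMs = 60000, maxAttempts = 6 } = {}
) {
  if (closeCode === NORMAL_CLOSURE) return null; // deliberate close
  if (attempt >= maxAttempts) return null;       // give up after the cap
  // Assumed policy: wait longer when the server signals a limit breach.
  const factor = closeCode === POLICY_VIOLATION ? 4 : 1;
  // Exponential backoff, capped at maxMs.
  return Math.min(maxMs, baseMs * factor * 2 ** attempt);
}

// Usage (not executed here); `connect` is a hypothetical reconnect function:
//   ws.onclose = (event) => {
//     const delay = reconnectDelayMs(event.code, attempt++);
//     if (delay !== null) setTimeout(connect, delay);
//   };
```

Adding random jitter to the delay would help when many clients share one API key, at the cost of a less predictable schedule.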
Billing
Realtime sessions are billed per token (text) and per second (audio), at the same rates as the corresponding REST endpoints. Usage is reported in the response.done event.
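To track spend across a session, a client can sum the usage reported in each response.done event. This page does not list the usage fields, so the sketch below assumes the usage object sits at `response.usage` (as in the OpenAI Realtime API) and simply adds up every top-level numeric field it finds. The field names in the usage comment are hypothetical.

```javascript
// Returns new running totals after folding in one server event.
// Events other than response.done leave the totals unchanged.
function addUsage(totals, event) {
  if (event.type !== "response.done") return totals;
  const usage = event.response?.usage ?? {}; // assumed location of usage
  const next = { ...totals };
  for (const [key, value] of Object.entries(usage)) {
    // Only top-level numeric fields are summed; nested details are skipped.
    if (typeof value === "number") {
      next[key] = (next[key] ?? 0) + value;
    }
  }
  return next;
}

// Usage (not executed here), with hypothetical field names:
//   let totals = {};
//   totals = addUsage(totals, {
//     type: "response.done",
//     response: { usage: { input_tokens: 12, output_tokens: 30 } },
//   });
```

Summing generically avoids hard-coding field names that may differ between text and audio responses.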