Realtime API

OpenAI Realtime API–compatible WebSocket endpoint for low-latency streaming audio and text interactions. Supports bidirectional audio, text messages, and tool calls over a single connection.

Connecting

Open a WebSocket connection to the realtime endpoint. Authenticate via query parameter or the Sec-WebSocket-Protocol header.

# Query parameter auth
wss://api.agilecloud.ai/v1/realtime?api_key=YOUR_API_KEY

# Header auth
Sec-WebSocket-Protocol: bearer.YOUR_API_KEY
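
Browser WebSocket clients cannot set arbitrary request headers, but the second argument of the WebSocket constructor is sent as Sec-WebSocket-Protocol. The sketch below builds the constructor arguments for both auth styles. The helper name realtimeConnectionArgs is hypothetical. It assumes the API key contains only characters that are valid in a subprotocol token, and that the server accepts the subprotocol during the handshake, which the text above does not spell out.

```javascript
// Hypothetical helper: builds the arguments for `new WebSocket(url, protocols)`.
const REALTIME_URL = "wss://api.agilecloud.ai/v1/realtime";

function realtimeConnectionArgs(apiKey, mode = "header") {
  if (mode === "query") {
    // Query parameter auth: ?api_key=...
    const url = new URL(REALTIME_URL);
    url.searchParams.set("api_key", apiKey);
    return { url: url.toString(), protocols: [] };
  }
  // Header auth: the subprotocol list becomes the Sec-WebSocket-Protocol header.
  return { url: REALTIME_URL, protocols: [`bearer.${apiKey}`] };
}

// Usage (not executed here):
// const { url, protocols } = realtimeConnectionArgs("YOUR_API_KEY");
// const ws = new WebSocket(url, protocols);
```

Header auth keeps the key out of the URL, so it is less likely to end up in proxy or access logs.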

Session Lifecycle

(→ client to server, ← server to client)

→ Connect
  ← session.created (server)
  → session.update (client: set model, tools, params)
  → conversation.item.create, or input_audio_buffer.append followed by input_audio_buffer.commit
  → response.create
  ← response.audio.delta / response.text.delta (streaming)
  ← response.done (includes usage)
  → Close
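
The sequence above can be sketched as a small driver that waits for session.created before sending anything, then tracks the turn until response.done. This is an illustration only: createTurnDriver and its state names are hypothetical, and `send` stands in for something like ws.send.bind(ws).

```javascript
// Hypothetical helper: drives one text turn in the lifecycle order.
// `send` is any function that transmits a JSON string, e.g. ws.send.bind(ws).
function createTurnDriver(send, { session, text }) {
  let state = "connecting";
  return {
    get state() {
      return state;
    },
    // Feed every parsed server event into this method.
    handle(event) {
      if (event.type === "session.created" && state === "connecting") {
        send(JSON.stringify({ type: "session.update", session }));
        send(JSON.stringify({
          type: "conversation.item.create",
          item: {
            type: "message",
            role: "user",
            content: [{ type: "input_text", text }],
          },
        }));
        send(JSON.stringify({ type: "response.create" }));
        state = "responding";
      } else if (event.type === "response.done" && state === "responding") {
        state = "done"; // the caller may now close the socket
      }
    },
  };
}

// Usage (not executed here):
// const driver = createTurnDriver(ws.send.bind(ws), {
//   session: { model: "qwen-2.5-3b", modalities: ["text"] },
//   text: "Hello!",
// });
// ws.onmessage = (e) => driver.handle(JSON.parse(e.data));
```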

JavaScript Example

const ws = new WebSocket(
  "wss://api.agilecloud.ai/v1/realtime?api_key=YOUR_API_KEY"
);

ws.onopen = () => {
  // Configure session
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      model: "qwen-2.5-3b",
      modalities: ["text"],
      instructions: "You are a helpful assistant.",
    },
  }));

  // Send a message
  ws.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Hello!" }],
    },
  }));

  // Request a response
  ws.send(JSON.stringify({ type: "response.create" }));
};

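ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === "response.text.delta") {
    // process.stdout is Node.js only; in a browser, append data.delta to the page
    process.stdout.write(data.delta);
  } else if (data.type === "response.done") {
    ws.close(); // response complete
  } else if (data.type === "error") {
    console.error("Realtime error:", data);
  }
};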

Event Types

Direction  Event                                   Description
Server     session.created                         Connection established, session ID assigned
Client     session.update                          Configure model, tools, modalities
Client     conversation.item.create                Add a message to the conversation
Client     input_audio_buffer.append               Stream audio input (base64 PCM16)
Client     input_audio_buffer.commit               Finalize audio buffer
Client     response.create                         Request a model response
Server     response.text.delta                     Streaming text chunk
Server     response.audio.delta                    Streaming audio chunk (base64)
Server     response.function_call_arguments.delta  Tool call argument delta
Server     response.done                           Response complete (includes usage)
Server     error                                   Error event
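
One way to consume the server events in the table is a collector that concatenates text deltas, accumulates tool call argument fragments, and returns a result on response.done. This is a sketch under assumptions: the `delta` field matches the JavaScript example, but `call_id` on tool call deltas and `error.message` on error events are borrowed from the OpenAI Realtime API and are not confirmed by this page. createResponseCollector is a hypothetical name.

```javascript
// Hypothetical helper: folds server events into one result per response.
function createResponseCollector() {
  let text = "";
  const toolArgs = new Map(); // call_id (assumed field) -> partial JSON string

  return {
    // Returns null while streaming, and the collected result on response.done.
    handle(event) {
      switch (event.type) {
        case "response.text.delta":
          text += event.delta;
          return null;
        case "response.function_call_arguments.delta": {
          const previous = toolArgs.get(event.call_id) ?? "";
          toolArgs.set(event.call_id, previous + event.delta);
          return null;
        }
        case "response.done": {
          // Arguments are only parsed once the response is complete.
          const toolCalls = {};
          for (const [callId, json] of toolArgs) {
            toolCalls[callId] = JSON.parse(json);
          }
          return { text, toolCalls };
        }
        case "error":
          // `error.message` is an assumed shape for the error payload.
          throw new Error(event.error?.message ?? "realtime error");
        default:
          return null; // ignore events this sketch does not handle
      }
    },
  };
}
```

Argument deltas are fragments of a JSON string, so they are parsed only after response.done rather than on each delta.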

Audio Format

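Audio is streamed as base64-encoded 16-bit PCM (PCM16) at 24 kHz, mono. Input audio sent with input_audio_buffer.append is buffered on the server and transcribed when the client sends input_audio_buffer.commit. Output audio is streamed as response.audio.delta events.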
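
As an illustration of the input format, the sketch below converts float samples in the range [-1, 1] to base64 PCM16 and wraps them in append events followed by a commit. Several details are assumptions rather than documented behavior: little-endian byte order, the `audio` field name on input_audio_buffer.append (taken from the OpenAI Realtime API), and the 100 ms chunk size. The helper names are hypothetical, and the input is assumed to be sampled at 24 kHz mono already.

```javascript
const SAMPLE_RATE_HZ = 24000; // 24 kHz mono, per the format described above

// Converts float samples in [-1, 1] to base64-encoded 16-bit PCM.
// Little-endian byte order is an assumption.
function floatToPcm16Base64(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    view.setInt16(i * 2, Math.round(clamped * 32767), true);
  });
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Splits samples into chunks and builds the client events to send.
// The `audio` field name is an assumption.
function audioEvents(samples, chunkMs = 100) {
  const samplesPerChunk = (SAMPLE_RATE_HZ * chunkMs) / 1000;
  const events = [];
  for (let i = 0; i < samples.length; i += samplesPerChunk) {
    events.push({
      type: "input_audio_buffer.append",
      audio: floatToPcm16Base64(samples.slice(i, i + samplesPerChunk)),
    });
  }
  events.push({ type: "input_audio_buffer.commit" });
  return events;
}

// Usage (not executed here):
// for (const event of audioEvents(float32Samples)) ws.send(JSON.stringify(event));
```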

Rate Limiting

WebSocket connections are rate-limited per API key. Connection limits and message throughput vary by tier. Exceeding limits results in a close frame with code 1008 (Policy Violation).
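
A client can react to a 1008 close by backing off before reconnecting. The sketch below computes an exponential delay. The base delay, the cap, and the helper name reconnectDelayMs are illustrative choices, not values documented here, and 1008 may also be used for policy violations other than rate limits.

```javascript
const POLICY_VIOLATION = 1008; // close code sent when limits are exceeded

// Returns how long to wait before reconnecting, in milliseconds.
// `attempt` starts at 0 and counts consecutive rate-limited closes.
function reconnectDelayMs(closeCode, attempt, baseMs = 1000, maxMs = 30000) {
  if (closeCode !== POLICY_VIOLATION) return 0; // not rate-limited: no backoff
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Usage (not executed here):
// let attempt = 0;
// ws.onclose = (event) => {
//   const delay = reconnectDelayMs(event.code, attempt++);
//   setTimeout(connect, delay); // `connect` is your own reconnect function
// };
```

Adding random jitter to the delay helps when many clients share one API key and would otherwise reconnect at the same moment.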

Billing

Realtime sessions are billed per token (text) and per second (audio), at the same rates as the corresponding REST endpoints. Usage is reported in the response.done event.
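
To track spend per session, a client can sum the usage reported on each response.done event. The payload shape is not documented on this page: the sketch assumes `response.usage.input_tokens` and `response.usage.output_tokens`, as in the OpenAI Realtime API, and treats missing fields as zero. createUsageTracker is a hypothetical helper.

```javascript
// Hypothetical helper: accumulates token usage across responses in a session.
function createUsageTracker() {
  const totals = { input_tokens: 0, output_tokens: 0 };
  return {
    totals,
    // Call with every parsed server event; only response.done is counted.
    record(event) {
      if (event.type !== "response.done") return;
      const usage = event.response?.usage ?? {}; // assumed payload location
      totals.input_tokens += usage.input_tokens ?? 0;
      totals.output_tokens += usage.output_tokens ?? 0;
    },
  };
}

// Usage (not executed here):
// const tracker = createUsageTracker();
// ws.onmessage = (e) => tracker.record(JSON.parse(e.data));
```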