Realtime API

OpenAI Realtime API–compatible WebSocket endpoint for low-latency streaming audio and text interactions. Supports bidirectional audio, text messages, and tool calls over a single connection.

Connecting

Open a WebSocket connection to the realtime endpoint. Authenticate with either the api_key query parameter or the Sec-WebSocket-Protocol header.

# Query parameter auth
wss://api.agilecloud.ai/v1/realtime?api_key=YOUR_API_KEY

# Header auth
Sec-WebSocket-Protocol: bearer.YOUR_API_KEY
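Both auth methods map onto the arguments of the standard WebSocket constructor, which sends its second argument in the Sec-WebSocket-Protocol request header. The helper below is a minimal sketch: the name realtimeConnectionArgs is hypothetical, and it assumes the server accepts the offered subprotocol during the handshake.

```javascript
// Hypothetical helper (not part of the API): builds the arguments for
// `new WebSocket(url, protocols)` for either auth method described above.
const REALTIME_URL = "wss://api.agilecloud.ai/v1/realtime";

function realtimeConnectionArgs(apiKey, method = "header") {
  if (!apiKey) throw new Error("apiKey is required");
  if (method === "query") {
    // Query parameter auth: the key travels in the URL.
    const url = new URL(REALTIME_URL);
    url.searchParams.set("api_key", apiKey);
    return { url: url.toString(), protocols: [] };
  }
  // Header auth: the WebSocket constructor sends each entry of `protocols`
  // in the Sec-WebSocket-Protocol request header.
  return { url: REALTIME_URL, protocols: [`bearer.${apiKey}`] };
}

// Usage (browser, or Node.js 22+ where WebSocket is a global):
// const { url, protocols } = realtimeConnectionArgs("YOUR_API_KEY");
// const ws = new WebSocket(url, protocols);
```

Header auth keeps the key out of the URL, which may matter where URLs are logged.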

Session Lifecycle

→ client to server, ← server to client

Connect
  ← session.created (server assigns a session ID)
  → session.update (client: set model, tools, params)
  → input_audio_buffer.append / conversation.item.create
  → response.create
  ← response.audio.delta / response.text.delta (streaming)
  ← response.done
Close

JavaScript Example

const ws = new WebSocket(
  "wss://api.agilecloud.ai/v1/realtime?api_key=YOUR_API_KEY"
);

ws.onopen = () => {
  // Configure session
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      model: "gpt-4o-mini",
      modalities: ["text"],
      instructions: "You are a helpful assistant.",
    },
  }));

  // Send a message
  ws.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Hello!" }],
    },
  }));

  // Request a response
  ws.send(JSON.stringify({ type: "response.create" }));
};

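ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === "response.text.delta") {
    // process.stdout is Node.js-only (Node.js 22+ provides a global WebSocket);
    // in a browser, append data.delta to the page or use console.log instead.
    process.stdout.write(data.delta);
  } else if (data.type === "error") {
    console.error("Realtime error:", data);
  }
};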

Event Types

Direction	Event	Description
Server	session.created	Connection established, session ID assigned
Client	session.update	Configure model, tools, modalities
Client	conversation.item.create	Add a message to the conversation
Client	input_audio_buffer.append	Stream audio input (base64 PCM16)
Client	input_audio_buffer.commit	Finalize audio buffer
Client	response.create	Request a model response
Server	response.text.delta	Streaming text chunk
Server	response.audio.delta	Streaming audio chunk (base64)
Server	response.function_call_arguments.delta	Tool call argument delta
Server	response.done	Response complete (includes usage)
Server	error	Error event
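The server events in the table can be routed with a single switch on the event type. The following is a rough sketch, not a reference client: createEventHandler is a hypothetical name, and it assumes the audio and tool-call delta events carry their payload in a delta field, as response.text.delta does in the example above.

```javascript
// Sketch of a dispatcher for the server events listed in the table above.
// Assumption: every *.delta event carries its payload in `event.delta`.
function createEventHandler() {
  const state = {
    sessionReady: false,
    text: "",
    audioChunks: [], // base64 strings, decoded later
    toolArgs: "",    // JSON fragments, parse once the response is done
    done: false,
    errors: [],
  };

  function handle(rawMessage) {
    const event = JSON.parse(rawMessage);
    switch (event.type) {
      case "session.created":
        state.sessionReady = true;
        break;
      case "response.text.delta":
        state.text += event.delta;
        break;
      case "response.audio.delta":
        state.audioChunks.push(event.delta);
        break;
      case "response.function_call_arguments.delta":
        state.toolArgs += event.delta;
        break;
      case "response.done":
        state.done = true;
        break;
      case "error":
        state.errors.push(event);
        break;
      default:
        // Ignore event types this sketch does not know about.
        break;
    }
    return state;
  }

  return { handle, state };
}

// Usage: const handler = createEventHandler();
//        ws.onmessage = (e) => handler.handle(e.data);
```

Tool-call arguments arrive as fragments, so the sketch only concatenates them; parsing is deferred until response.done, when the accumulated string should be complete JSON.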

Audio Format

Audio is streamed as base64-encoded PCM16 (16-bit PCM) at 24 kHz, mono. Input audio is buffered on the server and transcribed when committed with input_audio_buffer.commit. Output audio is streamed incrementally in response.audio.delta events.
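As an illustration of the input format, the sketch below converts floating-point samples (for example from the Web Audio API) into base64 PCM16 and wraps them in an input_audio_buffer.append event. It rests on two assumptions not stated here: samples are little-endian, and the event carries the data in a field named audio, as in the OpenAI Realtime API. All function names are hypothetical.

```javascript
const SAMPLE_RATE_HZ = 24000; // 24 kHz mono, per the format described above
const BYTES_PER_SAMPLE = 2;   // PCM16

// Convert float samples in [-1, 1] to 16-bit PCM bytes.
function floatToPcm16(samples) {
  const bytes = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(value), true); // little-endian (assumed)
  }
  return bytes;
}

// Base64-encode raw bytes using the global btoa (browsers and Node.js 16+).
function toBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Build the client event; the `audio` field name is an assumption.
function audioAppendEvent(samples) {
  return {
    type: "input_audio_buffer.append",
    audio: toBase64(floatToPcm16(samples)),
  };
}

// Duration of a PCM16 chunk in seconds at 24 kHz mono.
function pcm16DurationSeconds(byteLength) {
  return byteLength / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE);
}

// Usage: ws.send(JSON.stringify(audioAppendEvent(float32Chunk)));
//        ws.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
```

At this format one second of audio is 48,000 bytes before base64 encoding, which is useful when sizing the chunks sent per message.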

Rate Limiting

WebSocket connections are rate-limited per API key. Connection limits and message throughput vary by tier. When a limit is exceeded, the server closes the connection with code 1008 (Policy Violation).
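A client can detect the 1008 close code and wait before reconnecting. The sketch below uses exponential backoff, which is an assumed policy rather than something this API prescribes. The function name and the default delays are hypothetical.

```javascript
const POLICY_VIOLATION = 1008; // close code used for rate limiting, per the text above

// Returns the delay in milliseconds before the next reconnect attempt,
// or null when the close was not a rate-limit close.
// `attempt` is the zero-based count of consecutive rate-limited closes.
function rateLimitBackoffMs(closeCode, attempt, { baseMs = 1000, maxMs = 30000 } = {}) {
  if (closeCode !== POLICY_VIOLATION) return null;
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Usage (connect() is a placeholder for your own connection function):
// let attempt = 0;
// ws.onclose = (event) => {
//   const delay = rateLimitBackoffMs(event.code, attempt);
//   if (delay !== null) {
//     attempt += 1;
//     setTimeout(connect, delay);
//   }
// };
```

Adding random jitter to the delay is a common refinement when many clients share one API key.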

Billing

Realtime sessions are billed per token (text) and per second (audio), at the same rates as the corresponding REST endpoints. Usage is reported in the response.done event.
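Because usage arrives with each response.done event, a client can keep running totals for a session. The shape of the usage payload is not specified here, so this sketch assumes it sits at event.response.usage with input_tokens and output_tokens fields, as in the OpenAI Realtime API. Check an actual response.done payload and adjust the field names before relying on it.

```javascript
// Hypothetical per-session usage tracker.
// Assumption: usage is at event.response.usage with input_tokens / output_tokens.
function createUsageTracker() {
  const totals = { inputTokens: 0, outputTokens: 0, responses: 0 };

  function record(event) {
    if (event.type !== "response.done") return totals;
    const usage = (event.response && event.response.usage) || {};
    totals.inputTokens += usage.input_tokens || 0;
    totals.outputTokens += usage.output_tokens || 0;
    totals.responses += 1;
    return totals;
  }

  return { record, totals };
}

// Usage: const tracker = createUsageTracker();
//        ws.onmessage = (e) => tracker.record(JSON.parse(e.data));
```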