API Reference

Complete endpoint reference for both OpenAI-compatible and ACAI native APIs.

Base URL

https://api.agilecloud.ai

All endpoints require Authorization: Bearer <api_key> unless noted otherwise.

OpenAI-Compatible Endpoints

Drop-in replacement for the OpenAI API. Use any OpenAI SDK by changing only the base URL and API key.

POST

/v1/chat/completions

Generate chat completions. Supports streaming (SSE) and non-streaming.

POST

/v1/embeddings

Generate vector embeddings from text input.

POST

/v1/audio/transcriptions

Transcribe audio files. Multipart form-data.

GET

/v1/models

List all registered models and their aliases.

ACAI Native API

Purpose-built endpoints for model lifecycle, deployment management, and system health. Versioned at /api/v1/.

Models

POST

/api/v1/models

GET

/api/v1/models

List models (filterable by modality, owner)

GET

/api/v1/models/{id}

Get model details

PATCH

/api/v1/models/{id}

Update model metadata

DELETE

/api/v1/models/{id}

Delete a model

Deployments

POST

/api/v1/deployments

Create a deployment for a model version

GET

/api/v1/deployments

List deployments (filterable)

GET

/api/v1/deployments/{id}

Get deployment details

PATCH

/api/v1/deployments/{id}

Update deployment (scaling, config)

DELETE

/api/v1/deployments/{id}

Delete a deployment

System

GET

/api/v1/system/health

Service health snapshot (model registry + backend liveness)

GET

/api/v1/system/capacity

Backend capacity and utilization

GET

/api/v1/system/metrics

Prometheus-format metrics

GET

/api/v1/recommendations

Usage-based optimization recommendations

Prompts

POST

/api/v1/prompts

Create a prompt template

GET

/api/v1/prompts

List prompt templates

GET

/api/v1/prompts/{slug}

Get prompt by slug

PATCH

/api/v1/prompts/{slug}

Update prompt metadata

DELETE

/api/v1/prompts/{slug}

Archive prompt (soft delete)

POST

/api/v1/prompts/{slug}/versions

Create a new draft version

GET

/api/v1/prompts/{slug}/versions

List versions

POST

/api/v1/prompts/{slug}/versions/{ver}/publish

Publish a draft version

POST

/api/v1/prompts/{slug}/render

Render template with variables

POST

/api/v1/prompts/{slug}/ab-test

Create A/B test between versions

GET

/api/v1/prompts/{slug}/ab-results

Get A/B test results

Smart Routing

POST

/api/v1/routes

Create route configuration

GET

/api/v1/routes

List routes

GET

/api/v1/routes/{route_id}

Get route by ID

PATCH

/api/v1/routes/{route_id}

Update route

DELETE

/api/v1/routes/{route_id}

Delete route

POST

/api/v1/routes/evaluate

Dry-run evaluate routing rules

GET

/api/v1/budget/status

Current spend, remaining, forecast

GET

/api/v1/budget/config

Get budget configuration

PATCH

/api/v1/budget/config

Update budget configuration

Batch

POST

/v1/batches

Create batch inference job

GET

/v1/batches

List batch jobs

GET

/v1/batches/{batch_id}

Get batch job status

POST

/v1/batches/{batch_id}/cancel

Cancel batch job

Semantic Cache

GET

/api/v1/cache/stats

Cache hit/miss statistics

GET

/api/v1/cache/config

Current cache configuration

PATCH

/api/v1/cache/config

Update cache configuration

GET

/api/v1/cache/entries

List cached entries

POST

/api/v1/cache/invalidate

Invalidate by model or hash

DELETE

/api/v1/cache/flush

Flush all entries

Guardrail Rules

POST

/api/v1/guardrails/rules

Create custom safety rule

GET

/api/v1/guardrails/rules

List rules

GET

/api/v1/guardrails/rules/{rule_id}

Get rule

PATCH

/api/v1/guardrails/rules/{rule_id}

Update rule

DELETE

/api/v1/guardrails/rules/{rule_id}

Delete rule

POST

/api/v1/guardrails/rules/test

Dry-run rule against sample content

Compliance & Audit

POST

/api/v1/compliance/exports

Create compliance export job (HIPAA, SOC 2)

GET

/api/v1/compliance/exports

List exports

GET

/api/v1/compliance/exports/{export_id}

Get export status

DELETE

/api/v1/compliance/exports/{export_id}

Delete export

GET

/api/v1/audit/retention

Get retention configuration

PATCH

/api/v1/audit/retention

Update retention configuration

GET

/api/v1/audit/retention/report

Retention compliance report

POST

/api/v1/audit/legal-hold

Create legal hold

GET

/api/v1/audit/legal-hold

List legal holds

DELETE

/api/v1/audit/legal-hold/{hold_id}

Release legal hold

RAG (Retrieval-Augmented Generation)

POST

/api/v1/rag/collections

Create collection

GET

/api/v1/rag/collections

List collections

GET

/api/v1/rag/collections/{id}

Get collection

PATCH

/api/v1/rag/collections/{id}

Update collection

DELETE

/api/v1/rag/collections/{id}

Delete collection and documents

POST

/api/v1/rag/collections/{id}/documents

Upload documents

GET

/api/v1/rag/collections/{id}/documents

List documents in collection

GET

/api/v1/rag/documents/{id}

Get document

DELETE

/api/v1/rag/documents/{id}

Delete document

GET

/api/v1/rag/usage

Storage usage and tier cap

POST

/v1/rag/search

Vector / hybrid / keyword search

POST

/v1/rag/query

RAG retrieval + grounded LLM generation

Realtime (WebSocket)

/v1/realtime

OpenAI Realtime API–compatible WebSocket for streaming audio and text

Health Probes

No authentication required.

GET

/healthz

Liveness probe — always returns 200

GET

/readyz

Readiness probe — 200 if models loaded, 503 otherwise

Common Response Patterns

Error Response

{
  "error": {
    "message": "Human-readable description",
    "type": "error_type",
    "code": "error_code"
  }
}

Response Headers

X-Request-ID — Correlation ID (propagated from request or auto-generated)
Content-Type: application/json — Standard responses
Content-Type: text/event-stream — Streaming responses

OpenAPI Spec

Explore the full specification interactively with schemas, parameters, and example responses:

Open Interactive Docs Download OpenAPI JSON

The spec is also available at /openapi.json on the API server and in the repository at docs/openapi.json.