API Reference
Complete endpoint reference for both OpenAI-compatible and DirectAI native APIs.
Base URL
https://api.agilecloud.ai
All endpoints require Authorization: Bearer <api_key> unless noted otherwise.
OpenAI-Compatible Endpoints
Drop-in replacement for the OpenAI API. Use any OpenAI SDK by changing only the base URL and API key.
DirectAI Native API
Purpose-built endpoints for model lifecycle, deployment management, and system health. Versioned at /api/v1/.
Models
/api/v1/modelsRegister a new model version
/api/v1/modelsList models (filterable by modality, owner)
/api/v1/models/{id}Get model details
/api/v1/models/{id}Update model metadata
/api/v1/models/{id}Delete a model
Deployments
/api/v1/deploymentsCreate a deployment for a model version
/api/v1/deploymentsList deployments (filterable)
/api/v1/deployments/{id}Get deployment details
/api/v1/deployments/{id}Update deployment (scaling, config)
/api/v1/deployments/{id}Delete a deployment
System
/api/v1/system/healthService health snapshot (model registry + backend liveness)
/api/v1/system/capacityBackend capacity and utilization
/api/v1/system/metricsPrometheus-format metrics
/api/v1/recommendationsUsage-based optimization recommendations
Prompts
/api/v1/promptsCreate a prompt template
/api/v1/promptsList prompt templates
/api/v1/prompts/{slug}Get prompt by slug
/api/v1/prompts/{slug}Update prompt metadata
/api/v1/prompts/{slug}Archive prompt (soft delete)
/api/v1/prompts/{slug}/versionsCreate a new draft version
/api/v1/prompts/{slug}/versionsList versions
/api/v1/prompts/{slug}/versions/{ver}/publishPublish a draft version
/api/v1/prompts/{slug}/renderRender template with variables
/api/v1/prompts/{slug}/ab-testCreate A/B test between versions
/api/v1/prompts/{slug}/ab-resultsGet A/B test results
Smart Routing
/api/v1/routesCreate route configuration
/api/v1/routesList routes
/api/v1/routes/{route_id}Get route by ID
/api/v1/routes/{route_id}Update route
/api/v1/routes/{route_id}Delete route
/api/v1/routes/evaluateDry-run evaluate routing rules
/api/v1/budget/statusCurrent spend, remaining, forecast
/api/v1/budget/configGet budget configuration
/api/v1/budget/configUpdate budget configuration
Batch
/v1/batchesCreate batch inference job
/v1/batchesList batch jobs
/v1/batches/{batch_id}Get batch job status
/v1/batches/{batch_id}/cancelCancel batch job
Semantic Cache
/api/v1/cache/statsCache hit/miss statistics
/api/v1/cache/configCurrent cache configuration
/api/v1/cache/configUpdate cache configuration
/api/v1/cache/entriesList cached entries
/api/v1/cache/invalidateInvalidate by model or hash
/api/v1/cache/flushFlush all entries
Guardrail Rules
/api/v1/guardrails/rulesCreate custom safety rule
/api/v1/guardrails/rulesList rules
/api/v1/guardrails/rules/{rule_id}Get rule
/api/v1/guardrails/rules/{rule_id}Update rule
/api/v1/guardrails/rules/{rule_id}Delete rule
/api/v1/guardrails/rules/testDry-run rule against sample content
Compliance & Audit
/api/v1/compliance/exportsCreate compliance export job (HIPAA, SOC 2)
/api/v1/compliance/exportsList exports
/api/v1/compliance/exports/{export_id}Get export status
/api/v1/compliance/exports/{export_id}Delete export
/api/v1/audit/retentionGet retention configuration
/api/v1/audit/retentionUpdate retention configuration
/api/v1/audit/retention/reportRetention compliance report
/api/v1/audit/legal-holdCreate legal hold
/api/v1/audit/legal-holdList legal holds
/api/v1/audit/legal-hold/{hold_id}Release legal hold
RAG (Retrieval-Augmented Generation)
/api/v1/rag/collectionsCreate collection
/api/v1/rag/collectionsList collections
/api/v1/rag/collections/{id}Get collection
/api/v1/rag/collections/{id}Update collection
/api/v1/rag/collections/{id}Delete collection and documents
/api/v1/rag/collections/{id}/documentsUpload documents
/api/v1/rag/collections/{id}/documentsList documents in collection
/api/v1/rag/documents/{id}Get document
/api/v1/rag/documents/{id}Delete document
/api/v1/rag/usageStorage usage and tier cap
/v1/rag/searchVector / hybrid / keyword search
/v1/rag/queryRAG retrieval + grounded LLM generation
Realtime (WebSocket)
Health Probes
No authentication required.
/healthzLiveness probe — always returns 200
/readyzReadiness probe — 200 if models loaded, 503 otherwise
Common Response Patterns
Error Response
{
"error": {
"message": "Human-readable description",
"type": "error_type",
"code": "error_code"
}
}Response Headers
X-Request-ID— Correlation ID (propagated from request or auto-generated)Content-Type: application/json— Standard responsesContent-Type: text/event-stream— Streaming responses
OpenAPI Spec
The full OpenAPI specification is available at /docs (Swagger UI) and /openapi.json on the API server. You can also find the exported spec in the repository at docs/openapi.json.