API Reference

Complete endpoint reference for both OpenAI-compatible and DirectAI native APIs.

Base URL

https://api.agilecloud.ai

All endpoints require an Authorization: Bearer <api_key> request header unless noted otherwise.
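As a minimal sketch of the authorization rule above, the following Python helper builds (but does not send) an authenticated request using only the standard library. The helper name build_request is illustrative and not part of the documented API.

```python
import json
import urllib.request

BASE_URL = "https://api.agilecloud.ai"


def build_request(method, path, api_key, body=None):
    """Return a urllib Request carrying the Bearer token; nothing is sent."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        # JSON bodies are declared explicitly so the server can decode them.
        req.add_header("Content-Type", "application/json")
    return req
```

Passing the result to urllib.request.urlopen would perform the call; for example, build_request("GET", "/api/v1/models", key) targets the model listing endpoint.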

OpenAI-Compatible Endpoints

Drop-in replacement for the OpenAI API. Use any OpenAI SDK by changing only the base URL and API key.
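To illustrate the drop-in idea, here is a hedged sketch of the request an OpenAI-style client would issue once pointed at this service. It assumes the compatible surface is mounted under /v1 and exposes the standard OpenAI chat completions path; neither is confirmed on this page, so check the OpenAPI spec before relying on it. The function name is hypothetical.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible surface lives under /v1, mirroring OpenAI's layout.
OPENAI_COMPAT_BASE = "https://api.agilecloud.ai/v1"


def chat_completion_request(api_key, model, messages, base_url=OPENAI_COMPAT_BASE):
    """Build (without sending) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": messages}
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    return req
```

With an official OpenAI SDK the equivalent change is usually limited to the client's base URL and API key settings, which is the only difference this sketch makes relative to a call against OpenAI itself.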

DirectAI Native API

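Purpose-built endpoints for model lifecycle, deployment management, and system health. Most are versioned under /api/v1/; the Batch endpoints and the RAG search and query endpoints are served under /v1/ instead.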

Models

POST    /api/v1/models       Register a new model version
GET     /api/v1/models       List models (filterable by modality, owner)
GET     /api/v1/models/{id}  Get model details
PATCH   /api/v1/models/{id}  Update model metadata
DELETE  /api/v1/models/{id}  Delete a model
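The model listing endpoint above is described as filterable by modality and owner. A minimal sketch of building such a URL follows; it assumes the filters are passed as query parameters named modality and owner, which this page does not confirm.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.agilecloud.ai"


def list_models_url(modality=None, owner=None):
    """Build the GET /api/v1/models URL, adding only the filters that are set."""
    # Assumption: filters are query parameters named after the fields they match.
    filters = {"modality": modality, "owner": owner}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = BASE_URL + "/api/v1/models"
    return f"{url}?{query}" if query else url
```

Omitting both arguments yields the unfiltered listing URL, so the same helper covers both cases.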

Deployments

POST    /api/v1/deployments       Create a deployment for a model version
GET     /api/v1/deployments       List deployments (filterable)
GET     /api/v1/deployments/{id}  Get deployment details
PATCH   /api/v1/deployments/{id}  Update deployment (scaling, config)
DELETE  /api/v1/deployments/{id}  Delete a deployment

System

GET     /api/v1/system/health    Service health snapshot (model registry + backend liveness)
GET     /api/v1/system/capacity  Backend capacity and utilization
GET     /api/v1/system/metrics   Prometheus-format metrics
GET     /api/v1/recommendations  Usage-based optimization recommendations
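Because /api/v1/system/metrics returns Prometheus-format text, a client that only needs a few values can read them without a Prometheus server. The sketch below parses the text exposition format into (series, value) pairs. It is a simplified reader, not a full parser, and which metric names the service actually exports is not stated on this page.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into (series, value) pairs."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# HELP" / "# TYPE" comment lines
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        # rest[0] is the sample value; rest[1], if present, is a timestamp.
        samples.append((series, float(rest[0])))
    return samples
```

The series string keeps its label set verbatim, so callers can match on a prefix or parse the labels further as needed.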

Prompts

POST    /api/v1/prompts                                Create a prompt template
GET     /api/v1/prompts                                List prompt templates
GET     /api/v1/prompts/{slug}                         Get prompt by slug
PATCH   /api/v1/prompts/{slug}                         Update prompt metadata
DELETE  /api/v1/prompts/{slug}                         Archive prompt (soft delete)
POST    /api/v1/prompts/{slug}/versions                Create a new draft version
GET     /api/v1/prompts/{slug}/versions                List versions
POST    /api/v1/prompts/{slug}/versions/{ver}/publish  Publish a draft version
POST    /api/v1/prompts/{slug}/render                  Render template with variables
POST    /api/v1/prompts/{slug}/ab-test                 Create A/B test between versions
GET     /api/v1/prompts/{slug}/ab-results              Get A/B test results

Smart Routing

POST    /api/v1/routes             Create route configuration
GET     /api/v1/routes             List routes
GET     /api/v1/routes/{route_id}  Get route by ID
PATCH   /api/v1/routes/{route_id}  Update route
DELETE  /api/v1/routes/{route_id}  Delete route
POST    /api/v1/routes/evaluate    Dry-run evaluate routing rules
GET     /api/v1/budget/status      Current spend, remaining, forecast
GET     /api/v1/budget/config      Get budget configuration
PATCH   /api/v1/budget/config      Update budget configuration

Batch

POST    /v1/batches                    Create batch inference job
GET     /v1/batches                    List batch jobs
GET     /v1/batches/{batch_id}         Get batch job status
POST    /v1/batches/{batch_id}/cancel  Cancel batch job
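Batch jobs are asynchronous, so a client typically creates a job and then polls GET /v1/batches/{batch_id} until it finishes. The loop below is a sketch under two assumptions that this page does not confirm: the response carries a status field, and its terminal values mirror the OpenAI Batch API. The fetch_status callable is supplied by the caller, which keeps the loop independent of any HTTP library.

```python
import time

# Assumption: terminal states mirror the OpenAI Batch API; verify against the OpenAPI spec.
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(fetch_status, batch_id, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the batch reaches a terminal state or max_polls is exhausted.

    fetch_status(batch_id) is a caller-supplied function that performs
    GET /v1/batches/{batch_id} and returns the decoded JSON body as a dict.
    """
    for _ in range(max_polls):
        job = fetch_status(batch_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"batch {batch_id} did not finish after {max_polls} polls")
```

Bounding the number of polls avoids waiting forever on a stuck job; a caller that gives up can then use the cancel endpoint listed above.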

Semantic Cache

GET     /api/v1/cache/stats       Cache hit/miss statistics
GET     /api/v1/cache/config      Current cache configuration
PATCH   /api/v1/cache/config      Update cache configuration
GET     /api/v1/cache/entries     List cached entries
POST    /api/v1/cache/invalidate  Invalidate by model or hash
DELETE  /api/v1/cache/flush       Flush all entries

Guardrail Rules

POST    /api/v1/guardrails/rules            Create custom safety rule
GET     /api/v1/guardrails/rules            List rules
GET     /api/v1/guardrails/rules/{rule_id}  Get rule
PATCH   /api/v1/guardrails/rules/{rule_id}  Update rule
DELETE  /api/v1/guardrails/rules/{rule_id}  Delete rule
POST    /api/v1/guardrails/rules/test       Dry-run rule against sample content

Compliance & Audit

POST    /api/v1/compliance/exports              Create compliance export job (HIPAA, SOC 2)
GET     /api/v1/compliance/exports              List exports
GET     /api/v1/compliance/exports/{export_id}  Get export status
DELETE  /api/v1/compliance/exports/{export_id}  Delete export
GET     /api/v1/audit/retention                 Get retention configuration
PATCH   /api/v1/audit/retention                 Update retention configuration
GET     /api/v1/audit/retention/report          Retention compliance report
POST    /api/v1/audit/legal-hold                Create legal hold
GET     /api/v1/audit/legal-hold                List legal holds
DELETE  /api/v1/audit/legal-hold/{hold_id}      Release legal hold

RAG (Retrieval-Augmented Generation)

POST    /api/v1/rag/collections                 Create collection
GET     /api/v1/rag/collections                 List collections
GET     /api/v1/rag/collections/{id}            Get collection
PATCH   /api/v1/rag/collections/{id}            Update collection
DELETE  /api/v1/rag/collections/{id}            Delete collection and documents
POST    /api/v1/rag/collections/{id}/documents  Upload documents
GET     /api/v1/rag/collections/{id}/documents  List documents in collection
GET     /api/v1/rag/documents/{id}              Get document
DELETE  /api/v1/rag/documents/{id}              Delete document
GET     /api/v1/rag/usage                       Storage usage and tier cap
POST    /v1/rag/search                          Vector / hybrid / keyword search
POST    /v1/rag/query                           RAG retrieval + grounded LLM generation

Realtime (WebSocket)

Health Probes

No authentication required.

GET     /healthz  Liveness probe: always returns 200
GET     /readyz   Readiness probe: 200 if models loaded, 503 otherwise
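The readiness contract above (200 when models are loaded, 503 otherwise) can be checked from a script or orchestrator hook. The sketch below uses only the standard library; the function names and the verdict strings are illustrative choices, not part of the API.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.agilecloud.ai"


def interpret_readyz(status_code):
    """Map a /readyz status code to a readiness verdict."""
    if status_code == 200:
        return "ready"
    if status_code == 503:
        return "not_ready"  # models are not loaded yet
    return "unknown"  # anything else is outside the documented contract


def probe_readyz(base_url=BASE_URL, timeout=2.0):
    """Call /readyz (no Authorization header needed) and classify the result."""
    try:
        with urllib.request.urlopen(base_url + "/readyz", timeout=timeout) as resp:
            return interpret_readyz(resp.status)
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still on the exception.
        return interpret_readyz(exc.code)
    except OSError:
        # Covers connection failures and timeouts.
        return "unreachable"
```

Keeping the status mapping separate from the network call makes the contract easy to test without a live server.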

Common Response Patterns

Error Response

{
  "error": {
    "message": "Human-readable description",
    "type": "error_type",
    "code": "error_code"
  }
}
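A client can turn the error envelope shown above into a typed exception so callers do not have to inspect raw JSON. This is a minimal sketch; the class and function names are hypothetical, and only the three fields documented above are read.

```python
import json


class ApiError(Exception):
    """Carries the three fields of the documented error envelope."""

    def __init__(self, message, type_, code):
        super().__init__(message)
        self.message = message
        self.type = type_
        self.code = code


def raise_for_error(body_text):
    """Return the decoded body, or raise ApiError if it is an error envelope."""
    body = json.loads(body_text)
    err = body.get("error") if isinstance(body, dict) else None
    if isinstance(err, dict):
        raise ApiError(err.get("message", ""), err.get("type"), err.get("code"))
    return body
```

Bodies without a top-level error object are returned unchanged, so the helper can wrap every JSON response.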

Response Headers

  • X-Request-ID — Correlation ID (propagated from request or auto-generated)
  • Content-Type: application/json — Standard responses
  • Content-Type: text/event-stream — Streaming responses
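The header conventions above suggest two small client-side habits: send your own X-Request-ID so logs on both sides can be correlated, and branch on Content-Type before decoding a response. The sketch below shows both. It assumes the request header uses the same X-Request-ID name as the response, and the UUID format for generated IDs is an arbitrary choice made here.

```python
import uuid


def request_headers(api_key, request_id=None):
    """Headers for a call; supplies X-Request-ID so the server can echo it back."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Request-ID": request_id or str(uuid.uuid4()),
    }


def is_streaming(content_type):
    """True when the response Content-Type announces a server-sent event stream."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"


def iter_sse_data(lines):
    """Yield the payload of each `data:` line from an event stream."""
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()
```

Non-streaming responses can be decoded as JSON in one step, while streaming ones are read line by line and passed through iter_sse_data.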

OpenAPI Spec

The full OpenAPI specification is served by the API server at /openapi.json, with an interactive Swagger UI at /docs. An exported copy of the spec is also kept in the repository at docs/openapi.json.
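Since the spec is machine-readable, the endpoint tables on this page can be cross-checked against it. The following sketch lists every operation in a decoded OpenAPI document; it relies only on the standard paths structure of OpenAPI, and the function name is illustrative.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return (METHOD, path) pairs from a decoded OpenAPI document, sorted by path."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))
```

To use it, load docs/openapi.json (or the body of /openapi.json) with json.load and pass the resulting dict to list_operations.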