ACAI is a stateless compliance gateway that sits between clients and inference backends. It handles auth, compliance, and observability — inference runs on managed serverless endpoints or customer-registered providers (BYOB).
┌─────────────┐ ┌──────────────────────┐ ┌──────────────────┐
│ Client │────▶│ API Server │────▶│ Managed │
│ (OpenAI SDK)│ │ (FastAPI proxy) │ │ Inference (MaaS)│
└─────────────┘ │ │ └──────────────────┘
│ • Auth (Bearer) │
│ • Rate limiting │────▶┌──────────────────┐
│ • Guardrails / PII │ │ BYOB Providers │
│ • Audit logging │ │ (OpenAI/Claude) │
│ • Semantic cache │ └──────────────────┘
│ • Model routing │
│ • Usage metering │────▶┌──────────────────┐
│ • Billing (Stripe) │ │ Ollama │
└──────────────────────┘ │ (local dev) │
└──────────────────┘| Engine | Modality | Status | Notes |
|---|---|---|---|
| Managed Catalog (Serverless) | Chat, Embeddings, Transcription | Active | Serverless model-as-a-service — zero idle cost, auto-scaling, tier-gated catalog. |
| BYOB (Bring Your Own Backend) | All (provider-dependent) | Active | Customer-registered providers (OpenAI, Anthropic, Azure AI Foundry, etc.). Full compliance layer applied. |
| Ollama | LLMs (local dev) | Dev only | Local development backend via backendUrl override. |
| Component | Service | Purpose |
|---|---|---|
| Inference | Serverless MaaS | Managed model catalog — zero idle cost, auto-scaling |
| Orchestration | AKS | API server pods, web frontend, horizontal autoscaling |
| Audit Storage | Blob Storage | Tamper-proof audit logs, compliance report exports |
| Images | ACR | API server and web frontend container images |
| Autoscaling | HPA | CPU/memory-based horizontal pod autoscaling |
| Observability | OpenTelemetry + Prometheus | Metrics, logs, distributed tracing |
| Secrets | Key Vault | API keys, connection strings, provider secrets |
| Database | PostgreSQL Flexible Server | Users, sessions, API keys, usage records |
Each customer gets their own isolated cloud subscription. This provides complete isolation of billing, networking, identity, and blast radius. Resources are deployed via Infrastructure-as-Code templates.
Every cloud-specific component sits behind an interface. Multi-cloud portability is designed in from day one. No provider-specific assumptions are hardcoded in the application layer.