Architecture
DirectAI is a stateless compliance gateway that sits between clients and inference backends. It handles auth, compliance, and observability — inference runs on managed serverless endpoints or customer-registered providers (BYOB).
High-Level Overview
┌─────────────┐      ┌──────────────────────┐      ┌──────────────────┐
│   Client    │─────▶│      API Server      │─────▶│     Managed      │
│ (OpenAI SDK)│      │   (FastAPI proxy)    │      │ Inference (MaaS) │
└─────────────┘      │                      │      └──────────────────┘
                     │ • Auth (Bearer)      │
                     │ • Rate limiting      │─────▶┌──────────────────┐
                     │ • Guardrails / PII   │      │  BYOB Providers  │
                     │ • Audit logging      │      │ (OpenAI/Claude)  │
                     │ • Semantic cache     │      └──────────────────┘
                     │ • Model routing      │
                     │ • Usage metering     │─────▶┌──────────────────┐
                     │ • Billing (Stripe)   │      │      Ollama      │
                     └──────────────────────┘      │   (local dev)    │
                                                   └──────────────────┘

Request Flow
1. Client sends a request to the API server with a Bearer token
2. Auth middleware validates the API key (SHA-256 lookup with TTL cache)
3. Rate limiter checks the token-bucket quota for the key
4. Correlation ID assigned (or propagated from the X-Request-ID header)
5. Guardrails evaluate the request (content safety, PII, injection)
6. Model registry resolves the model name/alias to a backend service
7. Request proxied to the inference backend via httpx HTTP/2
8. Response streamed back to the client through an output guardrails check
9. Usage metered (tokens counted, events queued for Stripe)
10. Audit event written
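The key-validation step above can be sketched as follows. This is a minimal illustration, not the actual implementation: keys are stored only as SHA-256 digests, and lookup results are memoized in a small TTL cache so hot keys skip the database. The names (`KeyStore`, `TTL_SECONDS`) and the in-memory "database" are assumptions for the sketch.

```python
import hashlib
import time

TTL_SECONDS = 60  # assumed cache lifetime; the real value may differ

class KeyStore:
    def __init__(self, hashed_keys: set[str]):
        self._hashed = hashed_keys  # stand-in for the API-key table
        self._cache: dict[str, tuple[bool, float]] = {}  # digest -> (valid, checked_at)

    def is_valid(self, api_key: str) -> bool:
        digest = hashlib.sha256(api_key.encode()).hexdigest()
        hit = self._cache.get(digest)
        if hit is not None and time.monotonic() - hit[1] < TTL_SECONDS:
            return hit[0]  # served from the TTL cache
        valid = digest in self._hashed  # the "database" lookup
        self._cache[digest] = (valid, time.monotonic())
        return valid
```

Storing digests rather than raw keys means a leaked table does not leak usable credentials, and the TTL cache bounds staleness after a key is revoked.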
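The rate-limiting step can be illustrated with a minimal token bucket. The capacity, refill rate, and per-request cost here are hypothetical; the gateway's real quotas and storage (per-key, likely shared state) are not shown.

```python
import time

class TokenBucket:
    """Toy per-key token bucket: refills continuously, rejects when empty."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # top up based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket like this allows short bursts up to `capacity` while enforcing the sustained rate `refill_per_sec`.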
Inference Engines
| Engine | Modality | Status | Notes |
|---|---|---|---|
| Managed Catalog (Serverless) | Chat, Embeddings, Transcription | Active | Serverless model-as-a-service — zero idle cost, auto-scaling, tier-gated catalog. |
| BYOB (Bring Your Own Backend) | All (provider-dependent) | Active | Customer-registered providers (OpenAI, Anthropic, Azure AI Foundry, etc.). Full compliance layer applied. |
| Ollama | LLMs (local dev) | Dev only | Local development backend via backendUrl override. |
Infrastructure
| Component | Service | Purpose |
|---|---|---|
| Inference | Serverless MaaS | Managed model catalog — zero idle cost, auto-scaling |
| Orchestration | AKS | API server pods, web frontend, horizontal autoscaling |
| Audit Storage | Blob Storage | Tamper-proof audit logs, compliance report exports |
| Images | ACR | API server and web frontend container images |
| Autoscaling | HPA | CPU/memory-based horizontal pod autoscaling |
| Observability | OpenTelemetry + Prometheus | Metrics, logs, distributed tracing |
| Secrets | Key Vault | API keys, connection strings, provider secrets |
| Database | PostgreSQL Flexible Server | Users, sessions, API keys, usage records |
Customer Isolation
Each customer gets their own isolated cloud subscription. This provides complete isolation of billing, networking, identity, and blast radius. Resources are deployed via Infrastructure-as-Code templates.
- Separate VNet per customer stamp
- Dedicated Key Vault with RBAC authorization
- Per-stamp Log Analytics workspace
- Two managed identities per stamp (control plane + kubelet) with least-privilege RBAC
Multi-Cloud Design
Multi-cloud portability is designed in from day one: every cloud-specific component sits behind an interface, and no provider-specific assumptions are hardcoded in the application layer.
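The interface-per-provider pattern can be sketched with a structural protocol. The `SecretStore` protocol and both implementations below are illustrative stubs, not the project's actual abstractions; the point is that application code depends only on the interface.

```python
from typing import Protocol

class SecretStore(Protocol):
    """Cloud-agnostic secret access; the app layer sees only this."""
    def get_secret(self, name: str) -> str: ...

class AzureKeyVaultStore:
    """Provider-specific implementation (stub; would call the Key Vault SDK)."""
    def __init__(self, vault_url: str):
        self.vault_url = vault_url
    def get_secret(self, name: str) -> str:
        raise NotImplementedError("Key Vault SDK call goes here")

class EnvSecretStore:
    """Local/dev implementation backed by a plain dict."""
    def __init__(self, env: dict[str, str]):
        self.env = env
    def get_secret(self, name: str) -> str:
        return self.env[name]

def connect_db(secrets: SecretStore) -> str:
    # Application code: no provider-specific imports or assumptions.
    return secrets.get_secret("DATABASE_URL")
```

Swapping clouds then means swapping the implementation passed in at startup, with no changes to the application layer.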