Architecture

DirectAI is a stateless compliance gateway that sits between clients and inference backends. It handles auth, compliance, and observability — inference itself runs on managed serverless endpoints or on customer-registered providers (BYOB, Bring Your Own Backend).
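Because the gateway speaks the OpenAI wire format, any OpenAI-compatible client works by pointing it at the gateway's base URL with a DirectAI API key. A minimal stdlib sketch of the request shape (the gateway URL, key, and model name below are placeholders, not real endpoints):

```python
import json
from urllib.request import Request

# Hypothetical values — substitute your gateway URL and API key.
GATEWAY = "https://gateway.example.com/v1"
API_KEY = "da-example-key"

def chat_request(model: str, prompt: str) -> Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return Request(
        f"{GATEWAY}/chat/completions",
        data=body.encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # validated by auth middleware
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In practice you would simply configure the official OpenAI SDK with `base_url` set to the gateway rather than building requests by hand.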

High-Level Overview

┌──────────────┐     ┌──────────────────────┐     ┌──────────────────┐
│    Client    │────▶│      API Server      │────▶│  Managed         │
│ (OpenAI SDK) │     │   (FastAPI proxy)    │     │  Inference (MaaS)│
└──────────────┘     │                      │     └──────────────────┘
                     │  • Auth (Bearer)     │
                     │  • Rate limiting     │────▶┌──────────────────┐
                     │  • Guardrails / PII  │     │  BYOB Providers  │
                     │  • Audit logging     │     │  (OpenAI/Claude) │
                     │  • Semantic cache    │     └──────────────────┘
                     │  • Model routing     │
                     │  • Usage metering    │────▶┌──────────────────┐
                     │  • Billing (Stripe)  │     │  Ollama          │
                     └──────────────────────┘     │  (local dev)     │
                                                  └──────────────────┘

Request Flow

  1. Client sends request to API server with Bearer token
  2. Auth middleware validates API key (SHA-256 lookup with TTL cache)
  3. Rate limiter checks token-bucket quota for the key
  4. Correlation ID assigned (or propagated from X-Request-ID header)
  5. Guardrails evaluate request (content safety, PII, injection)
  6. Model registry resolves model name/alias to backend service
  7. Request proxied to inference backend via httpx HTTP/2
  8. Response streamed back through guardrails check
  9. Usage metered (tokens counted, events queued for Stripe)
  10. Audit event written
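Step 3's per-key quota check can be sketched as a classic token bucket (a minimal illustration with hypothetical rates and names, not DirectAI's actual implementation):

```python
import time

class TokenBucket:
    """Per-API-key bucket: refills at `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per hashed API key (illustrative limits).
buckets: dict[str, TokenBucket] = {}

def check_quota(key_hash: str) -> bool:
    bucket = buckets.setdefault(key_hash, TokenBucket(rate=10, capacity=20))
    return bucket.allow()
```

A production version would hold the buckets in shared storage (e.g. Redis) so the limit survives across API server pods, but the refill arithmetic is the same.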

Inference Engines

| Engine | Modality | Status | Notes |
|---|---|---|---|
| Managed Catalog (Serverless) | Chat, Embeddings, Transcription | Active | Serverless model-as-a-service — zero idle cost, auto-scaling, tier-gated catalog. |
| BYOB (Bring Your Own Backend) | All (provider-dependent) | Active | Customer-registered providers (OpenAI, Anthropic, Azure AI Foundry, etc.). Full compliance layer applied. |
| Ollama | LLMs (local dev) | Dev only | Local development backend via backendUrl override. |
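Step 6 of the request flow resolves a client-supplied model name or alias to one of these engines. A minimal sketch of such a registry (the entries, URLs, and field names here are illustrative, not the actual catalog):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    engine: str    # "managed", "byob", or "ollama"
    base_url: str
    model_id: str  # name the backend itself expects

# Illustrative registry: public names on the left, resolved backends on the right.
REGISTRY: dict[str, Backend] = {
    "gpt-4o": Backend("byob", "https://api.openai.com/v1", "gpt-4o"),
    "llama3": Backend("managed", "https://maas.example.com/v1", "llama-3-8b-instruct"),
}
ALIASES: dict[str, str] = {"default": "llama3"}

def resolve(name: str) -> Backend:
    """Follow one level of aliasing, then look up the concrete backend."""
    name = ALIASES.get(name, name)
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model: {name}") from None
```

For local development, a `backendUrl` override would replace the resolved `base_url` with the Ollama address while keeping the same lookup path.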

Infrastructure

| Component | Service | Purpose |
|---|---|---|
| Inference | Serverless MaaS | Managed model catalog — zero idle cost, auto-scaling |
| Orchestration | AKS | API server pods, web frontend, horizontal autoscaling |
| Audit Storage | Blob Storage | Tamper-proof audit logs, compliance report exports |
| Images | ACR | API server and web frontend container images |
| Autoscaling | HPA | CPU/memory-based horizontal pod autoscaling |
| Observability | OpenTelemetry + Prometheus | Metrics, logs, distributed tracing |
| Secrets | Key Vault | API keys, connection strings, provider secrets |
| Database | PostgreSQL Flexible Server | Users, sessions, API keys, usage records |

Customer Isolation

Each customer is deployed into its own cloud subscription (a "stamp"), which completely isolates billing, networking, identity, and blast radius. Resources are deployed via Infrastructure-as-Code templates.

  • Separate VNet per customer stamp
  • Dedicated Key Vault with RBAC authorization
  • Per-stamp Log Analytics workspace
  • Two managed identities per stamp (control plane + kubelet) with least-privilege RBAC

Multi-Cloud Design

Multi-cloud portability is designed in from day one: every cloud-specific component sits behind an interface, and no provider-specific assumptions are hardcoded in the application layer.
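As an illustration of this pattern (the names here are hypothetical, not taken from the codebase), a secret store might be defined as an abstract interface with provider-specific implementations hidden behind it:

```python
from abc import ABC, abstractmethod

class SecretStore(ABC):
    """Provider-neutral secret access; application code depends only on this."""

    @abstractmethod
    def get(self, name: str) -> str:
        ...

class InMemorySecretStore(SecretStore):
    """Local/dev implementation; a KeyVaultSecretStore (hypothetical) would
    wrap the Azure Key Vault SDK behind the same interface."""

    def __init__(self, secrets: dict[str, str]):
        self._secrets = dict(secrets)

    def get(self, name: str) -> str:
        return self._secrets[name]

def provider_api_key(store: SecretStore, provider: str) -> str:
    # The application layer never imports a cloud SDK directly.
    return store.get(f"{provider}-api-key")
```

Swapping clouds then means adding one new `SecretStore` subclass and wiring it in at startup; callers such as `provider_api_key` are unchanged.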