Rate Limits

Per-tier rate limits protect the platform and ensure fair usage. Limits apply per API key.

Limits by Tier

Tier                  Requests/min         Tokens/min
Free                  20 RPM               40,000 TPM
Pro ($99/mo)          300 RPM              500,000 TPM
Business ($499/mo)    1,000 RPM            5,000,000 TPM
Enterprise            Custom (10K+ RPM)    Custom (100M+ TPM)

Rate Limit Algorithm

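DirectAI uses a token-bucket algorithm. Each API key has its own bucket, which refills at the configured rate up to the burst capacity. Requests draw from the bucket and are rejected once it is empty, so a full bucket allows short spikes above the steady-state rate.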

  • Rate: Configurable via DIRECTAI_RATE_LIMIT_RPS (default: 60 req/sec)
  • Burst: Configurable via DIRECTAI_RATE_LIMIT_BURST (default: 120)
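The token-bucket behavior described above can be illustrated with a minimal sketch. This is not DirectAI's implementation: the class name TokenBucket, the allow method, and the assumption that each request costs exactly one token are hypothetical and chosen only for illustration. The values 60 and 120 mirror the defaults listed above.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, capped at `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate              # steady-state refill rate (tokens per second)
        self.burst = burst            # maximum number of tokens the bucket can hold
        self.clock = clock            # injectable clock, which makes testing easy
        self.tokens = float(burst)    # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True if a request may proceed, False if it should get a 429."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=60, burst=120)
    accepted = sum(bucket.allow() for _ in range(200))
    print(f"accepted {accepted} of 200 back-to-back requests")
```

With these settings, a key that has been idle can send roughly 120 requests at once, after which it is held to about 60 requests per second until the bucket refills.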

Rate Limit Response

When a request is rate limited, the API returns HTTP 429 Too Many Requests with the following body:

{
  "error": {
    "message": "Rate limit exceeded. Please retry after a brief wait.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Implement exponential backoff in your client. The OpenAI Python and JavaScript SDKs handle retries automatically.
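For clients that do not rely on an SDK's built-in retries, exponential backoff might look like the sketch below. It is an illustration under assumptions rather than a prescribed client: RateLimitError is a hypothetical exception standing in for a 429 response, send is any callable that performs the request, and the delay parameters are arbitrary starting points. Random jitter is added so that many clients do not retry in lockstep.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception representing an HTTP 429 response."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send(), retrying on RateLimitError with exponential backoff.

    The delay ceiling doubles on each attempt (base_delay, 2x, 4x, ...),
    is capped at max_delay, and the actual wait is drawn uniformly from
    [0, ceiling] ("full jitter").
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

In a real client, send would issue the HTTP request and raise RateLimitError when the status code is 429 and the error code in the body is rate_limit_exceeded.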

Usage-Based Pricing

Modality         Metric           Rate
Chat — input     per 1M tokens    $1.00
Chat — output    per 1M tokens    $2.00
Embeddings       per 1M tokens    $0.10
Transcription    per minute       $0.10

Business and Enterprise tiers include volume discounts and BYOB support. See Pricing for full details.
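As a worked example of the list rates in the table above, the following sketch estimates a bill from raw usage numbers. The function name estimate_cost and its parameters are hypothetical, and the calculation ignores volume discounts and any other adjustments, so treat the result as a rough estimate only.

```python
# List rates copied from the pricing table above (USD).
CHAT_INPUT_PER_1M = 1.00       # per 1M input tokens
CHAT_OUTPUT_PER_1M = 2.00      # per 1M output tokens
EMBEDDINGS_PER_1M = 0.10       # per 1M tokens
TRANSCRIPTION_PER_MIN = 0.10   # per minute of audio


def estimate_cost(chat_input_tokens=0, chat_output_tokens=0,
                  embedding_tokens=0, transcription_minutes=0.0):
    """Return an estimated cost in USD at list rates (no discounts applied)."""
    total = (
        chat_input_tokens / 1_000_000 * CHAT_INPUT_PER_1M
        + chat_output_tokens / 1_000_000 * CHAT_OUTPUT_PER_1M
        + embedding_tokens / 1_000_000 * EMBEDDINGS_PER_1M
        + transcription_minutes * TRANSCRIPTION_PER_MIN
    )
    # Round away floating-point noise; six decimals is far below one cent.
    return round(total, 6)


if __name__ == "__main__":
    # 2M input + 500K output chat tokens, 10M embedding tokens, 30 audio minutes:
    # $2.00 + $1.00 + $1.00 + $3.00
    print(estimate_cost(2_000_000, 500_000, 10_000_000, 30))
```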

Monitoring Usage

Track your current usage, spend, and remaining quota in the Dashboard → Usage page. Usage data updates in near real-time.