Rate Limits

Per-tier rate limits protect the platform and ensure fair usage. Limits apply per API key.

Limits by Tier

Tier	Requests per minute (RPM)	Tokens per minute (TPM)
Free	20 RPM	40,000 TPM
Pro ($99/mo)	300 RPM	500,000 TPM
Business ($499/mo)	1,000 RPM	5,000,000 TPM
Enterprise	Custom (10K+ RPM)	Custom (100M+ TPM)

Rate Limit Algorithm

ACAI uses a token-bucket algorithm. Each API key has a bucket that fills at the configured rate. Burst capacity allows short spikes above the steady-state rate.

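Rate: The steady-state refill rate. Configurable via DIRECTAI_RATE_LIMIT_RPS (default: 60 requests/sec)
Burst: The bucket capacity, which bounds how far a spike can exceed the steady-state rate. Configurable via DIRECTAI_RATE_LIMIT_BURST (default: 120)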
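
To make the refill-and-burst behavior concrete, here is a minimal sketch of a generic token bucket. It is not the platform's actual implementation: the class name TokenBucket, the allow method, and the injectable clock are hypothetical names chosen for illustration, and it assumes each request costs one token.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical, not the server's code)."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second (steady-state rate)
        self.burst = burst          # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = burst         # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    # Values mirror the documented defaults: 60 requests/sec, burst of 120.
    bucket = TokenBucket(rate=60, burst=120)
    accepted = sum(bucket.allow() for _ in range(200))
    print(f"accepted {accepted} of 200 back-to-back requests")
```

With these values, a full bucket absorbs a spike of up to 120 requests, after which requests are admitted only as fast as the bucket refills.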

Rate Limit Response

When a request is rate limited, the API returns HTTP 429 Too Many Requests with the following JSON body:

{
  "error": {
    "message": "Rate limit exceeded. Please retry after a brief wait.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Implement exponential backoff in your client. The OpenAI Python and JavaScript SDKs handle retries automatically.
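
If you are not using an SDK that retries for you, the backoff loop might look like the sketch below. It is one common variant (exponential backoff with full jitter), not a prescribed client. RateLimitError, call_with_backoff, and the send callable are hypothetical names: send stands in for whatever function issues your request and raises RateLimitError when it sees a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception your client raises on an HTTP 429 response."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send()`, retrying on RateLimitError with exponential backoff.

    The delay ceiling doubles after each failure (base_delay, 2x, 4x, ...),
    is capped at max_delay, and is scaled by a random factor in [0, 1)
    so that many clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())
```

The sleep and rng parameters exist only so the loop can be tested without real waiting; in normal use the defaults are sufficient.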

Usage-Based Pricing

Modality	Metric	Rate
Chat — input	per 1M tokens	$1.00
Chat — output	per 1M tokens	$2.00
Embeddings	per 1M tokens	$0.10
Transcription	per minute	$0.10

Business and Enterprise tiers include volume discounts and BYOB support. See Pricing for full details.
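
As a worked example of the list rates above, the sketch below estimates a bill before any volume discount. The function name estimate_cost and its parameters are hypothetical, and the rates are simply copied from the table, so treat the result as an approximation rather than an invoice.

```python
from decimal import Decimal

# List rates in USD, copied from the pricing table above.
CHAT_INPUT_PER_1M = Decimal("1.00")
CHAT_OUTPUT_PER_1M = Decimal("2.00")
EMBEDDINGS_PER_1M = Decimal("0.10")
TRANSCRIPTION_PER_MIN = Decimal("0.10")

ONE_MILLION = Decimal(1_000_000)


def estimate_cost(chat_input_tokens=0, chat_output_tokens=0,
                  embedding_tokens=0, transcription_minutes=0) -> Decimal:
    """Estimate list-price cost in USD (no volume discounts applied)."""
    return (
        CHAT_INPUT_PER_1M * Decimal(chat_input_tokens) / ONE_MILLION
        + CHAT_OUTPUT_PER_1M * Decimal(chat_output_tokens) / ONE_MILLION
        + EMBEDDINGS_PER_1M * Decimal(embedding_tokens) / ONE_MILLION
        # str() avoids binary-float noise if minutes are fractional.
        + TRANSCRIPTION_PER_MIN * Decimal(str(transcription_minutes))
    )


if __name__ == "__main__":
    total = estimate_cost(chat_input_tokens=2_000_000,
                          chat_output_tokens=500_000,
                          transcription_minutes=10)
    print(f"estimated list-price cost: ${total}")
```

In this example, 2,000,000 input tokens cost $2.00, 500,000 output tokens cost $1.00, and 10 minutes of transcription cost $1.00, for a total of $4.00.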

Monitoring Usage

Track your current usage, spend, and remaining quota on the Dashboard → Usage page. Usage data updates in near real time.