Rate Limits
Per-tier rate limits protect the platform and ensure fair usage. Limits apply per API key.
Limits by Tier
| Tier | Requests/min | Tokens/min |
|---|---|---|
| Free | 20 RPM | 40,000 TPM |
| Pro ($99/mo) | 300 RPM | 500,000 TPM |
| Business ($499/mo) | 1,000 RPM | 5,000,000 TPM |
| Enterprise | Custom (10K+ RPM) | Custom (100M+ TPM) |
Rate Limit Algorithm
DirectAI uses a token-bucket algorithm. Each API key has a bucket that fills at the configured rate. Burst capacity allows short spikes above the steady-state rate.
- Rate: Configurable via DIRECTAI_RATE_LIMIT_RPS (default: 60 req/sec)
- Burst: Configurable via DIRECTAI_RATE_LIMIT_BURST (default: 120)
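To make the bucket behavior concrete, here is a minimal Python sketch of a generic token bucket using the default rate and burst values listed above. It is an illustration only, not DirectAI's actual implementation: the class name TokenBucket, the allow method, the injectable clock, and the choice to start with a full bucket are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical; not DirectAI's server code)."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)     # tokens added per second
        self.burst = float(burst)   # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the sketch can be tested
        self.tokens = self.burst    # assumption: the bucket starts full
        self.updated = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens and return True if available, else return False."""
        now = self.clock()
        # Refill for the elapsed time, capped at the burst capacity.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per API key, using the documented defaults.
bucket = TokenBucket(rate=60, burst=120)
```

With these values, a client that has been idle can send up to 120 requests at once; after that, requests are admitted at roughly 60 per second as the bucket refills.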
Rate Limit Response
When a request exceeds the limit, the API returns HTTP 429 Too Many Requests with the following body:
{
  "error": {
    "message": "Rate limit exceeded. Please retry after a brief wait.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Implement exponential backoff in your client. The OpenAI Python and JavaScript SDKs handle retries automatically.
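For clients that do not use one of those SDKs, the backoff recommendation can be sketched as follows. This is a minimal example under stated assumptions: RateLimitError is a hypothetical exception that your HTTP layer would raise on a 429 response, and call_with_backoff, its parameters, and the default delays are illustrative names and values, not part of any DirectAI API. The jitter strategy (sleeping a random fraction of the current ceiling) is one common choice among several.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send(); on RateLimitError, retry with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error to the caller
            # The delay ceiling doubles on each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Sleep a random fraction of the ceiling so clients do not retry in lockstep.
            sleep(ceiling * rng())
```

Here send is any zero-argument callable that performs the request and raises RateLimitError when the response status is 429. Capping the delay keeps a long outage from producing multi-minute sleeps.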
Usage-Based Pricing
| Modality | Metric | Rate |
|---|---|---|
| Chat — input | per 1M tokens | $1.00 |
| Chat — output | per 1M tokens | $2.00 |
| Embeddings | per 1M tokens | $0.10 |
| Transcription | per minute | $0.10 |
Business and Enterprise tiers include volume discounts and BYOB support. See Pricing for full details.
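As a worked example of the list prices in the table, the following Python sketch estimates a bill from raw usage figures. The function name estimate_cost and its parameters are hypothetical, and the calculation ignores volume discounts, tier subscription fees, and any rounding or minimum-charge rules, none of which are specified here.

```python
from decimal import Decimal

# List prices in USD, copied from the usage-based pricing table.
PRICE_PER_MILLION_TOKENS = {
    "chat_input": Decimal("1.00"),
    "chat_output": Decimal("2.00"),
    "embeddings": Decimal("0.10"),
}
TRANSCRIPTION_PER_MINUTE = Decimal("0.10")
MILLION = Decimal(1_000_000)


def estimate_cost(chat_input_tokens=0, chat_output_tokens=0,
                  embedding_tokens=0, transcription_minutes=0):
    """Return the estimated list-price cost in USD as a Decimal (hypothetical helper)."""
    token_usage = {
        "chat_input": chat_input_tokens,
        "chat_output": chat_output_tokens,
        "embeddings": embedding_tokens,
    }
    # Token-based modalities are billed per 1M tokens.
    total = sum(
        (Decimal(count) / MILLION * PRICE_PER_MILLION_TOKENS[name]
         for name, count in token_usage.items()),
        Decimal(0),
    )
    # Transcription is billed per minute of audio.
    total += Decimal(str(transcription_minutes)) * TRANSCRIPTION_PER_MINUTE
    return total


# 2M input tokens + 500K output tokens + 10M embedding tokens + 30 audio minutes
print(estimate_cost(2_000_000, 500_000, 10_000_000, 30))  # 2.00 + 1.00 + 1.00 + 3.00
```

Decimal is used instead of float so that small per-token amounts add up without binary rounding error.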
Monitoring Usage
Track your current usage, spend, and remaining quota on the Dashboard → Usage page. Usage data updates in near real time.