Smart Routing
Route requests across models and providers with A/B testing, budget controls, and automatic fallback chains. Optimize for cost, latency, or quality without changing client code.
How It Works
Route configurations define rules for directing traffic. When a request arrives, the routing engine evaluates rules in priority order to select the target model and provider. Routes can split traffic (A/B), enforce budgets, and fall back to alternates on failure.
Request → Routing Engine
  ├── A/B Test match? → Split to variant A or B
  ├── Budget exceeded? → Route to cheaper fallback
  ├── Primary healthy? → Route to primary
  └── Primary down? → Fallback chain
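The decision order in the diagram above can be sketched in a few lines of Python. This is only an illustration of the priority ordering, not the engine's actual implementation; the `RouteState` class, its field names, and the returned branch labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RouteState:
    """Hypothetical snapshot of what the engine knows when a request arrives."""
    ab_test_match: bool    # an A/B test route matches this request
    budget_exceeded: bool  # the spending limit has been used up
    primary_healthy: bool  # the primary target passes health checks


def decide(state: RouteState) -> str:
    """Return the branch taken, checking conditions in the diagram's order."""
    if state.ab_test_match:
        return "split_ab"
    if state.budget_exceeded:
        return "cheaper_fallback"
    if state.primary_healthy:
        return "primary"
    return "fallback_chain"


# Example: no A/B match, budget fine, primary down.
branch = decide(RouteState(ab_test_match=False, budget_exceeded=False, primary_healthy=False))
```

Because the checks run top to bottom, an earlier condition wins when several are true at once, which is what "priority order" means in this sketch.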
Create a Route
curl https://api.agilecloud.ai/api/v1/routes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-chat",
    "match_model": "default-chat",
    "strategy": "fallback",
    "targets": [
      {"model": "qwen-2.5-3b", "weight": 1.0},
      {"model": "gpt-4o-mini", "weight": 1.0, "fallback": true}
    ]
  }'
A/B Testing
Split traffic between models to compare latency, quality, and cost. Results are tracked per variant, along with the statistical significance of the differences between variants.
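To make the weight semantics concrete, here is a minimal Python sketch of a weighted split. How the service actually assigns requests to variants is not documented here; hashing a request key so that the same key always lands on the same variant is an assumption of this sketch, and `pick_variant` and `request_key` are hypothetical names.

```python
import hashlib


def pick_variant(targets, request_key):
    """Pick a model in proportion to its weight, deterministically per key."""
    total = sum(t["weight"] for t in targets)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Map the hash to one of 1,000,000 buckets, then to a point in [0, total).
    bucket = int.from_bytes(digest[:8], "big") % 1_000_000
    point = bucket / 1_000_000 * total
    cumulative = 0.0
    for t in targets:
        cumulative += t["weight"]
        if point < cumulative:
            return t["model"]
    return targets[-1]["model"]  # guard against float rounding


targets = [
    {"model": "qwen-2.5-3b", "weight": 0.5},
    {"model": "gpt-4o-mini", "weight": 0.5},
]
variant = pick_variant(targets, "cust_abc")
```

With weights of 0.5 each, roughly half of all distinct keys map to each model. Changing the weights shifts the boundary between the two ranges.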
curl https://api.agilecloud.ai/api/v1/routes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "model-comparison",
    "match_model": "chat-model",
    "strategy": "ab_test",
    "targets": [
      {"model": "qwen-2.5-3b", "weight": 0.5},
      {"model": "gpt-4o-mini", "weight": 0.5}
    ]
  }'
Budget Controls
Set spending limits to prevent runaway costs. When the budget is exhausted, requests are either routed to a cheaper model or rejected, depending on the configured overage_action.
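The arithmetic behind a budget check can be sketched as follows. The parameter names mirror the fields used by the budget endpoints on this page, but the function itself and the "allow" and "alert" labels are hypothetical; the service's real evaluation logic may differ.

```python
def budget_decision(monthly_budget_usd, spent_usd, alert_threshold_pct, overage_action):
    """Classify current spend as allow, alert, or the configured overage action."""
    if monthly_budget_usd <= 0:
        raise ValueError("monthly_budget_usd must be positive")
    remaining = round(monthly_budget_usd - spent_usd, 2)
    used_pct = spent_usd / monthly_budget_usd * 100
    if remaining <= 0:
        action = overage_action  # budget exhausted, e.g. "downgrade_model"
    elif used_pct >= alert_threshold_pct:
        action = "alert"         # hypothetical label: threshold crossed
    else:
        action = "allow"         # hypothetical label: within budget
    return {"remaining_usd": remaining, "used_pct": used_pct, "action": action}


# A 500 USD budget with 142.30 USD spent and an 80% alert threshold.
status = budget_decision(500.00, 142.30, 80, "downgrade_model")
```

With these numbers, 357.70 USD remains and about 28% of the budget is used, so the sketch neither alerts nor applies the overage action.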
# Get budget status
curl https://api.agilecloud.ai/api/v1/budget/status \
  -H "Authorization: Bearer YOUR_API_KEY"
# Response
{
  "budget_usd": 500.00,
  "spent_usd": 142.30,
  "remaining_usd": 357.70,
  "forecast_end_of_month_usd": 284.60,
  "period": "2025-01"
}
# Update budget config
curl -X PATCH https://api.agilecloud.ai/api/v1/budget/config \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "monthly_budget_usd": 500.00,
    "alert_threshold_pct": 80,
    "overage_action": "downgrade_model"
  }'
Fallback Chains
When a primary backend is unhealthy or overloaded, the routing engine automatically falls through to the next target in the chain. Health is monitored continuously, so traffic returns to the primary automatically once it recovers.
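The fall-through behavior amounts to picking the first healthy target in chain order. The Python sketch below assumes a hypothetical `is_healthy` callback and a plain list of model names; the real engine's health checks and data structures are not described here.

```python
def select_target(chain, is_healthy):
    """Return the first healthy model in chain order, or None if all are down."""
    for model in chain:
        if is_healthy(model):
            return model
    return None


chain = ["qwen-2.5-3b", "gpt-4o-mini"]  # primary first, then fallback
health = {"qwen-2.5-3b": False, "gpt-4o-mini": True}

# Primary is down, so the fallback is chosen.
during_outage = select_target(chain, lambda m: health[m])

# Once the primary recovers, the same call returns it again.
health["qwen-2.5-3b"] = True
after_recovery = select_target(chain, lambda m: health[m])
```

Because selection is re-evaluated against current health on every call, traffic returns to the primary without any explicit reset step.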
Dry-Run Evaluation
Test routing rules without sending actual requests:
curl https://api.agilecloud.ai/api/v1/routes/evaluate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default-chat",
    "customer_id": "cust_abc"
  }'
# Response shows which route and target would be selected
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/routes | Create route |
| GET | /api/v1/routes | List routes |
| GET | /api/v1/routes/{route_id} | Get route |
| PATCH | /api/v1/routes/{route_id} | Update route |
| DELETE | /api/v1/routes/{route_id} | Delete route |
| POST | /api/v1/routes/evaluate | Dry-run evaluate |
| GET | /api/v1/budget/status | Budget status |
| GET | /api/v1/budget/config | Get budget config |
| PATCH | /api/v1/budget/config | Update budget config |
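For callers who prefer Python to curl, the endpoints in the table can be reached with the standard library alone. The sketch below only builds the request object, using the same headers as the curl examples; `build_request` is a hypothetical helper, not part of any official SDK, and response formats are not assumed.

```python
import json
import urllib.request

BASE_URL = "https://api.agilecloud.ai"


def build_request(method, path, api_key, body=None):
    """Build an authenticated request for one of the endpoints in the table."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Dry-run evaluation, equivalent to the curl example on this page.
req = build_request(
    "POST",
    "/api/v1/routes/evaluate",
    "YOUR_API_KEY",
    {"model": "default-chat", "customer_id": "cust_abc"},
)
```

Passing the result to `urllib.request.urlopen(req)` would send it. That step is left out so the sketch runs without network access or a real API key.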
Tier Availability
Smart routing, A/B testing, and budget controls are available on the Business tier and above.