Batch API
Submit bulk inference jobs for asynchronous processing. Batch requests run at lower priority with 50% cost savings compared to real-time inference.
How It Works
Upload a JSONL file containing multiple inference requests, then poll for completion. Each line in the file is an independent request processed in parallel on available backend capacity.
1. Upload JSONL input file
2. Create batch job → returns batch_id
3. Poll GET /v1/batches/{batch_id} until status = "completed"
4. Download results from output_file_id
Create a Batch Job
curl https://api.agilecloud.ai/v1/batches \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-abc123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
Input File Format
Each line is a JSON object with custom_id, method, url, and body:
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize this document..."}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Translate to Spanish..."}]}}Using Prompts in Batch
Instead of inlining messages in every line, you can reference a prompt template by slug. The batch processor resolves the template, renders {{variables}} into the prompt's messages, and merges any model_config (temperature, max_tokens, etc.) from the prompt version.
{"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "variables": {"text": "First document content..."}}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "variables": {"text": "Second document content..."}}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "prompt_version": 2, "variables": {"text": "Use a specific version..."}}}| Field | Type | Description |
|---|---|---|
| prompt_slug | string | Prompt template slug (replaces messages) |
| prompt_version | integer | Optional — defaults to the current published version |
| variables | object | Key–value pairs for {{template}} placeholders |
Lines with prompt_slug and lines with literal messages can be mixed in the same file. Body-level fields like temperature override the prompt's model config.
You can also create prompt-based batches from the dashboard: Batch → New Job → From Prompt. Select a prompt, paste a JSON array of variable sets, and the UI generates the JSONL automatically.
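The dashboard's "From Prompt" flow can also be reproduced in code: given a prompt slug and an array of variable sets, emit one JSONL line per set. A minimal sketch in Python — the `make_jsonl` helper and the `doc-{i}` ID scheme are illustrative, not part of the API:

```python
import json

def make_jsonl(prompt_slug, model, variable_sets, prompt_version=None):
    """Build batch JSONL lines from a prompt slug and a list of variable sets.

    One request line per variable set, referencing the prompt template
    instead of inlining messages. Illustrative helper, not an SDK function.
    """
    lines = []
    for i, variables in enumerate(variable_sets, start=1):
        body = {"model": model, "prompt_slug": prompt_slug, "variables": variables}
        if prompt_version is not None:
            body["prompt_version"] = prompt_version  # pin a specific version
        lines.append(json.dumps({
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": body,
        }))
    return "\n".join(lines)

jsonl = make_jsonl("summarize", "gpt-4o-mini",
                   [{"text": "First document content..."},
                    {"text": "Second document content..."}])
print(jsonl)
```

Write the returned string to a file and upload it as the batch input.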
Check Job Status
curl https://api.agilecloud.ai/v1/batches/batch_abc123 \
-H "Authorization: Bearer YOUR_API_KEY"
Possible statuses:
| Status | Description |
|---|---|
| validating | Input file being validated |
| in_progress | Requests being processed |
| completed | All requests finished |
| failed | Job failed (check errors) |
| cancelled | Cancelled by user |
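A client typically polls these statuses in a loop until a terminal state is reached. A sketch in Python, with the HTTP call injected as a callable so the loop stays transport-agnostic — `fetch_status` and `wait_for_batch` are illustrative names, not SDK functions:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_batch(fetch_status, batch_id, interval_s=30, timeout_s=86400):
    """Poll until the batch reaches a terminal status.

    `fetch_status(batch_id)` should return the current status string,
    e.g. from GET /v1/batches/{batch_id}. Raises TimeoutError if the
    job has not finished after `timeout_s` seconds.
    """
    status = None
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(batch_id)
        if status in TERMINAL:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"batch {batch_id} still {status!r} after {timeout_s}s")

# Example with a stubbed fetcher that completes on the third poll:
responses = iter(["validating", "in_progress", "completed"])
final = wait_for_batch(lambda _id: next(responses), "batch_abc123", interval_s=0)
print(final)  # completed
```

In production the stub would be replaced by an authenticated GET to the status endpoint shown above.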
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/batches | Create batch job |
| GET | /v1/batches | List batch jobs |
| GET | /v1/batches/{batch_id} | Get job status |
| POST | /v1/batches/{batch_id}/cancel | Cancel job |
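The table above maps to a small routing helper when building a client. A sketch — `batch_endpoint` is an illustrative helper, not an SDK function; the base URL comes from the curl examples in this document:

```python
BASE = "https://api.agilecloud.ai"

def batch_endpoint(action, batch_id=None):
    """Return (method, url) for the batch endpoints listed above.

    Illustrative helper for building a client, not an SDK function.
    """
    routes = {
        "create": ("POST", f"{BASE}/v1/batches"),
        "list":   ("GET",  f"{BASE}/v1/batches"),
        "get":    ("GET",  f"{BASE}/v1/batches/{batch_id}"),
        "cancel": ("POST", f"{BASE}/v1/batches/{batch_id}/cancel"),
    }
    return routes[action]

cancel = batch_endpoint("cancel", "batch_abc123")
print(cancel)
```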
Billing
Batch requests are billed at 50% of real-time rates. Token usage is metered per-request within the batch and reported to Stripe at the discounted rate.
Priority Scheduling
Batch jobs run at lower priority than real-time requests. When backend capacity is available, batch requests are processed immediately. During high load, batch jobs queue until capacity frees up — within the configured completion window (default: 24 hours).
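Conceptually, the scheduling described above behaves like two queues where real-time work always drains first. A toy model of that behavior — not the actual scheduler implementation:

```python
from collections import deque

class PriorityScheduler:
    """Toy model of the scheduling described above: real-time requests
    always drain before batch requests consume backend capacity."""

    def __init__(self):
        self.realtime = deque()
        self.batch = deque()

    def submit(self, job, is_batch=False):
        (self.batch if is_batch else self.realtime).append(job)

    def next_job(self):
        if self.realtime:
            return self.realtime.popleft()
        if self.batch:  # batch work runs only when no real-time work waits
            return self.batch.popleft()
        return None

s = PriorityScheduler()
s.submit("batch-1", is_batch=True)
s.submit("rt-1")
order = [s.next_job(), s.next_job()]
print(order)  # ['rt-1', 'batch-1']
```

The real system additionally bounds how long batch work can wait via the completion window.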