Batch API
Submit bulk inference jobs for asynchronous processing. Batch requests run at lower priority with 50% cost savings compared to real-time inference.
How It Works
Upload a JSONL file containing multiple inference requests, then poll for completion. Each line in the file is an independent request processed in parallel on available backend capacity.
1. Upload JSONL input file
2. Create batch job → returns batch_id
3. Poll GET /v1/batches/{batch_id} until status = "completed"
4. Download results from output_file_id
Create a Batch Job
curl https://api.agilecloud.ai/v1/batches \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-abc123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
Input File Format
Each line is a JSON object with custom_id, method, url, and body:
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize this document..."}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Translate to Spanish..."}]}}Using Prompts in Batch
Instead of inlining messages in every line, you can reference a prompt template by slug. The batch processor resolves the template, renders {{variables}} into the prompt's messages, and merges any model_config (temperature, max_tokens, etc.) from the prompt version.
{"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "variables": {"text": "First document content..."}}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "variables": {"text": "Second document content..."}}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "prompt_version": 2, "variables": {"text": "Use a specific version..."}}}| Field | Type | Description |
|---|---|---|
| prompt_slug | string | Prompt template slug (replaces messages) |
| prompt_version | integer | Optional — defaults to the current published version |
| variables | object | Key–value pairs for {{template}} placeholders |
Lines with prompt_slug and lines with literal messages can be mixed in the same file. Body-level fields like temperature override the prompt's model config.
You can also create prompt-based batches from the dashboard: Batch → New Job → From Prompt. Select a prompt, paste a JSON array of variable sets, and the UI generates the JSONL automatically.
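The dashboard's "From Prompt" flow can also be reproduced in code: given a prompt slug and an array of variable sets, emit one JSONL line per set. A minimal sketch in Python — the `make_jsonl` helper and the `doc-{i}` ID scheme are illustrative, not part of the API:

```python
import json

def make_jsonl(prompt_slug, model, variable_sets, prompt_version=None):
    """Build batch JSONL lines from a prompt slug and a list of variable sets.

    One request line per variable set, referencing the prompt template
    instead of inlining messages. Illustrative helper, not an SDK function.
    """
    lines = []
    for i, variables in enumerate(variable_sets, start=1):
        body = {"model": model, "prompt_slug": prompt_slug, "variables": variables}
        if prompt_version is not None:
            body["prompt_version"] = prompt_version  # pin a specific version
        lines.append(json.dumps({
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": body,
        }))
    return "\n".join(lines)

jsonl = make_jsonl("summarize", "gpt-4o-mini",
                   [{"text": "First document content..."},
                    {"text": "Second document content..."}])
print(jsonl)
```

Write the returned string to a file and upload it as the batch input.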
Check Job Status
curl https://api.agilecloud.ai/v1/batches/batch_abc123 \
-H "Authorization: Bearer YOUR_API_KEY"
Possible statuses:
| Status | Description |
|---|---|
| validating | Input file being validated |
| in_progress | Requests being processed |
| completed | All requests finished |
| failed | Job failed (check errors) |
| cancelled | Cancelled by user |
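A client typically polls these statuses in a loop until a terminal state is reached. A sketch in Python, with the HTTP call injected as a callable so the loop stays transport-agnostic — `fetch_status` and `wait_for_batch` are illustrative names, not SDK functions:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_batch(fetch_status, batch_id, interval_s=30, timeout_s=86400):
    """Poll until the batch reaches a terminal status.

    `fetch_status(batch_id)` should return the current status string,
    e.g. from GET /v1/batches/{batch_id}. Raises TimeoutError if the
    job has not finished after `timeout_s` seconds.
    """
    status = None
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(batch_id)
        if status in TERMINAL:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"batch {batch_id} still {status!r} after {timeout_s}s")

# Example with a stubbed fetcher that completes on the third poll:
responses = iter(["validating", "in_progress", "completed"])
final = wait_for_batch(lambda _id: next(responses), "batch_abc123", interval_s=0)
print(final)  # completed
```

In production the stub would be replaced by an authenticated GET to the status endpoint shown above.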
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/batches | Create batch job |
| GET | /v1/batches | List batch jobs |
| GET | /v1/batches/{batch_id} | Get job status |
| POST | /v1/batches/{batch_id}/cancel | Cancel job |
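The table above maps to a small routing helper when building a client. A sketch — `batch_endpoint` is an illustrative helper, not an SDK function; the base URL comes from the curl examples in this document:

```python
BASE = "https://api.agilecloud.ai"

def batch_endpoint(action, batch_id=None):
    """Return (method, url) for the batch endpoints listed above.

    Illustrative helper for building a client, not an SDK function.
    """
    routes = {
        "create": ("POST", f"{BASE}/v1/batches"),
        "list":   ("GET",  f"{BASE}/v1/batches"),
        "get":    ("GET",  f"{BASE}/v1/batches/{batch_id}"),
        "cancel": ("POST", f"{BASE}/v1/batches/{batch_id}/cancel"),
    }
    return routes[action]

cancel = batch_endpoint("cancel", "batch_abc123")
print(cancel)
```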
Billing
Batch requests are billed at 50% of real-time rates. Token usage is metered per-request within the batch and reported to Stripe at the discounted rate.
Priority Scheduling
Batch jobs run at lower priority than real-time requests. When backend capacity is available, batch requests are processed immediately. During high load, batch jobs queue until capacity frees up — within the configured completion window (default: 24 hours).
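Conceptually, the scheduling described above behaves like two queues where real-time work always drains first. A toy model of that behavior — not the actual scheduler implementation:

```python
from collections import deque

class PriorityScheduler:
    """Toy model of the scheduling described above: real-time requests
    always drain before batch requests consume backend capacity."""

    def __init__(self):
        self.realtime = deque()
        self.batch = deque()

    def submit(self, job, is_batch=False):
        (self.batch if is_batch else self.realtime).append(job)

    def next_job(self):
        if self.realtime:
            return self.realtime.popleft()
        if self.batch:  # batch work runs only when no real-time work waits
            return self.batch.popleft()
        return None

s = PriorityScheduler()
s.submit("batch-1", is_batch=True)
s.submit("rt-1")
order = [s.next_job(), s.next_job()]
print(order)  # ['rt-1', 'batch-1']
```

The real system additionally bounds how long batch work can wait via the completion window.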