Batch API

Submit bulk inference jobs for asynchronous processing. Batch requests run at lower priority and are billed at 50% of real-time rates.

How It Works

Upload a JSONL file containing multiple inference requests, then poll for completion. Each line in the file is an independent request processed in parallel on available backend capacity.

1. Upload JSONL input file
2. Create batch job → returns batch_id
3. Poll GET /v1/batches/{batch_id} until status = "completed"
4. Download results from output_file_id
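The poll step above can be sketched as a small loop. Here `fetch_status` is a stand-in for an HTTP GET against /v1/batches/{batch_id} that returns the status string; the helper name and structure are ours, not part of the API:

```python
import time

def wait_for_batch(fetch_status, poll_interval=10.0, max_polls=1000):
    """Poll until the batch reaches a terminal state.

    fetch_status: a zero-argument callable returning the current status
    string, e.g. a wrapper around GET /v1/batches/{batch_id}.
    """
    terminal = {"completed", "failed", "cancelled"}
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("batch did not reach a terminal state")
```

On "completed", download the results via the batch's output_file_id.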

Create a Batch Job

curl https://api.agilecloud.ai/v1/batches \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-abc123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'
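For callers not using curl, the same request can be assembled with the standard library. This sketch builds (but does not send) the POST /v1/batches request; the helper name is ours:

```python
import json
import urllib.request

def build_create_batch_request(api_key, input_file_id,
                               base_url="https://api.agilecloud.ai"):
    # Mirrors the curl example: JSON body plus bearer-token auth header.
    body = json.dumps({
        "input_file_id": input_file_id,
        "endpoint": "/v1/chat/completions",
        "completion_window": "24h",
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/batches",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` and parse the JSON response to obtain the batch_id.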

Input File Format

Each line is a JSON object with custom_id, method, url, and body:

{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize this document..."}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Translate to Spanish..."}]}}

Using Prompts in Batch

Instead of inlining messages in every line, you can reference a prompt template by slug. The batch processor resolves the template, renders {{variables}} into the prompt's messages, and merges any model_config (temperature, max_tokens, etc.) from the prompt version.

{"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "variables": {"text": "First document content..."}}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "variables": {"text": "Second document content..."}}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "prompt_slug": "summarize", "prompt_version": 2, "variables": {"text": "Use a specific version..."}}}

Field           Type     Description
prompt_slug     string   Prompt template slug (replaces messages)
prompt_version  integer  Optional; defaults to the current published version
variables       object   Key–value pairs for {{template}} placeholders

Lines with prompt_slug and lines with literal messages can be mixed in the same file. Body-level fields such as temperature override the corresponding values in the prompt's model_config.
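The resolution happens server-side, but its effect can be modeled locally. This is a simplified sketch of the two steps, with our own helper names: substituting {{variables}} into the template's messages, and merging body-level fields over the prompt's model_config:

```python
import re

def render_prompt(template_messages, variables):
    """Substitute {{name}} placeholders in each message's content."""
    def fill(text):
        return re.sub(
            r"\{\{(\w+)\}\}",
            lambda m: str(variables.get(m.group(1), m.group(0))),
            text,
        )
    return [{**m, "content": fill(m["content"])} for m in template_messages]

def merge_config(prompt_model_config, body_fields):
    # Later keys win: body-level fields override the prompt's defaults.
    return {**prompt_model_config, **body_fields}
```

For example, a "summarize" template with content "Summarize: {{text}}" and model_config {"temperature": 0.2} combined with a body containing {"temperature": 0.9} yields a rendered message and an effective temperature of 0.9.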

You can also create prompt-based batches from the dashboard: Batch → New Job → From Prompt. Select a prompt, paste a JSON array of variable sets, and the UI generates the JSONL automatically.

Check Job Status

curl https://api.agilecloud.ai/v1/batches/batch_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"

Possible statuses:

Status       Description
validating   Input file being validated
in_progress  Requests being processed
completed    All requests finished
failed       Job failed (check errors)
cancelled    Cancelled by user

Endpoints

Method  Path                           Description
POST    /v1/batches                    Create batch job
GET     /v1/batches                    List batch jobs
GET     /v1/batches/{batch_id}         Get job status
POST    /v1/batches/{batch_id}/cancel  Cancel job

Billing

Batch requests are billed at 50% of real-time rates. Token usage is metered per-request within the batch and reported to Stripe at the discounted rate.
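The discount is straightforward to estimate. A minimal sketch of the arithmetic; the per-million-token rates here are illustrative placeholders, not published prices:

```python
def batch_cost(input_tokens, output_tokens,
               input_rate_per_m, output_rate_per_m,
               batch_discount=0.5):
    """Estimated cost in dollars; rates are per million tokens."""
    full_price = ((input_tokens / 1_000_000) * input_rate_per_m
                  + (output_tokens / 1_000_000) * output_rate_per_m)
    return full_price * batch_discount
```

With hypothetical rates of $2.00/M input and $8.00/M output tokens, one million input tokens would cost $2.00 in real time but $1.00 through batch.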

Priority Scheduling

Batch jobs run at lower priority than real-time requests. When backend capacity is available, batch requests are processed immediately. During high load, batch jobs queue until capacity frees up, and all requests complete within the configured completion window (default: 24 hours).