RAG (Retrieval-Augmented Generation)

Managed vector search and grounded LLM generation. Upload documents, search semantically, and generate responses grounded in your data — all through a single API.

How It Works

1. Create a collection (namespace for documents)
2. Upload documents → auto-chunked and embedded
3. Search: vector / hybrid / keyword retrieval
4. Query: search + LLM generation in one call

Collections

Collections group related documents. Each collection uses a configurable embedding model and chunking strategy.

curl https://api.agilecloud.ai/api/v1/rag/collections \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "knowledge-base",
    "description": "Internal documentation",
    "embedding_model": "bge-large-en-v1.5",
    "chunk_size": 512,
    "chunk_overlap": 50
  }'
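The same request can be issued from Python; a minimal stdlib-only sketch, assuming the endpoint and body fields shown in the curl example above (error handling omitted):

```python
import json
import urllib.request

API_BASE = "https://api.agilecloud.ai"

def collection_payload(name, description, embedding_model="bge-large-en-v1.5",
                       chunk_size=512, chunk_overlap=50):
    """Build the JSON body for collection creation (fields from the curl example)."""
    return {
        "name": name,
        "description": description,
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }

def create_collection(api_key, **kwargs):
    """POST the payload and return the parsed JSON response (e.g. the collection id)."""
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/rag/collections",
        data=json.dumps(collection_payload(**kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Splitting the payload builder from the HTTP call keeps the request body easy to inspect and test before anything goes over the wire.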

Uploading Documents

Documents are auto-chunked, embedded, and indexed. Supported formats: plain text, Markdown, PDF, HTML.
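The auto-chunking can be pictured as a sliding window over the text. This sketch chunks by characters using the `chunk_size`/`chunk_overlap` semantics from the collection settings; the service itself presumably counts tokens and respects sentence boundaries, so treat this as an approximation:

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into windows of chunk_size units overlapping by chunk_overlap.

    Character-based approximation of the service's chunking; the real
    implementation likely operates on tokens, not characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window reached the end of the text
    return chunks
```

The overlap ensures that a sentence straddling a chunk boundary still appears whole in at least one chunk, which improves retrieval recall.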

curl https://api.agilecloud.ai/api/v1/rag/collections/coll_abc123/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "content": "DirectAI provides managed inference with compliance built-in...",
        "metadata": {"source": "docs", "section": "overview"}
      }
    ]
  }'
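The upload body can be assembled programmatically; a small sketch that wraps raw strings into the documents array shown above (field names taken from the curl example):

```python
def documents_payload(texts, **shared_metadata):
    """Wrap raw strings into the {"documents": [...]} upload body.

    Any keyword arguments become metadata attached to every document.
    """
    return {
        "documents": [
            {"content": text, "metadata": dict(shared_metadata)}
            for text in texts
        ]
    }
```

For example, `documents_payload(pages, source="docs", section="overview")` tags every page with the same provenance metadata, which can later be surfaced in citations.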

Search

Search across documents using vector similarity, keyword matching, or hybrid (combined) retrieval.

curl https://api.agilecloud.ai/v1/rag/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "coll_abc123",
    "query": "How does autoscaling work?",
    "search_type": "hybrid",
    "top_k": 5
  }'

Search Type   Description
vector        Cosine similarity on embeddings
keyword       BM25 keyword matching
hybrid        Combined vector + keyword (reciprocal rank fusion)
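Hybrid search merges the vector and keyword rankings with reciprocal rank fusion. A minimal sketch of the standard RRF formula, score = sum of 1/(k + rank) across lists, using the conventional k = 60 (the constant the service actually uses is not documented here):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document ids into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in (rank is 1-based), so items ranked well by both retrievers
    rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]    # best-first by cosine similarity
keyword_hits = ["d1", "d9", "d3"]   # best-first by BM25
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Note how `d1` (ranked 2nd and 1st) beats `d3` (ranked 1st and 3rd): agreement between retrievers outweighs a single top spot.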

Query (Search + Generate)

One-step retrieval-augmented generation. Searches your collection, constructs a grounded prompt, and streams a response from the LLM.

curl https://api.agilecloud.ai/v1/rag/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "coll_abc123",
    "query": "How does model autoscaling work?",
    "model": "qwen-2.5-3b",
    "top_k": 5,
    "stream": true
  }'

The response includes the generated text plus the source chunks used for grounding, so you can display citations.
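A response shaped like that can be rendered with inline citations. This is a sketch assuming the source chunks come back as a list of objects with `content` and `metadata` keys, mirroring the upload format; the field names are illustrative, so check the actual response schema:

```python
def format_with_citations(answer, sources):
    """Append a numbered source list so each grounding chunk can be cited.

    `sources` is assumed to be a list of dicts with "content" and "metadata"
    keys (matching the document upload format); adjust to the real schema.
    """
    lines = [answer, "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        label = src.get("metadata", {}).get("source", "unknown")
        snippet = src["content"][:80]  # short preview of the grounding chunk
        lines.append(f"[{i}] ({label}) {snippet}")
    return "\n".join(lines)
```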

Endpoints

Method  Path                                     Description
POST    /api/v1/rag/collections                  Create collection
GET     /api/v1/rag/collections                  List collections
POST    /api/v1/rag/collections/{id}/documents   Upload documents
GET     /api/v1/rag/collections/{id}/documents   List documents
POST    /v1/rag/search                           Vector / hybrid / keyword search
POST    /v1/rag/query                            Search + LLM generation
GET     /api/v1/rag/usage                        Storage usage and tier cap

Tier Availability

RAG is available on the Business tier and above. Storage limits scale by tier. Embedding and LLM tokens consumed during ingestion and querying are billed at standard rates.