RAG (Retrieval-Augmented Generation)
Managed vector search and grounded LLM generation. Upload documents, search semantically, and generate responses grounded in your data — all through a single API.
How It Works
1. Create a collection (a namespace for documents)
2. Upload documents → auto-chunked and embedded
3. Search: vector, hybrid, or keyword retrieval
4. Query: search + LLM generation in one call
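The four steps map one-to-one onto POST endpoints. A minimal Python sketch that only builds the requests (it does not call the API), assuming the paths from the examples below, normalized under `/api/v1`:

```python
import json
import urllib.request

API_BASE = "https://api.agilecloud.ai"
API_KEY = "YOUR_API_KEY"

def rag_request(path, payload):
    """Build (but do not send) an authenticated JSON POST to a RAG endpoint."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The four workflow steps, as requests (send with urllib.request.urlopen):
create = rag_request("/api/v1/rag/collections", {"name": "knowledge-base"})
upload = rag_request("/api/v1/rag/collections/coll_abc123/documents",
                     {"documents": [{"content": "...", "metadata": {}}]})
search = rag_request("/api/v1/rag/search",
                     {"collection_id": "coll_abc123", "query": "...", "top_k": 5})
query = rag_request("/api/v1/rag/query",
                    {"collection_id": "coll_abc123", "query": "...",
                     "model": "qwen-2.5-3b", "top_k": 5})
```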
Collections
Collections group related documents. Each collection uses a configurable embedding model and chunking strategy.
```bash
curl https://api.agilecloud.ai/api/v1/rag/collections \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "knowledge-base",
    "description": "Internal documentation",
    "embedding_model": "bge-large-en-v1.5",
    "chunk_size": 512,
    "chunk_overlap": 50
  }'
```

Uploading Documents
Documents are auto-chunked, embedded, and indexed. Supported formats: plain text, Markdown, PDF, HTML.
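The chunker itself isn't documented, but the `chunk_size`/`chunk_overlap` collection settings suggest fixed-size windows with overlap, so text cut at a chunk boundary still appears whole in an adjacent chunk. A rough sketch (counting characters for simplicity; the service most likely counts tokens, which is an assumption here):

```python
def chunk(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size windows; each new chunk starts
    (chunk_size - chunk_overlap) units after the previous one, so
    consecutive chunks share chunk_overlap units of text."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(1000))
chunks = chunk(text, chunk_size=512, chunk_overlap=50)
# chunks start at offsets 0, 462, 924; the tail of one chunk
# repeats as the head of the next
```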
```bash
curl https://api.agilecloud.ai/api/v1/rag/collections/coll_abc123/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "content": "DirectAI provides managed inference with compliance built-in...",
        "metadata": {"source": "docs", "section": "overview"}
      }
    ]
  }'
```

Search
Search across documents using vector similarity, keyword matching, or hybrid (combined) retrieval.
```bash
curl https://api.agilecloud.ai/api/v1/rag/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "coll_abc123",
    "query": "How does autoscaling work?",
    "search_type": "hybrid",
    "top_k": 5
  }'
```

| Search Type | Description |
|---|---|
| vector | Cosine similarity on embeddings |
| keyword | BM25 keyword matching |
| hybrid | Combined vector + keyword (reciprocal rank fusion) |
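Reciprocal rank fusion scores each document by the sum of 1/(k + rank) over the lists it appears in, rewarding documents that rank well in both the vector and keyword results. A minimal sketch using the common constant k = 60 (the service's actual k is not documented):

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of document ids.
    Each document earns 1/(k + rank) per list it appears in
    (rank is 1-based); documents are returned by descending total."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]    # ranked by cosine similarity
keyword_hits = ["d1", "d9", "d3"]   # ranked by BM25
fused = rrf([vector_hits, keyword_hits])
# d1 and d3 rise to the top: each appears in both lists
```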
Query (Search + Generate)
One-step retrieval-augmented generation. Searches your collection, constructs a grounded prompt, and streams a response from the LLM.
```bash
curl https://api.agilecloud.ai/api/v1/rag/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "coll_abc123",
    "query": "How does model autoscaling work?",
    "model": "qwen-2.5-3b",
    "top_k": 5,
    "stream": true
  }'
```

The response includes the generated text plus the source chunks used for grounding, so you can display citations.
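One way to surface those citations is to append a numbered source list to the answer. The response shape assumed here (an `answer` string plus a `sources` list of chunks with `content` and `metadata`) is an illustration, not the documented schema; check the actual response body:

```python
def render_with_citations(result):
    """Format a query result for display: the answer followed by a
    numbered list of the source chunks that grounded it.
    Assumes a hypothetical {"answer": ..., "sources": [...]} shape."""
    lines = [result["answer"], "", "Sources:"]
    for i, src in enumerate(result["sources"], start=1):
        meta = src.get("metadata", {})
        snippet = src["content"][:60]
        lines.append(f"[{i}] {meta.get('source', 'unknown')}: {snippet}...")
    return "\n".join(lines)

example = {
    "answer": "Replicas scale with request load.",
    "sources": [{"content": "Autoscaling adds replicas when...",
                 "metadata": {"source": "docs"}}],
}
print(render_with_citations(example))
```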
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/rag/collections | Create collection |
| GET | /api/v1/rag/collections | List collections |
| POST | /api/v1/rag/collections/{id}/documents | Upload documents |
| GET | /api/v1/rag/collections/{id}/documents | List documents |
| POST | /api/v1/rag/search | Vector / hybrid / keyword search |
| POST | /api/v1/rag/query | Search + LLM generation |
| GET | /api/v1/rag/usage | Storage usage and tier cap |
Tier Availability
RAG is available on the Business tier and above. Storage limits scale by tier. Embedding and LLM tokens consumed during ingestion and querying are billed at standard rates.