Embeddings
Generate vector embeddings for semantic search, RAG, clustering, and classification. The endpoint is OpenAI-compatible.
Endpoint
POST https://api.agilecloud.ai/v1/embeddings
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name or alias (e.g., bge-large-en-v1.5) |
| input | string \| array | Yes | Text string or array of strings to embed |
Example
curl https://api.agilecloud.ai/v1/embeddings \
-H "Authorization: Bearer $DIRECTAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": "DirectAI provides compliance-first AI inference."
}'
Response
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0123, -0.0456, 0.0789, ...]
}
],
"model": "bge-large-en-v1.5",
"usage": {
"prompt_tokens": 8,
"total_tokens": 8
}
}
Batch Embedding
Pass an array of strings to embed multiple texts in a single request. The server uses dynamic batching to maximize throughput.
curl https://api.agilecloud.ai/v1/embeddings \
-H "Authorization: Bearer $DIRECTAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": [
"First document text",
"Second document text",
"Third document text"
]
}'
The response data array will contain one embedding per input, in the same order.
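Because ordering is preserved, each result can be paired with its input via the `index` field. A minimal sketch in plain Python, operating on a hand-written sample response in the documented shape rather than a live API call:

```python
# Pair each input text with its embedding using the `index` field.
# `sample_response` below is a made-up example matching the documented
# response shape, not output from a real request.
inputs = ["First document text", "Second document text"]
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}

# Sorting by index keeps embeddings aligned with inputs even if the
# server were to return items out of order.
ordered = sorted(sample_response["data"], key=lambda item: item["index"])
pairs = {text: item["embedding"] for text, item in zip(inputs, ordered)}

print(pairs["First document text"])  # [0.1, 0.2]
```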
Python SDK Example
from openai import OpenAI
client = OpenAI(
base_url="https://api.agilecloud.ai/v1",
api_key="YOUR_API_KEY",
)
response = client.embeddings.create(
model="bge-large-en-v1.5",
input=["Hello, world!", "Another sentence"],
)
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)} dimensions")
Performance Notes
- BGE-large produces 1024-dimensional embeddings
- Maximum sequence length: 512 tokens — longer texts are truncated
- Dynamic batching groups up to 256 inputs per batch for maximum throughput
- Powered by serverless inference endpoints
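For semantic search, results are typically ranked by cosine similarity between the query embedding and each document embedding. A self-contained sketch, using short made-up vectors as stand-ins for the real 1024-dimensional BGE embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional vectors standing in for 1024-dim BGE embeddings.
query = [0.1, 0.2, 0.0, 0.3]
docs = {
    "doc_a": [0.1, 0.2, 0.0, 0.3],  # same direction as the query
    "doc_b": [0.3, 0.0, 0.2, 0.1],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a
```

In practice the same ranking step applies unchanged to real embeddings returned by the API; only the vector length differs.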