Chat Completions
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint with automatic knowledge retrieval and source citations. Any OpenAI client connects without modification.
Request
POST /v1/chat/completions
Authorization: Bearer cc_YOUR_API_KEY
Content-Type: application/json
{
  "model": "claude-sonnet-4-5",
  "messages": [
    {
      "role": "user",
      "content": "What do our deployment runbooks say about rollbacks?"
    }
  ],
  "stream": false
}
Supported fields
CoreCube accepts the standard OpenAI chat completions request shape but acts on only a subset of it. Unrecognized input (top_p, n, tools, multimodal content arrays, role: "tool", and others) is silently ignored, and temperature and max_tokens are the only generation parameters forwarded to the provider. Message content must be a string, and role is limited to system, user, or assistant. The accepted fields:
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | default provider model | Model name forwarded to the selected provider. |
| messages | array | required | Conversation history in OpenAI format. |
| stream | boolean | false | Enable Server-Sent Events streaming. |
| temperature | number | provider default | Sampling temperature forwarded to the provider. |
| max_tokens | number | provider default | Maximum output tokens. |
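Because unsupported fields are dropped silently, a client may prefer to catch them before sending. The following is a minimal client-side sketch, not part of CoreCube: `prepare_request` and `ACCEPTED_FIELDS` are hypothetical names, and the accepted set simply mirrors the fields listed in this section (including the CoreCube-specific ones).

```python
from typing import Any, Dict

# Fields this section lists as accepted (standard plus CoreCube-specific).
ACCEPTED_FIELDS = {
    "model", "messages", "stream", "temperature", "max_tokens",
    "cube_extended", "connectionIds",
}
ALLOWED_ROLES = {"system", "user", "assistant"}


def prepare_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate messages and drop fields CoreCube ignores."""
    messages = payload.get("messages")
    if not isinstance(messages, list):
        raise ValueError("messages is required and must be a list")
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"messages[{i}] must be an object")
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}].role must be system, user, or assistant")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"messages[{i}].content must be a string")
    # Keep only the fields the endpoint acts on.
    return {key: value for key, value in payload.items() if key in ACCEPTED_FIELDS}
```

Failing early on a `role: "tool"` message or a content array makes the mismatch visible in the client instead of producing an answer that quietly omits that input.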
CoreCube-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| cube_extended | boolean | false | Enable extended mode, which adds structured cube metadata to the response. |
| connectionIds | string[] | omitted | Narrow retrieval to specific connection IDs. Scope enforcement still applies. |
Extended mode can also be enabled with a request header instead of the body field: X-Cube-Extended: true.
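As an illustration of the two ways to request extended mode, the sketch below builds the headers and JSON body for either variant. `build_extended_request` is a hypothetical helper, and the connection ID used in the usage note is a made-up placeholder, not a real identifier format.

```python
import json
from typing import Dict, List, Optional, Tuple


def build_extended_request(
    question: str,
    connection_ids: Optional[List[str]] = None,
    use_header: bool = False,
    api_key: str = "cc_YOUR_API_KEY",
) -> Tuple[Dict[str, str], str]:
    """Hypothetical helper: return (headers, json_body) for an extended-mode call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"messages": [{"role": "user", "content": question}]}
    if use_header:
        headers["X-Cube-Extended"] = "true"   # header form
    else:
        body["cube_extended"] = True          # body-field form
    if connection_ids:
        body["connectionIds"] = list(connection_ids)  # narrow retrieval
    return headers, json.dumps(body)
```

For example, `build_extended_request("Who owns billing?", connection_ids=["conn_example"])` produces a body containing both `cube_extended` and `connectionIds`. The header form may be convenient when the client library controls the body and only exposes custom headers.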
Response (strict mode)
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1744387200,
  "model": "claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "According to the deployment runbooks, rollbacks should follow this process:\n\n1. Immediately stop the deployment if health checks fail [1]\n2. Run `make rollback VERSION=<previous>` to revert the container image [1]\n3. Notify the on-call team via PagerDuty [2]\n4. Document the incident in the postmortem template [2]\n\n---\n**Sources:**\n[1] Deployment Runbook — https://confluence.company.com/pages/1234\n[2] Incident Response Guide — https://confluence.company.com/pages/5678"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1842,
    "completion_tokens": 127,
    "total_tokens": 1969
  }
}
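In strict mode the citations arrive only as text inside `content`. A client that wants them as data could split the footer off, as in the sketch below. It assumes the footer always looks like the sample above (a `---` rule, a `**Sources:**` line, then `[n] Title — URL` entries); that layout is inferred from the example and is not a documented contract, so extended mode is the safer choice for structured citations. `split_answer_and_sources` is a hypothetical name.

```python
import re
from typing import Dict, Tuple

# Matches "[1] Some Title <em dash> https://example" (\u2014 is the em dash).
SOURCE_LINE = re.compile(r"^\[(\d+)\]\s+(.*?)\s+\u2014\s+(\S+)\s*$")
FOOTER_MARKER = "\n---\n**Sources:**\n"


def split_answer_and_sources(content: str) -> Tuple[str, Dict[int, Tuple[str, str]]]:
    """Return (answer_text, {index: (title, url)}) from a strict-mode reply."""
    answer, marker, footer = content.rpartition(FOOTER_MARKER)
    if not marker:
        # No sources footer found: treat the whole content as the answer.
        return content, {}
    sources = {}
    for line in footer.splitlines():
        match = SOURCE_LINE.match(line)
        if match:
            sources[int(match.group(1))] = (match.group(2), match.group(3))
    return answer.rstrip(), sources
```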
Response (extended mode)
With X-Cube-Extended: true or "cube_extended": true:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [...],
  "usage": {...},
  "cube": {
    "citations": [
      {
        "index": 1,
        "title": "Deployment Runbook",
        "url": "https://confluence.company.com/pages/1234",
        "connectionName": "Confluence — Engineering",
        "sourceLabel": "Confluence — Engineering"
      },
      {
        "index": 2,
        "title": "Incident Response Guide",
        "url": "https://confluence.company.com/pages/5678",
        "connectionName": "Confluence — Engineering",
        "sourceLabel": "Confluence — Engineering"
      }
    ],
    "attachments": [],
    "searchLatencyMs": 142,
    "llmLatencyMs": 1840,
    "chunksRetrieved": 8,
    "filesRetrieved": 0,
    "degraded": false,
    "emptyReason": null,
    "degradations": []
  }
}
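A client might consume the `cube` block as follows. This is a sketch over the field names shown in the sample above; the helper names are hypothetical, and treating a non-empty `degradations` list as "degraded" is an assumption, since the section does not define either field's semantics.

```python
from typing import Any, Dict, Tuple


def index_citations(response: Dict[str, Any]) -> Dict[int, Tuple[str, str]]:
    """Map each citation index to (title, url); empty when no cube block exists."""
    cube = response.get("cube") or {}
    return {c["index"]: (c["title"], c["url"]) for c in cube.get("citations", [])}


def is_degraded(response: Dict[str, Any]) -> bool:
    """Assumed reading: degraded flag set, or any entry in degradations."""
    cube = response.get("cube") or {}
    return bool(cube.get("degraded")) or bool(cube.get("degradations"))
```

With the index map in hand, a `[1]` marker in the answer text can be linked to its title and URL without parsing the sources footer.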
Streaming
{ "stream": true }
Responses are streamed as Server-Sent Events in the standard OpenAI format:
data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"According"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" to"},"index":0}]}
...
data: [DONE]
In extended mode, a cube.metadata frame is sent before [DONE]:
data: {"object":"cube.metadata","cube":{"citations":[...],"searchLatencyMs":142,"chunksRetrieved":8}}
data: [DONE]
Streaming does not currently support presets that expose LLM-callable retrieval or action tools. Use non-streaming chat for those presets.
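The frames shown above can be consumed with a small line-based parser. The sketch below takes any iterable of already-decoded lines (how the lines are read from the HTTP response is left to the client), concatenates the content deltas, and captures the `cube.metadata` frame if one arrives. `read_stream` is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (full_text, cube_metadata_or_None) from SSE data lines."""
    parts = []
    cube = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        frame = json.loads(data)
        if frame.get("object") == "cube.metadata":
            cube = frame.get("cube")  # extended-mode frame sent before [DONE]
            continue
        for choice in frame.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), cube
```

In strict mode the second return value stays `None`, so the same function serves both modes.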
Rate limits and concurrency
CoreCube can return 429 before provider work starts:
| Cause | Response |
|---|---|
| API-key request-per-minute limit exceeded | 429 with the API-key rate-limit error. |
| Chat concurrency limit exceeded | 429, code CONCURRENCY_LIMIT_EXCEEDED, Retry-After: 1. |
| Provider RPM/TPM catalog limit exceeded | 429, code provider_rate_limited, Retry-After header. |
Default chat concurrency is 100 in-flight chat requests per server instance, 100 per API key, 8 per effective user, and 8 per source IP for public keys. See Environment Variables.
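Since two of the 429 cases carry a Retry-After header, a client can use it to pace retries. The sketch below computes a wait time from a response's status and headers; the exponential fallback (0.5 s base, 30 s cap) is an arbitrary assumption for the case where no header is present, and `retry_delay` is a hypothetical name.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Assumed fallback: exponential backoff with a cap.
    return min(cap, 0.5 * (2 ** attempt))
```

A caller would sleep for the returned number of seconds and resend the same request, giving up after a bounded number of attempts.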
Insufficient evidence
When retrieval finds no relevant context (strict answerability mode):
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "I don't have enough information in the knowledge base to answer this question. The available sources don't contain relevant information about this topic."
      },
      "finish_reason": "stop"
    }
  ]
}
When this happens in strict mode, the query is not forwarded to the LLM; only the knowledge base is searched.
Examples
cURL
curl https://corecube.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer cc_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Who owns the authentication service?"}],
    "stream": false
  }'
Python (OpenAI SDK)
from openai import OpenAI

# Point the standard OpenAI client at the CoreCube endpoint.
client = OpenAI(
    api_key="cc_YOUR_API_KEY",
    base_url="https://corecube.your-domain.com/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is our data retention policy?"}],
)
print(response.choices[0].message.content)
JavaScript (OpenAI SDK)
import OpenAI from 'openai';

// Point the standard OpenAI client at the CoreCube endpoint.
const client = new OpenAI({
  apiKey: 'cc_YOUR_API_KEY',
  baseURL: 'https://corecube.your-domain.com/v1',
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [{ role: 'user', content: 'What are our deployment procedures?' }],
});
console.log(response.choices[0].message.content);
Service key (per-user permissions)
curl https://corecube.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer cc_SERVICE_API_KEY" \
  -H "X-Cube-User: sarah@company.com" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Show me the Q4 budget projections"}]}'
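The same service-key call can be prepared in Python with only the standard library. This sketch builds the request object without sending it; `build_user_scoped_request` is a hypothetical helper, and the key and email are the placeholders from the cURL example above.

```python
import json
import urllib.request


def build_user_scoped_request(base_url: str, service_key: str,
                              user_email: str, question: str) -> urllib.request.Request:
    """Hypothetical helper: a chat request made on behalf of one user."""
    body = json.dumps(
        {"messages": [{"role": "user", "content": question}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {service_key}",
            "X-Cube-User": user_email,  # identity whose permissions apply
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes it easy to check that the user header is set before any request leaves the service.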
Public key (anonymous access)
Public keys require no user identity. Any X-Cube-User header is ignored — the bound scope always applies.
curl https://corecube.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer cc_PUBLIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "How do I install CoreCube?"}]}'
Browser example for an embedded chat widget:
const response = await fetch('https://corecube.your-domain.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer cc_PUBLIC_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: userQuestion }],
    stream: true,
  }),
});
Configure CORS on your reverse proxy to allow requests from the widget's origin. Public keys do not write per-key CORS headers — CoreCube expects the proxy to handle preflight.
Model listing
GET /v1/models
Authorization: Bearer cc_YOUR_API_KEY
Returns a single model entry whose id is the instance's app name (default CoreCube), in OpenAI format. Clients like OpenWebUI use this for automatic model discovery — they simply see one model.
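For clients that do their own discovery, the sketch below picks the model id out of a list response. It assumes the body follows the usual OpenAI list shape (`{"object": "list", "data": [...]}`), which is what "in OpenAI format" suggests; `discover_model_id` is a hypothetical name.

```python
from typing import Any, Dict


def discover_model_id(models_response: Dict[str, Any]) -> str:
    """Return the id of the first model entry in an OpenAI-style list response."""
    entries = models_response.get("data") or []
    if not entries:
        raise ValueError("model list is empty")
    return entries[0]["id"]
```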
Other /v1 endpoints
Two additional resources are exposed under /v1. They are primarily internal to CoreCube's own clients, but are reachable with a valid API key.
| Endpoint | Purpose |
|---|---|
| GET /v1/tools | List the tools available to the calling key. |
| GET /v1/tools/:slug | Retrieve a single tool by slug. |
| POST /v1/tools/:slug/invoke | Invoke a tool. Gated by per-tool authorization; denied with 403 tool_permission_denied. |
| GET /v1/documents/:id/file | Fetch a cited source file via a signed citation token. Extended-mode cube.attachments[].downloadUrl values resolve here. |
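To follow attachment links from an extended-mode response, a client could collect the `downloadUrl` values as sketched below. Only the `downloadUrl` field name comes from this section; whether the values are absolute or relative is not stated, so the sketch resolves them against the instance base URL, which handles both. `attachment_urls` is a hypothetical name.

```python
from typing import Any, Dict, List
from urllib.parse import urljoin


def attachment_urls(response: Dict[str, Any], base_url: str) -> List[str]:
    """Collect attachment download URLs, resolved against the instance base URL."""
    cube = response.get("cube") or {}
    return [
        urljoin(base_url, attachment["downloadUrl"])
        for attachment in cube.get("attachments", [])
        if attachment.get("downloadUrl")
    ]
```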