Chat Completions
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint with automatic knowledge retrieval and source citations. Any OpenAI-compatible client can connect without modification.
Request
POST /v1/chat/completions
Authorization: Bearer cc_YOUR_API_KEY
Content-Type: application/json
{
"model": "claude-sonnet-4-5",
"messages": [
{
"role": "user",
"content": "What do our deployment runbooks say about rollbacks?"
}
],
"stream": false
}
Supported fields
CoreCube accepts all standard OpenAI chat completions fields. The most relevant are:
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | default provider model | LLM model to use. Must match a configured provider model. |
| `messages` | array | required | Conversation history in OpenAI format. |
| `stream` | boolean | `false` | Enable Server-Sent Events streaming. |
| `temperature` | number | provider default | Sampling temperature (forwarded to the LLM provider). |
| `max_tokens` | number | provider default | Maximum output tokens. |
CoreCube-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| `cube_extended` | boolean | `false` | Enable extended mode; adds structured `corecube` metadata to the response. |
Or use the header equivalent: `X-Cube-Extended: true`.
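The two switches are equivalent on the wire. A minimal sketch of the request each one produces (stdlib only; the API key and question are placeholders):

```python
import json

API_KEY = "cc_YOUR_API_KEY"  # placeholder key
question = {"role": "user", "content": "What is our on-call rotation?"}

# Option 1: request-body field.
body_variant = {
    "headers": {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    "body": json.dumps({"messages": [question], "cube_extended": True}),
}

# Option 2: header equivalent, with a plain OpenAI-shaped body.
header_variant = {
    "headers": {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "X-Cube-Extended": "true",
    },
    "body": json.dumps({"messages": [question]}),
}
```

With the OpenAI Python SDK, the body field can typically be passed through `extra_body={"cube_extended": True}` and the header through `extra_headers` on `chat.completions.create`.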
Response (strict mode)
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1744387200,
"model": "claude-sonnet-4-5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "According to the deployment runbooks, rollbacks should follow this process:\n\n1. Immediately stop the deployment if health checks fail [1]\n2. Run `make rollback VERSION=<previous>` to revert the container image [1]\n3. Notify the on-call team via PagerDuty [2]\n4. Document the incident in the postmortem template [2]\n\n---\n**Sources:**\n[1] Deployment Runbook — https://confluence.company.com/pages/1234\n[2] Incident Response Guide — https://confluence.company.com/pages/5678"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1842,
"completion_tokens": 127,
"total_tokens": 1969
}
}
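In strict mode the citations live inline in the message text. If a client needs them as data, the `Sources` footer can be parsed back out; a rough sketch, assuming the footer keeps the `[n] Title — URL` line format shown above:

```python
import re

def parse_sources(content: str) -> list[dict]:
    """Extract '[n] Title <em-dash> URL' lines from the Sources footer."""
    sources = []
    # \u2014 is the em-dash separating title from URL in each footer line.
    for match in re.finditer(
        r"^\[(\d+)\]\s+(.+?)\s+\u2014\s+(\S+)$", content, flags=re.MULTILINE
    ):
        sources.append(
            {"index": int(match.group(1)),
             "title": match.group(2),
             "url": match.group(3)}
        )
    return sources

content = (
    "Rollbacks follow the runbook [1].\n\n---\n**Sources:**\n"
    "[1] Deployment Runbook \u2014 https://confluence.company.com/pages/1234"
)
```

For anything beyond display, extended mode below is the more robust option, since it returns the same citations as structured JSON.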
Response (extended mode)
With X-Cube-Extended: true or "cube_extended": true:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"choices": [...],
"usage": {...},
"corecube": {
"citations": [
{
"index": 1,
"title": "Deployment Runbook",
"url": "https://confluence.company.com/pages/1234",
"connection": "Confluence — Engineering",
"connection_id": "conn_a1b2c3",
"source_path": "connector",
"indexed_at": "2026-04-10T14:30:00Z",
"relevance_score": 0.94
},
{
"index": 2,
"title": "Incident Response Guide",
"url": "https://confluence.company.com/pages/5678",
"connection": "Confluence — Engineering",
"connection_id": "conn_a1b2c3",
"source_path": "connector",
"indexed_at": "2026-04-09T09:15:00Z",
"relevance_score": 0.87
}
],
"search_latency_ms": 142,
"llm_latency_ms": 1840,
"chunks_retrieved": 8,
"answerability": "high"
}
}
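Because extended mode returns citations as data, a client can post-process them directly, for example keeping only high-confidence sources. A sketch against the response shape above (the 0.9 threshold is an arbitrary example, not a documented default):

```python
# Trimmed extended-mode response, matching the shape documented above.
response = {
    "corecube": {
        "citations": [
            {"index": 1, "title": "Deployment Runbook",
             "url": "https://confluence.company.com/pages/1234",
             "relevance_score": 0.94},
            {"index": 2, "title": "Incident Response Guide",
             "url": "https://confluence.company.com/pages/5678",
             "relevance_score": 0.87},
        ],
        "answerability": "high",
    }
}

def strong_citations(resp: dict, threshold: float = 0.9) -> list[str]:
    """Return titles of citations at or above the relevance threshold."""
    cites = resp.get("corecube", {}).get("citations", [])
    return [c["title"] for c in cites if c["relevance_score"] >= threshold]
```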
Streaming
{ "stream": true }
Responses are streamed as Server-Sent Events in the standard OpenAI format:
data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"According"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" to"},"index":0}]}
...
data: [DONE]
In extended mode, a corecube.sources event is sent before [DONE]:
data: {"corecube":{"citations":[...],"search_latency_ms":142}}
data: [DONE]
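A client reassembles the message by concatenating the `content` deltas until the `[DONE]` sentinel. A minimal sketch of that loop over already-received `data:` lines (transport handling omitted; the sample events mirror the stream above):

```python
import json

# SSE payloads as they arrive on the wire, one JSON document per data: line.
events = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"According"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" to"},"index":0}]}',
    "data: [DONE]",
]

def assemble(lines: list[str]) -> str:
    """Concatenate content deltas from an OpenAI-format SSE stream."""
    parts = []
    for line in lines:
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        # Role-only deltas carry no content fragment.
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

In extended mode the final `corecube` event carries no `choices`, so a real client should also skip payloads without that key before reaching `[DONE]`.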
Insufficient evidence
When retrieval finds no relevant context (strict answerability mode):
{
"choices": [
{
"message": {
"role": "assistant",
"content": "I don't have enough information in the knowledge base to answer this question. The available sources don't contain relevant information about this topic."
},
"finish_reason": "stop"
}
]
}
The query is not forwarded to the LLM in strict mode — only the knowledge base is searched.
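Rather than string-matching the fallback message, a client using extended mode can branch on the `answerability` field. A sketch; note that the set of values besides `"high"`, and the exact fallback wording, are assumptions rather than a documented contract:

```python
FALLBACK_PREFIX = "I don't have enough information"

def is_grounded(resp: dict) -> bool:
    """True if the answer appears to be backed by retrieved context.

    Prefers the structured answerability signal (extended mode) and only
    falls back to inspecting the message text when it is absent.
    """
    cube = resp.get("corecube")
    if cube is not None:
        return cube.get("answerability") == "high"
    content = resp["choices"][0]["message"]["content"]
    return not content.startswith(FALLBACK_PREFIX)
```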
Examples
cURL
curl https://corecube.your-domain.com/v1/chat/completions \
-H "Authorization: Bearer cc_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Who owns the authentication service?"}],
"stream": false
}'
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
api_key="cc_YOUR_API_KEY",
base_url="https://corecube.your-domain.com/v1"
)
response = client.chat.completions.create(
model="claude-sonnet-4-5",
messages=[{"role": "user", "content": "What is our data retention policy?"}]
)
print(response.choices[0].message.content)
JavaScript (OpenAI SDK)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'cc_YOUR_API_KEY',
baseURL: 'https://corecube.your-domain.com/v1',
});
const response = await client.chat.completions.create({
model: 'claude-sonnet-4-5',
messages: [{ role: 'user', content: 'What are our deployment procedures?' }],
});
console.log(response.choices[0].message.content);
Service key (per-user permissions)
curl https://corecube.your-domain.com/v1/chat/completions \
-H "Authorization: Bearer cc_SERVICE_API_KEY" \
-H "X-Cube-User: sarah@company.com" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Show me the Q4 budget projections"}]}'
Public key (anonymous access)
Public keys require no user identity. Any X-Cube-User header is ignored — the bound scope always applies.
curl https://corecube.your-domain.com/v1/chat/completions \
-H "Authorization: Bearer cc_PUBLIC_API_KEY" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "How do I install CoreCube?"}]}'
Browser example for an embedded chat widget:
const response = await fetch('https://corecube.your-domain.com/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer cc_PUBLIC_API_KEY',
'Content-Type': 'application/json',
},
body: JSON.stringify({
messages: [{ role: 'user', content: userQuestion }],
stream: true,
}),
});
Configure CORS on your reverse proxy to allow requests from the widget's origin. Public keys do not write per-key CORS headers — CoreCube expects the proxy to handle preflight.
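As one possible setup, an nginx location block in front of CoreCube could answer preflight requests at the proxy and attach the allow-origin header; the origin and upstream names below are placeholders, not CoreCube defaults:

```nginx
location /v1/ {
    # Answer the browser's preflight directly at the proxy.
    if ($request_method = OPTIONS) {
        add_header Access-Control-Allow-Origin "https://widget.example.com";
        add_header Access-Control-Allow-Methods "POST, OPTIONS";
        add_header Access-Control-Allow-Headers "Authorization, Content-Type, X-Cube-Extended";
        return 204;
    }

    add_header Access-Control-Allow-Origin "https://widget.example.com";
    proxy_pass http://corecube_backend;
}
```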
Model listing
GET /v1/models
Authorization: Bearer cc_YOUR_API_KEY
Returns all models available from configured LLM providers in OpenAI format. Used by clients like OpenWebUI for automatic model discovery.
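With the OpenAI SDK this is just `client.models.list()`. The raw response follows the OpenAI list shape, so extracting the model IDs is a one-liner; a sketch over a sample payload (model names illustrative):

```python
# OpenAI-format /v1/models payload, as returned by CoreCube.
models_response = {
    "object": "list",
    "data": [
        {"id": "claude-sonnet-4-5", "object": "model", "owned_by": "anthropic"},
        {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    ],
}

model_ids = [m["id"] for m in models_response["data"]]
```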