Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint with automatic knowledge retrieval and source citations. Any OpenAI client connects without modification.

Request

POST /v1/chat/completions
Authorization: Bearer cc_YOUR_API_KEY
Content-Type: application/json
{
  "model": "claude-sonnet-4-5",
  "messages": [
    {
      "role": "user",
      "content": "What do our deployment runbooks say about rollbacks?"
    }
  ],
  "stream": false
}

Supported fields

CoreCube accepts all standard OpenAI chat completions fields. The most relevant:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | default provider model | LLM model to use. Must match a configured provider model. |
| `messages` | array | required | Conversation history in OpenAI format |
| `stream` | boolean | `false` | Enable Server-Sent Events streaming |
| `temperature` | number | provider default | Sampling temperature (forwarded to the LLM provider) |
| `max_tokens` | number | provider default | Maximum output tokens |

CoreCube-specific fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `cube_extended` | boolean | `false` | Enable extended mode — adds structured `corecube` metadata to the response |

Or use the header equivalent: X-Cube-Extended: true.

Response (strict mode)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1744387200,
  "model": "claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "According to the deployment runbooks, rollbacks should follow this process:\n\n1. Immediately stop the deployment if health checks fail [1]\n2. Run `make rollback VERSION=<previous>` to revert the container image [1]\n3. Notify the on-call team via PagerDuty [2]\n4. Document the incident in the postmortem template [2]\n\n---\n**Sources:**\n[1] Deployment Runbook — https://confluence.company.com/pages/1234\n[2] Incident Response Guide — https://confluence.company.com/pages/5678"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1842,
    "completion_tokens": 127,
    "total_tokens": 1969
  }
}
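In strict mode the citations live inside the message text itself. A client that wants the answer and the source list separately can split on the footer delimiter. A minimal sketch — the `---` / `**Sources:**` delimiter is taken from the example above and is an assumption about the format, not a documented contract:

```python
def split_answer_and_sources(content: str):
    """Split a strict-mode message into (answer, source_lines).

    Assumes the footer layout shown in the example response above.
    Falls back to the whole text with no sources if no footer exists.
    """
    marker = "\n---\n**Sources:**\n"
    if marker not in content:
        return content, []
    answer, _, footer = content.partition(marker)
    sources = [line for line in footer.splitlines() if line.strip()]
    return answer.rstrip(), sources
```

If your client needs this routinely, extended mode (below) returns the same citations as structured JSON instead.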

Response (extended mode)

With X-Cube-Extended: true or "cube_extended": true:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [...],
  "usage": {...},
  "corecube": {
    "citations": [
      {
        "index": 1,
        "title": "Deployment Runbook",
        "url": "https://confluence.company.com/pages/1234",
        "connection": "Confluence — Engineering",
        "connection_id": "conn_a1b2c3",
        "source_path": "connector",
        "indexed_at": "2026-04-10T14:30:00Z",
        "relevance_score": 0.94
      },
      {
        "index": 2,
        "title": "Incident Response Guide",
        "url": "https://confluence.company.com/pages/5678",
        "connection": "Confluence — Engineering",
        "connection_id": "conn_a1b2c3",
        "source_path": "connector",
        "indexed_at": "2026-04-09T09:15:00Z",
        "relevance_score": 0.87
      }
    ],
    "search_latency_ms": 142,
    "llm_latency_ms": 1840,
    "chunks_retrieved": 8,
    "answerability": "high"
  }
}
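Extended mode makes the citation metadata machine-readable. As an illustration, a client could rank and filter citations by `relevance_score` — a sketch assuming the response shape above; `min_score` is a hypothetical client-side threshold, not an API parameter:

```python
def top_citations(response: dict, min_score: float = 0.0) -> list[dict]:
    """Return citations from an extended-mode response, filtered by
    relevance_score and sorted highest first. Returns [] when the
    response carries no corecube block (strict mode)."""
    citations = response.get("corecube", {}).get("citations", [])
    kept = [c for c in citations if c.get("relevance_score", 0.0) >= min_score]
    return sorted(kept, key=lambda c: c["relevance_score"], reverse=True)
```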

Streaming

{ "stream": true }

Responses are streamed as Server-Sent Events in the standard OpenAI format:

data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"According"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" to"},"index":0}]}
...
data: [DONE]

In extended mode, a corecube.sources event is sent before [DONE]:

data: {"corecube":{"citations":[...],"search_latency_ms":142}}
data: [DONE]
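When consuming the stream without an SDK, a client reassembles the answer from the `delta` fragments and, in extended mode, picks up the trailing `corecube` event. A minimal parser over already-split `data:` lines might look like this (a sketch — a production client would also handle malformed chunks and reconnects):

```python
import json

def consume_stream(lines):
    """Accumulate delta content from OpenAI-style SSE 'data:' lines.

    Returns (text, corecube) where corecube is the extended-mode
    metadata dict, or None if the stream did not include one.
    """
    text_parts, corecube = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if "corecube" in event:
            corecube = event["corecube"]
            continue
        for choice in event.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content", ""))
    return "".join(text_parts), corecube
```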

Insufficient evidence

When retrieval finds no relevant context (strict answerability mode):

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "I don't have enough information in the knowledge base to answer this question. The available sources don't contain relevant information about this topic."
      },
      "finish_reason": "stop"
    }
  ]
}

The query is not forwarded to the LLM in strict mode — only the knowledge base is searched.

Examples

cURL

curl https://corecube.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer cc_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Who owns the authentication service?"}],
    "stream": false
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="cc_YOUR_API_KEY",
    base_url="https://corecube.your-domain.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is our data retention policy?"}]
)

print(response.choices[0].message.content)

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'cc_YOUR_API_KEY',
  baseURL: 'https://corecube.your-domain.com/v1',
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [{ role: 'user', content: 'What are our deployment procedures?' }],
});

console.log(response.choices[0].message.content);

Service key (per-user permissions)

curl https://corecube.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer cc_SERVICE_API_KEY" \
  -H "X-Cube-User: sarah@company.com" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Show me the Q4 budget projections"}]}'

Public key (anonymous access)

Public keys require no user identity. Any X-Cube-User header is ignored — the bound scope always applies.

curl https://corecube.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer cc_PUBLIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "How do I install CoreCube?"}]}'

Browser example for an embedded chat widget:

const response = await fetch('https://corecube.your-domain.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer cc_PUBLIC_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: userQuestion }],
    stream: true,
  }),
});

Configure CORS on your reverse proxy to allow requests from the widget's origin. Public keys do not write per-key CORS headers — CoreCube expects the proxy to handle preflight.
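As an illustration of that proxy-side setup, an nginx fragment along these lines could answer the preflight and stamp the CORS headers — a sketch only: the widget origin (`https://widget.example.com`) and the upstream name (`corecube_backend`) are placeholder assumptions you must replace:

```nginx
# Hypothetical nginx config: allow an embedded widget at one origin
# to call the CoreCube endpoint. Adjust origin and upstream to yours.
location /v1/ {
    if ($request_method = OPTIONS) {
        add_header Access-Control-Allow-Origin "https://widget.example.com";
        add_header Access-Control-Allow-Methods "POST, GET, OPTIONS";
        add_header Access-Control-Allow-Headers "Authorization, Content-Type, X-Cube-Extended";
        return 204;
    }
    add_header Access-Control-Allow-Origin "https://widget.example.com" always;
    proxy_pass http://corecube_backend;
}
```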

Model listing

GET /v1/models
Authorization: Bearer cc_YOUR_API_KEY

Returns all models available from configured LLM providers in OpenAI format. Used by clients like OpenWebUI for automatic model discovery.
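Because the listing follows OpenAI's list format (`{"object": "list", "data": [...]}`, each entry carrying an `id`), pulling the available model ids out of a fetched response is a one-liner — a sketch over that shape:

```python
def model_ids(models_response: dict) -> list[str]:
    """Extract model ids from an OpenAI-format /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]
```

With the OpenAI SDK pointed at your CoreCube base URL, `client.models.list()` returns the same data already parsed.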
