LLM Providers
CoreCube routes queries to LLM providers through a configurable routing layer. Multiple providers can be configured with routing rules, fallback chains, and per-user overrides.
Supported providers
| Provider | Type | Endpoint |
|---|---|---|
| Anthropic Claude | Managed | Anthropic API |
| OpenAI | Managed | OpenAI API |
| Custom / Self-hosted | OpenAI-compatible | Any /v1/chat/completions endpoint (Ollama, vLLM, LM Studio, etc.) |
Adding a provider
- Navigate to Admin Console → LLM Providers
- Click New Provider
- Select the provider type
- Enter the API key and select a model
- Click Test to verify connectivity
- Save
For custom providers (Ollama, vLLM, etc.), enter the base URL of your endpoint:
| Provider | Base URL example |
|---|---|
| Ollama (local) | http://localhost:11434/v1 |
| Ollama (Docker) | http://ollama:11434/v1 |
| vLLM | http://vllm:8000/v1 |
| LM Studio | http://localhost:1234/v1 |
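Under the hood, a query to any of these base URLs is an ordinary POST to the endpoint's /chat/completions route. A minimal sketch using only the standard library (the function and model names here are illustrative, not part of CoreCube):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request for any OpenAI-compatible base URL."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request is then a one-liner, e.g. against a local Ollama:
# with urllib.request.urlopen(build_chat_request("http://localhost:11434/v1", "llama3", "hi")) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```

Self-hosted endpoints usually accept any API key (or none), which is why only the base URL matters here.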
Routing
Default provider
One provider is designated as the default. All queries are routed there unless a routing rule or user override applies.
Routing rules
Rules are evaluated in order. The first matching rule wins:
| Condition | Action |
|---|---|
| User is in scope X | Route to provider Y |
| Query comes from API key type service | Route to provider Z |
| Compartment is executive | Route to local provider (privacy routing) |
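First-match evaluation can be sketched in a few lines. The rule predicates and provider names below are illustrative, not CoreCube's actual rule schema:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # predicate over the query context
    provider: str

def route(query: dict, rules: list, default: str) -> str:
    """Evaluate rules in order; the first match wins, else use the default."""
    for rule in rules:
        if rule.condition(query):
            return rule.provider
    return default

# Rules mirroring the table above (hypothetical query-context keys):
rules = [
    Rule(lambda q: "scope-x" in q.get("scopes", []), "provider-y"),
    Rule(lambda q: q.get("api_key_type") == "service", "provider-z"),
    Rule(lambda q: "executive" in q.get("compartments", []), "local-ollama"),
]
```

Because evaluation stops at the first match, rule order matters: put narrower conditions before broader ones.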
Fallback chain
When the primary provider fails (error, timeout, rate limit), CoreCube automatically tries the next provider in the fallback chain. Configure the order in LLM Providers → Routing Rules.
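The fallback behavior amounts to iterating the chain until one provider succeeds. A minimal sketch (the function name and error handling are assumptions for illustration):

```python
def query_with_fallback(prompt: str, chain: list, call) -> str:
    """Try each provider in the fallback chain until one succeeds.

    `call(provider, prompt)` is any function that raises on error,
    timeout, or rate limit.
    """
    last_err = None
    for provider in chain:
        try:
            return call(provider, prompt)
        except Exception as err:  # error, timeout, or rate limit
            last_err = err
    raise RuntimeError("all providers in the fallback chain failed") from last_err
```

A production router would typically also distinguish retryable errors (429, timeouts) from permanent ones (invalid request) rather than falling back on every exception.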
Privacy routing
Route queries involving sensitive compartments to a local-only provider (Ollama, vLLM) to prevent organizational knowledge from reaching cloud APIs:
Compartment: executive, hr
Sensitivity: confidential, restricted
→ Route to: local-ollama (no external API call)
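The rule above can be read as a simple predicate: if the query touches a sensitive compartment or sensitivity level, route it locally. A sketch, with the sets and provider name taken from the example (the query-context keys are assumptions):

```python
SENSITIVE_COMPARTMENTS = {"executive", "hr"}
SENSITIVE_LEVELS = {"confidential", "restricted"}
LOCAL_PROVIDER = "local-ollama"  # a local-only endpoint; no external API call

def privacy_route(query: dict, default: str) -> str:
    """Send sensitive queries to the local provider; everything else to the default."""
    if (SENSITIVE_COMPARTMENTS & set(query.get("compartments", []))
            or query.get("sensitivity") in SENSITIVE_LEVELS):
        return LOCAL_PROVIDER
    return default
```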
Per-user overrides
Users can override the default routing in their profile settings — for example, to use a specific model for their personal API key. Admins can restrict or disable per-user overrides.
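One plausible resolution order, sketched below, is: per-user override (when the admin permits it), then routing rules, then the default provider. The exact precedence is an assumption here, not documented CoreCube behavior:

```python
def resolve_provider(user_override, rule_match, default, overrides_allowed=True):
    """Precedence sketch: user override (if permitted) > routing rule > default."""
    if overrides_allowed and user_override:
        return user_override
    return rule_match or default
```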
Provider health monitoring
The Admin Console shows real-time provider health:
| Metric | Description |
|---|---|
| Status | Connected / degraded / offline |
| Queries | Total queries routed to this provider |
| Tokens | Input and output tokens consumed |
| Latency | Average response latency (p50, p95) |
| Cost | Estimated cost based on token counts |
| Error rate | Percentage of failed requests |
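The latency and error-rate metrics are straightforward to compute from raw request records. A sketch of how the p50/p95 and error-rate figures in the table might be derived (the function names are illustrative):

```python
import statistics

def latency_percentiles(samples_ms: list) -> dict:
    """p50/p95 over recorded latencies, as shown in the health table."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}

def error_rate(failed: int, total: int) -> float:
    """Failed requests as a percentage of all requests."""
    return 100.0 * failed / total if total else 0.0
```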
Usage stats
Navigate to Admin Console → LLM Providers to see per-provider usage broken down by time period (today, week, month).
Streaming
CoreCube forwards streaming responses from LLM providers to clients via Server-Sent Events (SSE) in the standard OpenAI format. Streaming is enabled by default and works with any OpenAI-compatible client.
To disable streaming for a specific request, set "stream": false in the request body.
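In the OpenAI streaming format, each SSE event carries a data: line with a JSON chunk whose choices[0].delta holds the next content fragment, and the stream ends with data: [DONE]. A minimal client-side parser (a sketch; the function name is illustrative):

```python
import json

def stream_deltas(sse_lines):
    """Yield content fragments from an OpenAI-format SSE stream."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]
```

Concatenating the yielded fragments reproduces the full completion text.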