
LLM Providers

CoreCube routes queries to LLM providers through a configurable routing layer: multiple providers can be registered, with routing rules, fallback chains, and per-user overrides controlling which one handles a given query.

Supported providers

| Provider | Type | Endpoint |
|---|---|---|
| Anthropic Claude | Managed | Anthropic API |
| OpenAI | Managed | OpenAI API |
| Custom / Self-hosted | OpenAI-compatible | Any /v1/chat/completions endpoint (Ollama, vLLM, LM Studio, etc.) |

Adding a provider

  1. Navigate to Admin Console → LLM Providers
  2. Click New Provider
  3. Select the provider type
  4. Enter the API key and select a model
  5. Click Test to verify connectivity
  6. Save

For custom providers (Ollama, vLLM, etc.), enter the base URL of your endpoint:

| Provider | Base URL example |
|---|---|
| Ollama (local) | http://localhost:11434/v1 |
| Ollama (Docker) | http://ollama:11434/v1 |
| vLLM | http://vllm:8000/v1 |
| LM Studio | http://localhost:1234/v1 |

Routing

Default provider

One provider is designated as the default. All queries are routed there unless a routing rule or user override applies.

Routing rules

Rules are evaluated in order. The first matching rule wins:

| Condition | Action |
|---|---|
| User is in scope X | Route to provider Y |
| Query comes from API key type service | Route to provider Z |
| Compartment is executive | Route to local provider (privacy routing) |
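The first-match semantics can be sketched as follows. This is an illustrative model only, not CoreCube's internal API — the Query and Rule shapes, field names, and provider names are assumptions made up for the example:

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical shapes for illustration; these are not CoreCube's actual types.
@dataclass
class Query:
    user_scopes: Set[str]
    api_key_type: str
    compartment: str

@dataclass
class Rule:
    condition: Callable[[Query], bool]
    provider: str

def route(query: Query, rules: List[Rule], default: str) -> str:
    """Evaluate rules in order; the first matching rule wins."""
    for rule in rules:
        if rule.condition(query):
            return rule.provider
    return default  # no rule matched: fall back to the default provider

rules = [
    Rule(lambda q: "X" in q.user_scopes, "provider-Y"),
    Rule(lambda q: q.api_key_type == "service", "provider-Z"),
    Rule(lambda q: q.compartment == "executive", "local-ollama"),
]

# The service-key rule matches first, so the executive rule never runs.
q = Query(user_scopes=set(), api_key_type="service", compartment="executive")
print(route(q, rules, default="anthropic"))  # provider-Z
```

Because evaluation stops at the first match, rule order matters: place the most specific rules first.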

Fallback chain

When the primary provider fails (error, timeout, rate limit), CoreCube automatically tries the next provider in the fallback chain. Configure the order in LLM Providers → Routing Rules.
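The fallback behavior amounts to trying each provider in the configured order until one succeeds. A minimal sketch, assuming a generic call function (the function names and simulated backends below are illustrative, not CoreCube's API):

```python
def complete_with_fallback(prompt, chain, call):
    """Try each provider in order; on error, fall through to the next."""
    errors = {}
    for provider in chain:
        try:
            return call(provider, prompt)
        except Exception as exc:  # error, timeout, or rate limit
            errors[provider] = exc
    raise RuntimeError(f"all providers in the chain failed: {errors}")

# Simulated backends: the primary is rate-limited, the fallback answers.
def fake_call(provider, prompt):
    if provider == "openai":
        raise TimeoutError("rate limited")
    return f"{provider}: ok"

print(complete_with_fallback("hi", ["openai", "anthropic"], fake_call))
# anthropic: ok
```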

Privacy routing

Route queries involving sensitive compartments to a local-only provider (Ollama, vLLM) to prevent organizational knowledge from reaching cloud APIs:

Compartment: executive, hr
Sensitivity: confidential, restricted
→ Route to: local-ollama (no external API call)
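The rule above is equivalent to a simple predicate: route locally when either the compartment or the sensitivity label is in a protected set. A sketch using the names from the example (the function and provider names are illustrative, not CoreCube's configuration syntax):

```python
# Compartment and sensitivity names come from the example above;
# the predicate shape itself is an assumption for illustration.
PRIVATE_COMPARTMENTS = {"executive", "hr"}
PRIVATE_SENSITIVITIES = {"confidential", "restricted"}

def pick_provider(compartment: str, sensitivity: str) -> str:
    """Keep sensitive organizational knowledge off cloud APIs."""
    if compartment in PRIVATE_COMPARTMENTS or sensitivity in PRIVATE_SENSITIVITIES:
        return "local-ollama"  # no external API call
    return "default-cloud"

print(pick_provider("hr", "internal"))     # local-ollama
print(pick_provider("sales", "internal"))  # default-cloud
```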

Per-user overrides

Users can override the default routing in their profile settings — for example, to use a specific model for their personal API key. Admins can restrict or disable per-user overrides.

Provider health monitoring

The Admin Console shows real-time provider health:

| Metric | Description |
|---|---|
| Status | Connected / degraded / offline |
| Queries | Total queries routed to this provider |
| Tokens | Input and output tokens consumed |
| Latency | Average response latency (p50, p95) |
| Cost | Estimated cost based on token counts |
| Error rate | Percentage of failed requests |
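For reference, p50 and p95 are percentiles over the observed latency samples. The docs don't specify how CoreCube computes them; nearest-rank is one common definition, sketched here with made-up sample data:

```python
def percentile(samples, p):
    """Nearest-rank percentile: one common way to compute p50/p95."""
    ordered = sorted(samples)
    rank = round(p / 100 * (len(ordered) - 1))
    return ordered[rank]

# Hypothetical latency samples in milliseconds, with one slow outlier.
latencies_ms = [120, 95, 110, 480, 105, 130, 100, 115, 90, 125]
print(percentile(latencies_ms, 50))  # p50: typical request
print(percentile(latencies_ms, 95))  # p95: dominated by the outlier
```

The gap between p50 and p95 is why dashboards show both: the median hides tail latency that a small fraction of users actually experience.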

Usage stats

Navigate to Admin Console → LLM Providers to see per-provider usage broken down by time period (today, week, month).

Streaming

CoreCube forwards streaming responses from LLM providers to clients via Server-Sent Events (SSE) in the standard OpenAI format. Streaming is enabled by default and works with any OpenAI-compatible client.

To disable streaming for a specific request, set "stream": false in the request body.
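A client consuming the stream sees standard OpenAI-format SSE events: each data: line carries a JSON chunk with a content delta, terminated by data: [DONE]. A minimal sketch of parsing such a stream (the canned lines below are a hypothetical example of the format, not captured CoreCube output):

```python
import json

def read_sse_stream(lines):
    """Accumulate content deltas from an OpenAI-format SSE stream."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # first chunk may have no content
    return "".join(text)

# A canned stream in the shape described above.
stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(read_sse_stream(stream))  # Hello
```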
