Retrieval Pipeline

CoreCube's retrieval pipeline transforms a user query into a grounded, cited answer by composing the active preset pipeline, running scoped hybrid search, applying query/chunk tools, and assembling context before passing anything to an LLM.

Pipeline overview

Preset resolution (step 0)

Every query runs through the active preset binding:

Resolve the caller's organization or compartment preset.
Load the binding's resolved pipeline snapshot.
Read the preset answer model and tool-loop bounds.
Use the preset's default retrieval tool as the source of query-time retrieval settings.

The snapshot contains query tools, default retrieval, optional LLM-callable retrieval/action capabilities, chunk tools, and prompt fragments grouped by stage. This is why two presets can use the same corpus but behave differently: the retrieval profile, tool chain, answer model, and prompt composition are all part of the preset.
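As a rough illustration of what a resolved binding might carry, the sketch below models the snapshot as plain data. All class and field names (`PipelineSnapshot`, `PresetBinding`, `max_tool_iterations`, `resolve_preset`) are hypothetical, and the rule that a compartment binding takes precedence over the organization binding is an assumption, not documented CoreCube behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PipelineSnapshot:
    """Hypothetical shape of a binding's resolved pipeline snapshot."""
    query_tools: tuple[str, ...] = ()
    default_retrieval: str = "default_retrieval"
    llm_capabilities: tuple[str, ...] = ()  # optional LLM-callable retrieval/action tools
    chunk_tools: tuple[str, ...] = ()
    # Prompt fragments grouped by stage, e.g. {"answer": ("cite sources",)}
    prompt_fragments: dict[str, tuple[str, ...]] = field(default_factory=dict)


@dataclass(frozen=True)
class PresetBinding:
    answer_model: str | None   # None means "use the default LLM provider"
    max_tool_iterations: int   # stand-in for the tool-loop bounds
    snapshot: PipelineSnapshot


def resolve_preset(
    bindings: dict[str, PresetBinding],
    compartment: str | None,
    organization: str,
) -> PresetBinding:
    """Assumed precedence: compartment binding first, then the organization's."""
    if compartment is not None and compartment in bindings:
        return bindings[compartment]
    return bindings[organization]
```

Two bindings that point at the same corpus can still differ in every field above, which is the behavior difference described in this section.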

Query processing (step 1)

When a query arrives at /v1/chat/completions:

Admission control — API-key rate limits, chat concurrency limits, and provider catalog limits protect the server from overload.
Query extraction — The last user message is extracted for retrieval; full conversation history still goes to the answer LLM.
Scope resolution — API key → effective user → allowed connections/scopes (compartments + sensitivity levels).
Query tools — Active query tools can rewrite the query into semantic and keyword variants.
Query embedding — The query text is embedded using the same model family as document chunks.
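The query extraction step can be sketched as follows. This is a minimal illustration that assumes OpenAI-style message dictionaries with `role` and `content` keys and handles only plain string content; the function name `extract_retrieval_query` is hypothetical.

```python
from typing import Optional


def extract_retrieval_query(messages: list[dict]) -> Optional[str]:
    """Return the text of the last user message, or None if there is none.

    Only the returned string is used for retrieval; the caller still passes
    the full `messages` list to the answer LLM.
    """
    for message in reversed(messages):
        if message.get("role") == "user":
            content = message.get("content")
            if isinstance(content, str) and content.strip():
                return content.strip()
            return None  # last user message has no usable text
    return None
```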

Hybrid search (step 2)

The default retrieval tool controls the dense pool, sparse pool, fusion weights, score floor, freshness floor, HNSW ef_search, reranker model, rerank pool, and final top-K. Two search legs run in parallel and are fused:

Full-text search uses PostgreSQL's websearch_to_tsquery with weighted tsvectors:

Weight A — document title (highest signal)
Weight B — heading path (e.g., "Deployment > Staging > Prerequisites")
Weight C — chunk body content

Vector search uses pgvector's HNSW index for approximate nearest neighbor with cosine distance.
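The two search legs might look roughly like the SQL below, held here as Python string constants. The `chunks` table, its column names, the `'english'` text search configuration, and the psycopg-style named placeholders are all assumptions made for illustration; only `setweight`, `to_tsvector`, `websearch_to_tsquery`, `ts_rank`, and pgvector's `<=>` cosine distance operator are taken from PostgreSQL and pgvector themselves.

```python
# Hypothetical schema:
#   chunks(id, title, heading_path, body, search_vector tsvector, embedding vector)

# Build the weighted tsvector: A = title, B = heading path, C = body.
BUILD_SEARCH_VECTOR_SQL = """
UPDATE chunks
SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(heading_path, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'C')
"""

# Sparse leg: full-text match ranked by ts_rank, limited to the sparse pool.
FTS_LEG_SQL = """
SELECT id,
       ts_rank(search_vector, websearch_to_tsquery('english', %(query)s)) AS fts_score
FROM chunks
WHERE search_vector @@ websearch_to_tsquery('english', %(query)s)
ORDER BY fts_score DESC
LIMIT %(sparse_pool)s
"""

# Dense leg: approximate nearest neighbors by cosine distance (smaller is closer),
# limited to the dense pool. An HNSW index on `embedding` serves the ORDER BY.
VECTOR_LEG_SQL = """
SELECT id,
       embedding <=> %(query_embedding)s::vector AS cosine_distance
FROM chunks
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(dense_pool)s
"""
```

Each leg returns its own ranked list of chunk IDs, which is the input the fusion step expects.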

Reciprocal Rank Fusion (RRF) combines both ranked lists. A candidate that appears in both lists accumulates score from each, so it tends to rank above a candidate that appears in only one. The default presets use rrf_k = 60, with vector and FTS stream weights stored on the retrieval tool.
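A minimal sketch of weighted RRF, assuming the common formulation in which each stream contributes `weight / (rrf_k + rank)` with 1-based ranks. The function name, the stream names, and the ID-based tie ordering are illustrative choices, not CoreCube's implementation.

```python
from collections import defaultdict


def rrf_fuse(
    ranked_lists: dict[str, list[str]],
    weights: dict[str, float],
    rrf_k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs into one list of (chunk_id, score)."""
    scores: dict[str, float] = defaultdict(float)
    for stream, chunk_ids in ranked_lists.items():
        weight = weights.get(stream, 1.0)
        for rank, chunk_id in enumerate(chunk_ids, start=1):
            scores[chunk_id] += weight / (rrf_k + rank)
    # Highest fused score first; chunk ID keeps the order deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = rrf_fuse(
    {"vector": ["a", "b", "c"], "fts": ["b", "d"]},
    weights={"vector": 1.0, "fts": 1.0},
)
print([chunk_id for chunk_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Here `b` wins because it scores in both streams (1/62 + 1/61), while `a` leads only the vector stream (1/61).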

Ranking and filtering (step 3)

The fused candidate list passes through ranking and filtering:

Stage	Description
Freshness decay	Exponential decay based on time since last sync. Configurable half-life. Recently synced content ranks higher.
Quality filter	Excludes chunks whose quality score falls below a threshold, such as navigation menus, footers, and repeated disclaimers.
ACL filter	Enforce scope — remove any chunks from connections outside the query's allowed compartments/sensitivity
Cross-encoder reranker	Score each (query, chunk) pair jointly. Candidate pool, final top-K, enabled flag, and model come from the retrieval tool. Graceful fallback to fusion ranking when reranking is disabled or unavailable.
Chunk tools	Optional preset tools can re-score, filter, or expand the candidate list before answer generation.
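The freshness decay stage can be illustrated with a standard half-life formula: the factor halves every `half_life_days` since the last sync. This is a sketch under that assumption; the function name and the choice of days as the unit are hypothetical, and how the factor is combined with the fused score is left to the retrieval tool.

```python
from datetime import datetime, timedelta, timezone


def freshness_factor(last_synced: datetime, now: datetime, half_life_days: float) -> float:
    """Exponential decay in (0, 1]: 1.0 when just synced, 0.5 after one half-life."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    # Clamp negative ages (clock skew) to zero so the factor never exceeds 1.0.
    age_days = max((now - last_synced).total_seconds(), 0.0) / 86400.0
    return 0.5 ** (age_days / half_life_days)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(freshness_factor(now - timedelta(days=30), now, half_life_days=30))  # 0.5
```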

Hybrid ranking weights

Signal	Default source
Vector similarity	Retrieval tool dense stream
Full-text relevance	Retrieval tool sparse stream
Freshness boost	Retrieval tool + `freshness_decay` chunk tool

When scores are equal (within 0.01 tolerance), source trust level breaks the tie: authoritative > reference > volatile.
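One way to read the tie-break rule is sketched below: candidates are sorted by score, consecutive candidates within 0.01 of the top score in their group are treated as tied, and each tied group is reordered by trust level. The grouping strategy is an assumption (a pairwise tolerance is not transitive, so some grouping choice is needed), and the function and tuple layout are hypothetical.

```python
TRUST_RANK = {"authoritative": 0, "reference": 1, "volatile": 2}


def apply_trust_tiebreak(
    candidates: list[tuple[str, float, str]],
    tolerance: float = 0.01,
) -> list[tuple[str, float, str]]:
    """Order (chunk_id, score, trust_level) tuples by score, breaking near-ties by trust."""
    by_score = sorted(candidates, key=lambda c: -c[1])
    result: list[tuple[str, float, str]] = []
    group: list[tuple[str, float, str]] = []
    for candidate in by_score:
        # Start a new group once the score drops more than `tolerance`
        # below the best score in the current group.
        if group and group[0][1] - candidate[1] > tolerance:
            result.extend(sorted(group, key=lambda c: TRUST_RANK[c[2]]))
            group = []
        group.append(candidate)
    # sorted() is stable, so score order is kept within the same trust level.
    result.extend(sorted(group, key=lambda c: TRUST_RANK[c[2]]))
    return result
```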

Context assembly (step 4)

Context is formatted as:

### [1] Deployment Runbook — Confluence — Engineering
...chunk content...

### [2] Incident Response Guide — Confluence — Engineering
...chunk content...

The numbered headers map directly to citation references in the response.
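A small sketch of how such a context block could be produced. The dictionary keys (`title`, `source`, `group`, `content`) are hypothetical; in particular, what the second and third header fields represent is inferred from the example above rather than documented.

```python
def format_context(chunks: list[dict]) -> str:
    """Render chunks as numbered sections; numbering starts at 1 to match [N] citations."""
    blocks = []
    for number, chunk in enumerate(chunks, start=1):
        header = f"### [{number}] {chunk['title']} — {chunk['source']} — {chunk['group']}"
        blocks.append(f"{header}\n{chunk['content']}")
    return "\n\n".join(blocks)
```

Because the header number is simply the chunk's 1-based position in the final list, a citation `[2]` in the answer always refers to the second chunk passed to the formatter.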

LLM generation (step 5)

Prompt assembly — Answer-stage fragments + active tool prompts + invariant footer + formatted context + conversation history.
Answer model selection — The active preset's answer model is used when set; otherwise the default LLM provider is selected.
Tool-call loop — Non-streaming requests can let the answer LLM call retrieval or action capabilities attached to the preset.
Streaming — SSE chunks in OpenAI format are forwarded to the client. Extended streaming adds a cube.metadata frame before [DONE].
Citation mapping — [N] references in the response text map to the numbered context chunks.
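Citation mapping can be sketched as a regular-expression pass over the answer text. This is an illustrative version only: the function name and the returned dictionary shape are hypothetical, and references that point outside the context list are silently ignored here, which may differ from the real behavior.

```python
import re

CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def map_citations(answer: str, context_chunks: list[dict]) -> list[dict]:
    """Return the cited chunks in order of first appearance, each tagged with its [N] number."""
    cited = []
    seen = set()
    for match in CITATION_PATTERN.finditer(answer):
        number = int(match.group(1))
        # Skip repeats and numbers that do not correspond to a context chunk.
        if number in seen or not 1 <= number <= len(context_chunks):
            continue
        seen.add(number)
        cited.append({"ref": number, **context_chunks[number - 1]})
    return cited
```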

Ingestion pipeline

This section describes how documents get from source systems into the evidence layer.

Ingestion is idempotent. A changed document has its chunks replaced through an atomic chunk replacement path, re-ingesting an unchanged external document can still repair missing chunk state, and cross-source content deduplication never reuses another connection's storage key.

Chunking strategies

Content type	Strategy	Behavior
Markdown / docs	Heading-aware	Split at H1/H2/H3 boundaries, preserve section hierarchy
Code files	Code-aware	Split at function/class boundaries, keep imports with first chunk
HTML pages	Heading-aware	Extract to markdown first, then heading-based chunking
PDF	Paragraph-based	Paragraph boundaries (PDFs lack reliable heading structure)
Plain text	Fixed-size	Fixed-size with configurable overlap
JSON / structured	Record-based	Each top-level record or array item becomes a chunk
Tables	Intact	Tables kept whole when possible, headers repeated when split

Default chunk size: 512-token target, 1024-token maximum. Overlap: 50 tokens.
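To make the size and overlap defaults concrete, here is a minimal fixed-size chunker over a pre-tokenized list. It is a sketch of the plain-text strategy only: the function name is hypothetical, the tokenizer is left to the caller, and the 1024-token maximum is not exercised because fixed-size chunks never exceed the target.

```python
def chunk_tokens(tokens: list, target_size: int = 512, overlap: int = 50) -> list[list]:
    """Split tokens into windows of `target_size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < target_size:
        raise ValueError("overlap must be non-negative and smaller than target_size")
    step = target_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target_size])
        # Stop once this window has reached the end of the input.
        if start + target_size >= len(tokens):
            break
    return chunks


chunks = chunk_tokens(list(range(1000)))
print([len(c) for c in chunks])  # [512, 512, 76]
```

With the defaults, each window starts 462 tokens after the previous one, so the last 50 tokens of one chunk are repeated at the start of the next.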

Query Explorer

Use the Query Explorer in the Admin Console to inspect the full pipeline for any query:

Score breakdown per chunk (vector, FTS, freshness, combined)
Active preset, retrieval tool, query tools, and chunk tools
Side-by-side search mode comparison (vector-only vs FTS-only vs hybrid)
Which connections were included or excluded by scope
Context assembly trace (which chunks selected, filtered, deduplicated, budget-truncated)
Answerability assessment with confidence level
