Retrieval
Retrieval is the first ranking pass of the pipeline. Given a query, it produces the candidate list that the reranker and the LLM will work from. Get this stage wrong and no amount of clever reranking will recover the loss: you cannot rank what was never retrieved.
CoreCube uses hybrid retrieval: two parallel searches (dense vector + sparse keyword) that each produce a ranked list, fused into a single ordering by Reciprocal Rank Fusion (RRF). The two streams catch different things:
- Dense (vector) finds paraphrases, related concepts, fuzzy matches. Strong on natural-language questions.
- Sparse (FTS) finds exact tokens, identifiers, codes, part numbers, file paths. Strong on lookups.
Most production queries are a mix of both, which is why the default is hybrid.
Every field on this page applies to the next query — no rebuild, no reindex, no waiting. Switch presets, tweak a weight, and the next search reflects it.
Dense top-K
The number of candidates the dense (vector) search returns before fusion.
The dense search returns the top-K chunks most similar to the query by cosine distance. Those candidates are then merged with the sparse list via RRF.
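As a rough illustration of what "top-K by cosine distance" means, here is a brute-force sketch. The names `cosine_distance` and `dense_top_k` and the toy two-dimensional vectors are made up for this example; a real deployment would use the vector index rather than scanning every chunk.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def dense_top_k(query_vec, chunk_vecs, k):
    """Return the k chunk ids closest to the query, nearest first."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_distance(query_vec, chunk_vecs[cid]),
    )
    return ranked[:k]

# Hypothetical toy embeddings keyed by chunk id.
chunks = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
top = dense_top_k([1.0, 0.1], chunks, k=2)
```

With `k=2` the third chunk never reaches fusion, however the sparse stream ranks other chunks. That is the cutoff effect the table below describes.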
| Value | Effect |
|---|---|
| 10–20 | Tight pool: risks missing semantically relevant chunks that ranked just below the cutoff. Use only when latency is critical. |
| 20–50 (typical) | Healthy pool: captures most semantically relevant chunks for the reranker to choose from. |
| 50–100 | Wide pool: useful when the dense search is missing the right chunks at smaller K. Diminishing returns above 100. |
| > 100 | Reranker territory: the rerank pool setting caps how many candidates are fed to the reranker anyway, so going higher is wasted. |
Raise this when you suspect the right chunks are being cut before they reach the reranker, for example when a search for a known-good chunk returns it ranked below the current cutoff (below 20 when dense top-K is 20).
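One way to run that check is to do a deliberately wide dense search once and see where the known-good chunk lands. The helper names and the chunk ids below are hypothetical; the ranked list stands in for whatever your search call returns.

```python
def rank_of(chunk_id, ranked_ids):
    """1-based rank of chunk_id in ranked_ids, or None if it is absent."""
    try:
        return ranked_ids.index(chunk_id) + 1
    except ValueError:
        return None

def survives_cutoff(chunk_id, ranked_ids, top_k):
    """True if chunk_id would still be in the pool at the given top-K."""
    rank = rank_of(chunk_id, ranked_ids)
    return rank is not None and rank <= top_k

# Stand-in for a wide dense search (K = 100) over hypothetical chunk ids.
wide_results = [f"c{i}" for i in range(1, 101)]
known_good_rank = rank_of("c37", wide_results)
kept_at_20 = survives_cutoff("c37", wide_results, top_k=20)
kept_at_40 = survives_cutoff("c37", wide_results, top_k=40)
```

Here the chunk sits at rank 37, so it is dropped at top-K = 20 and kept at top-K = 40. If known-good chunks regularly land just past the cutoff, raising K is the cheaper fix; if they land far past it, the embedding or chunking is the more likely culprit.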
Sparse top-K
The number of candidates the sparse (full-text keyword) search returns before fusion.
Mirror of dense top-K but for the FTS path. Uses Postgres websearch_to_tsquery + ts_rank under the hood.
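To make the FTS path concrete, the sketch below builds a query of the general shape that `websearch_to_tsquery` and `ts_rank` imply. It is not CoreCube's actual SQL: the table name `chunks`, the columns `id` and `tsv`, the `'english'` text search configuration, and the psycopg-style `%s` placeholders are all assumptions for illustration.

```python
def build_fts_query(user_query, top_k):
    """Return (sql, params) for a ranked full-text search.

    Assumes a hypothetical table chunks(id, tsv tsvector).
    """
    sql = """
        SELECT id, ts_rank(tsv, q) AS fts_score
        FROM chunks, websearch_to_tsquery('english', %s) AS q
        WHERE tsv @@ q
        ORDER BY fts_score DESC
        LIMIT %s
    """
    # Values are passed separately so the driver handles quoting.
    return sql, (user_query, top_k)

sql, params = build_fts_query('"part number" E1234 -draft', 40)
```

`websearch_to_tsquery` accepts search-engine-style input (quoted phrases, `-` for exclusion, `or`), so user text can be passed through without hand-building a tsquery. The `LIMIT` is where sparse top-K takes effect.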
Same intuitions apply — typical: 20–50. Usually matched to dense top-K so neither stream dominates by sheer volume of candidates. The two pools are independent — a chunk that scores in dense top-K but not in sparse top-K is still considered (the union goes into RRF).
Asymmetric tuning: if your corpus is keyword-heavy (logs, code, identifiers) you may want sparse > dense; if it is prose-heavy (articles, transcripts) the reverse.
RRF k
The dampening constant in the Reciprocal Rank Fusion formula.
For each chunk, RRF computes:
score(chunk) = vector_weight × 1 / (rrf_k + vec_rank)
+ fts_weight × 1 / (rrf_k + fts_rank)
rrf_k controls how steeply rank position matters. The standard value 60 comes from the original RRF paper and is appropriate for top-10 retrieval.
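The formula can be sketched in a few lines. Two details are assumptions, since this page does not spell them out: ranks are 1-based, and a chunk that appears in only one stream contributes nothing for the stream it is missing from. The function name `rrf_fuse` is made up for the example.

```python
def rrf_fuse(vec_ranked, fts_ranked, rrf_k=60, vector_weight=1.2, fts_weight=0.8):
    """Fuse two ranked lists of chunk ids with weighted RRF.

    Returns (chunk_id, score) pairs, best first. Ranks are 1-based (assumption).
    """
    scores = {}
    for rank, cid in enumerate(vec_ranked, start=1):
        scores[cid] = scores.get(cid, 0.0) + vector_weight / (rrf_k + rank)
    for rank, cid in enumerate(fts_ranked, start=1):
        scores[cid] = scores.get(cid, 0.0) + fts_weight / (rrf_k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ranked lists from the two streams.
fused = rrf_fuse(["a", "b", "c"], ["c", "d", "a"])
fused_ids = [cid for cid, _ in fused]
```

With the seeded weights, chunks `a` and `c` (present in both lists) rise above `b` and `d` (present in one), which is the "appears in both streams" effect that makes hybrid retrieval useful.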
| Value | Effect |
|---|---|
| 10–30 | Steeper top-of-list bias. At rrf_k = 10, rank 1 contributes about 36% more than rank 5 (1/11 vs 1/15), compared with about 7% more at the default of 60. Use when only the very top results matter. |
| 60 (default) | Smooth decay across the top ~50 results. Documents that appear in both streams reliably rise. |
| 100–200 | Mild decay. Mid-ranked items contribute meaningfully. Use when you want a broad, diverse candidate set. |
| > 200 | Near-linear fusion. Rank position barely matters relative to whether a chunk hit at all. |
Leave at 60 unless you have a measured reason. The value is a published default and works well for top-K = 10 retrieval.
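To get a feel for how much rrf_k flattens the curve, compare how much more a rank-1 hit contributes than a rank-10 hit within a single stream. This is a worked calculation from the formula, assuming 1-based ranks; `rank_ratio` is a name made up for the example.

```python
def rank_ratio(rrf_k, rank_a, rank_b):
    """How many times larger the contribution at rank_a is than at rank_b."""
    # (1 / (k + a)) / (1 / (k + b)) simplifies to (k + b) / (k + a).
    return (rrf_k + rank_b) / (rrf_k + rank_a)

ratios = {k: round(rank_ratio(k, 1, 10), 2) for k in (10, 60, 200)}
```

At rrf_k = 10 the top hit counts about 1.82 times as much as the tenth; at 60 the ratio is about 1.15; at 200 it is about 1.04. Lower values make position within a stream matter more, while higher values reward simply appearing in both streams.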
Vector weight
Multiplier on the dense stream's contribution to fused rank.
Combined with FTS weight, this is how you bias hybrid retrieval toward semantic matches over keyword matches (or vice versa).
| Value | Effect |
|---|---|
| 0 | Disables the vector stream entirely: hybrid degrades to pure FTS. |
| 1.0 | Equal voice with FTS only when FTS weight is also 1.0. Against the seeded FTS default of 0.8 this is still a slight vector bias. |
| 1.2 (default) | Seeded default. Slight vector bias over FTS, which defaults to 0.8. |
| 1.5–2.0 | Vector-biased. Semantic matches outrank keyword matches when both hit. Useful for question-style queries. |
| > 3 | Heavy vector bias. Use only when measured to help: easy to over-tune and lose lookup precision. |
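Only the ratio of vector weight to FTS weight affects the fused ordering. Setting vector_weight = 1.5, fts_weight = 1.0 produces the same ranking as vector_weight = 1.0, fts_weight ≈ 0.67, and the seeded defaults (1.2 and 0.8) are the same 1.5 : 1 ratio. When tuning, hold one weight fixed (for example at 1.0) and adjust the other. Two-knob tuning quickly becomes incoherent.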
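The ratio claim is easy to check numerically. The sketch below reuses the RRF formula from this page, with 1-based ranks and zero contribution from a missing stream as assumptions, and compares three weight pairs that share a 1.5 : 1 ratio. The ranked lists are hypothetical.

```python
def fused_order(vec_ranked, fts_ranked, vector_weight, fts_weight, rrf_k=60):
    """Return chunk ids ordered by weighted RRF score, best first."""
    scores = {}
    for weight, ranked in ((vector_weight, vec_ranked), (fts_weight, fts_ranked)):
        for rank, cid in enumerate(ranked, start=1):
            scores[cid] = scores.get(cid, 0.0) + weight / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vec = ["a", "b", "c", "d"]
fts = ["d", "c", "e", "a"]

order_a = fused_order(vec, fts, 1.5, 1.0)
order_b = fused_order(vec, fts, 1.0, 1.0 / 1.5)
order_default = fused_order(vec, fts, 1.2, 0.8)
```

All three calls return the same ordering. The raw scores differ by a constant factor, so it is the ranking, not the absolute numbers, that is guaranteed to match.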
FTS weight
Multiplier on the sparse (full-text) stream's contribution to fused rank.
Symmetric to vector weight; the seeded default is 0.8. Raise it above 0.8 when:
- Your corpus is identifier-heavy: codes, part numbers, error messages, file paths, ticket IDs.
- Your users tend to type exact strings rather than questions.
- You have measured that pure FTS results are usually more relevant than pure vector results on your evaluation set.
Set to 0 to disable the FTS stream (hybrid degrades to pure vector).
Min score
The minimum fused score (0–1) a chunk must reach to be returned to the reranker.
After RRF, every candidate has a normalized fused score in [0, 1]. Min score filters out the weak tail of that ranking.
| Value | Effect |
|---|---|
| 0 | No floor: every ranked candidate flows through. Useful for debugging or for mode: 'fts' callers that want every tsquery match. |
| 0.4 / 0.3 / 0.2 (default) | Balanced: discards obviously irrelevant matches while keeping borderline ones for the reranker to judge. Seeded as fast 0.4 / balanced 0.3 / accurate 0.2. |
| 0.5–0.7 | Aggressive: only confident hits survive. Use in strict-evidence flows where you would rather return zero results than a wrong one. |
| > 0.7 | Very aggressive: risks empty result sets on niche queries. Pair with a graceful "no results" UX in the calling app. |
The floor applies to all three search modes (hybrid, vector, fts). Scores are normalized before the floor is applied, so the threshold means the same thing across modes.
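The floor itself is a plain threshold on the normalized score. The sketch below takes scores that are already in [0, 1], because this page does not describe how normalization is done, and treats "reach" as greater-than-or-equal, which is an assumption. The candidate scores are invented for illustration.

```python
# Seeded floors per preset, as listed in the table above.
PRESET_FLOORS = {"fast": 0.4, "balanced": 0.3, "accurate": 0.2}

def apply_min_score(candidates, min_score):
    """Keep (chunk_id, normalized_score) pairs at or above the floor."""
    return [(cid, score) for cid, score in candidates if score >= min_score]

# Hypothetical normalized fused scores.
candidates = [("a", 0.82), ("b", 0.41), ("c", 0.33), ("d", 0.25), ("e", 0.12)]

survivors = {
    preset: len(apply_min_score(candidates, floor))
    for preset, floor in PRESET_FLOORS.items()
}
strict = apply_min_score(candidates, 0.9)
```

The same five candidates yield 2, 3, and 4 survivors under the fast, balanced, and accurate floors. At 0.9 nothing survives, which is the empty-result case the calling app needs to handle.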
Setting reference
When defaults differ, values are listed as default-fast / default-balanced / default-accurate.
| Setting key | Type | Default | Range |
|---|---|---|---|
| retrieval_dense_top_k | integer | 20 / 40 / 80 | 1 – 500 |
| retrieval_sparse_top_k | integer | 20 / 40 / 80 | 1 – 500 |
| retrieval_rrf_k | integer | 60 | 1 – 500 |
| retrieval_vector_weight | float | 1.2 | 0 – 5 |
| retrieval_fts_weight | float | 0.8 | 0 – 5 |
| retrieval_min_score | float | 0.4 / 0.3 / 0.2 | 0 – 1 |
The ranges above are recommended operating bounds, not enforced limits. The preset update endpoint (PUT /api/presets/:id) stores whatever numeric value you send and the runtime reads it back as-is — no clamping, no rejection. Stay within these ranges unless you have measured a reason not to.
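Because nothing is clamped server-side, a small client-side check before calling the endpoint can catch typos such as a min score of 1.5. This is a sketch only: the setting keys and ranges come from the table above, but the function name and the idea of validating a flat dict are assumptions, and the request body shape for `PUT /api/presets/:id` is not described on this page.

```python
# Recommended operating bounds copied from the setting reference table.
RECOMMENDED_RANGES = {
    "retrieval_dense_top_k": (1, 500),
    "retrieval_sparse_top_k": (1, 500),
    "retrieval_rrf_k": (1, 500),
    "retrieval_vector_weight": (0, 5),
    "retrieval_fts_weight": (0, 5),
    "retrieval_min_score": (0, 1),
}

def out_of_range(settings):
    """Return {key: value} for settings outside the recommended bounds."""
    problems = {}
    for key, value in settings.items():
        bounds = RECOMMENDED_RANGES.get(key)
        if bounds is None:
            continue  # keys not in the table are skipped by this sketch
        low, high = bounds
        if not (low <= value <= high):
            problems[key] = value
    return problems

candidate = {
    "retrieval_dense_top_k": 40,
    "retrieval_rrf_k": 0,
    "retrieval_min_score": 1.5,
}
problems = out_of_range(candidate)
```

Here `retrieval_rrf_k` and `retrieval_min_score` are flagged. Whether to block the update or just warn is a policy choice; since the ranges are recommendations rather than limits, a warning is probably the better fit.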
Related
- Reranking — what happens to the candidates this stage produces.
- Embedding — how chunks and queries become vectors in the first place.
- HNSW vector index — the index that serves the dense top-K.
- How retrieval works — the full pipeline.