Retrieval Config
The Retrieval Config → Retrieval tab exposes the query-time knobs that decide what comes back from a search. Every setting on this page takes effect on the next query — there is no rebuild, no reindex, no waiting.
:::tip Where to find this page
Admin Console → Retrieval Config → Retrieval tab. The Query and Result Selection tabs on the same page cover multi-query expansion and LLM-based document selection, respectively.
:::
:::info Defaults preserve behavior
All defaults on this page reproduce CoreCube's original hardcoded retrieval behavior byte-for-byte for hybrid and vector search modes. You can surface the tab, inspect the values, and leave them alone — nothing changes until you edit a field.
:::
Default search mode
Which search mode is used when the request does not specify one.
| Mode | When to use |
|---|---|
| Hybrid (recommended, default) | Combines vector similarity and full-text search via Reciprocal Rank Fusion. Best for mixed query styles — questions, keywords, identifiers. |
| Vector only | Cosine-similarity vector search. Strongest on semantic / paraphrased queries. No keyword boost. |
| Full-text only | PostgreSQL `websearch_to_tsquery` + `ts_rank`. Strongest on exact-phrase / identifier / filename lookups. |
Default: hybrid. Most deployments should keep this. Switch to vector or fts only if you have a measured reason — typically an analytics corpus where one mode consistently outperforms fusion.
:::note Request-level override
Clients can override the default per query by passing `mode` in the search request body. The admin setting is the fallback used when `mode` is omitted.
:::
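A minimal sketch of that fallback rule. The helper name `resolve_search_mode`, the `query` field, and the choice to reject unknown modes are assumptions for illustration; only `mode`, its three values, and the `hybrid` default come from this page.

```python
from typing import Optional

# The three modes documented on this page.
VALID_MODES = {"hybrid", "vector", "fts"}


def resolve_search_mode(request_body: dict, admin_default: str = "hybrid") -> str:
    """Return the mode a query runs in (hypothetical helper).

    A `mode` in the request body wins; otherwise the admin setting
    (retrieval_default_search_mode) is used as the fallback.
    """
    mode: Optional[str] = request_body.get("mode")
    if mode is None:
        return admin_default
    # Assumption: unknown modes are rejected rather than silently replaced.
    if mode not in VALID_MODES:
        raise ValueError(f"unknown search mode: {mode!r}")
    return mode
```

For example, with the admin default set to `vector`, a request body of `{"query": "invoice 4411"}` resolves to `vector`, while `{"query": "invoice 4411", "mode": "fts"}` resolves to `fts`.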
Minimum relevance score
Normalized score floor. Results below this value are discarded.
| Value | Behavior |
|---|---|
| `0` | No floor: return everything the search engine produces. Useful for debugging, or for explicit `mode: 'fts'` callers that want the legacy "everything matching the tsquery" behavior. |
| `0.3` (default) | Balanced: filters out obviously irrelevant matches while keeping loosely related ones for reranking. |
| `0.5`–`0.7` | Aggressive: only high-confidence matches survive. Use in strict-evidence flows. |
Default: 0.3. Allowed range: 0 – 1.
Applies to all three search modes. Scores from FTS (ts_rank) and vector (cosine) are normalized to a 0–1 range before the floor is applied, so the threshold means the same thing regardless of mode.
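The floor itself can be sketched as a simple filter, assuming each hit already carries a score normalized to 0–1. The `Hit` type and `apply_min_score` function are hypothetical, and the normalization CoreCube applies to `ts_rank` and cosine scores is not shown.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # assumed already normalized to the 0-1 range


def apply_min_score(hits: list[Hit], min_score: float = 0.3) -> list[Hit]:
    """Drop hits whose normalized score is below the floor.

    min_score = 0 keeps every hit, because normalized scores are never negative.
    """
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be within 0-1")
    # Only results strictly below the floor are discarded.
    return [h for h in hits if h.score >= min_score]
```

A hit scoring exactly at the floor is kept, since only results below the value are discarded.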
:::warning FTS behavior change at release
Earlier releases of CoreCube applied `minScore = 0.3` only to vector and hybrid search; `fts` mode returned every tsquery match unconditionally. Starting with the release that ships this page, `fts` mode also honors this floor. To restore the earlier unfiltered behavior for FTS, set Minimum relevance score to `0`.
:::
Hybrid fusion
The Vector stream boost, Full-text stream boost, and RRF k knobs only apply when search mode is hybrid. They are ignored for vector-only or fts-only queries.
How fusion works
CoreCube uses Reciprocal Rank Fusion (RRF): it combines the ranks from two independent search streams, not their raw scores. Because RRF is agnostic to score scale, it works robustly across FTS (`ts_rank`) and vector (cosine distance) results, whose score distributions are not comparable.
The fused score per document is:
$$
\text{score} = \text{vectorWeight} \times \frac{1}{\text{rrfK} + \text{vec\_rank}} + \text{ftsWeight} \times \frac{1}{\text{rrfK} + \text{fts\_rank}}
$$
At vectorWeight = ftsWeight = 1.0 and rrfK = 60, the formula is byte-for-byte identical to the previous hardcoded fusion.
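A short sketch of the weighted fusion formula above. The function name is hypothetical; treating ranks as 1-based, and giving a document that appears in only one stream no contribution from the other, are standard RRF conventions assumed here, not confirmed CoreCube behavior.

```python
def rrf_fuse(
    vec_ranking: list[str],
    fts_ranking: list[str],
    vector_weight: float = 1.0,
    fts_weight: float = 1.0,
    rrf_k: int = 60,
) -> list[tuple[str, float]]:
    """Weighted Reciprocal Rank Fusion over two ranked lists of document IDs.

    Each stream adds weight / (rrf_k + rank) for every document it returns
    (ranks are 1-based). A document missing from a stream gets nothing from it.
    """
    scores: dict[str, float] = {}
    for weight, ranking in ((vector_weight, vec_ranking), (fts_weight, fts_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

`rrf_fuse(["a", "b", "c"], ["d", "b"])` puts `b` first: it sits at rank 2 in both streams (2/62 ≈ 0.032), which beats rank 1 in a single stream (1/61 ≈ 0.016). This is the "documents appearing in both streams reliably rise" effect at the default k.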
Vector stream boost
Multiplier on the vector stream's contribution to fused rank.
| Value | Effect |
|---|---|
| `0` | Disables the vector stream; hybrid effectively degrades to FTS. |
| `1.0` (default) | Unweighted: both streams contribute equally. |
| `>1` | Vector-biased fusion: semantic matches rank ahead of keyword matches when both hit. |
Default: 1.0. Allowed range: 0 – 5.
Full-text stream boost
Multiplier on the FTS stream's contribution to fused rank.
Symmetric to the vector boost — raise it to favor keyword / identifier matches, lower it to favor semantic matches.
Default: 1.0. Allowed range: 0 – 5.
:::tip Tuning intuition
Change only one weight at a time. The fused ranking depends on the ratio of the two weights, not on their absolute values: raising `vectorWeight` to 1.5 produces the same ordering as lowering `ftsWeight` to about 0.67. Keep one stream at 1.0 and adjust the other.
:::
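A quick check of the ratio claim, reusing the weighted RRF formula with 1-based ranks (an assumption) and hypothetical document IDs. Multiplying both weights by the same constant scales every fused score by that constant, so the ordering is unchanged; this sketch compares ordering only, not absolute scores.

```python
def fused_order(vec_ranking, fts_ranking, vector_weight, fts_weight, rrf_k=60):
    """Return document IDs sorted by weighted RRF score (1-based ranks)."""
    scores = {}
    for weight, ranking in ((vector_weight, vec_ranking), (fts_weight, fts_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vec = ["a", "b", "c"]
fts = ["c", "d", "a"]

# Raising the vector weight to 1.5 ...
boost_vector = fused_order(vec, fts, vector_weight=1.5, fts_weight=1.0)
# ... orders documents the same way as lowering the FTS weight to 1/1.5 (about 0.67).
lower_fts = fused_order(vec, fts, vector_weight=1.0, fts_weight=1 / 1.5)

print(boost_vector)               # ['a', 'c', 'b', 'd']
print(boost_vector == lower_fts)  # True
```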
RRF k
Dampening constant in the RRF formula. Controls how quickly a document's rank contribution decays as its position slips.
| Value | Effect |
|---|---|
| Low (10–30) | Aggressive top-of-list bias: positions near the top are separated by much wider margins (at k = 10, rank 1 is worth about 1.36× rank 5; at k = 60, about 1.07×). |
| `60` (default) | Smooth decay across the top ~50 results. Documents appearing in both streams reliably rise. |
| High (200–500) | Near-linear fusion: rank still matters, but distance from the top matters much less (at k = 500, rank 1 is worth about 1.01× rank 5). |
Default: 60. Allowed range: 1 – 500.
Leave at 60 unless you have a specific retrieval-quality reason to change it. 60 is the value used in the original RRF paper (Cormack, Clarke, and Büttcher, SIGIR 2009), is a widely used default, and is appropriate for top-k = 10 retrieval.
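To see what k does to the decay, compare the unweighted contribution of rank 1 with rank 5 at a few k values. This is plain arithmetic on the RRF term 1 / (k + rank), assuming 1-based ranks; the helper name is hypothetical.

```python
def rank_contribution(rank: int, rrf_k: int) -> float:
    """Unweighted RRF contribution of one stream at a 1-based rank."""
    return 1.0 / (rrf_k + rank)


for k in (10, 60, 500):
    # How much more rank 1 is worth than rank 5 at this k.
    ratio = rank_contribution(1, k) / rank_contribution(5, k)
    print(f"k={k}: rank 1 is worth {ratio:.3f}x rank 5")

# k=10: rank 1 is worth 1.364x rank 5
# k=60: rank 1 is worth 1.066x rank 5
# k=500: rank 1 is worth 1.008x rank 5
```

Lower k widens the gap between adjacent positions; higher k flattens it toward a near-linear dependence on rank.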
Freshness floor
Clamps how aggressively stale content can be penalized by the freshness decay.
CoreCube applies an exponential freshness decay after retrieval: recently synced content ranks higher, and older content is demoted. For a very old document, the decay multiplier can drive the effective score arbitrarily close to zero. The freshness floor sets a lower bound on that multiplier, which caps the penalty.
| Value | Effect |
|---|---|
| `0` (default) | Floor disabled. Decay runs unclamped, matching the behavior before this release. |
| `0.1` | Recommended opt-in. Caps the penalty at 10×: a 3-year-old document ranks no worse than score × 0.1, rather than score × 0.01. |
| `0.3` | Mild decay: freshness still matters for tie-breaking, but old authoritative docs stay competitive. |
| `1.0` | Disables decay entirely: freshness no longer affects scoring. |
Default: 0. Allowed range: 0 – 1.
Set above 0 when you have a corpus with long-lived authoritative documents (policies, architecture decisions, legal) that should not be demoted just because they were last synced 18 months ago.
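A sketch of how a floor can clamp an exponential decay. Both the half-life (`HALF_LIFE_DAYS = 180`) and the `max(floor, decay)` combination rule are assumptions chosen to match the table above (0 leaves decay unclamped, 0.1 caps the penalty at 10×, 1.0 disables decay); CoreCube's actual decay constant is not documented on this page.

```python
# Hypothetical decay rate: not CoreCube's documented value.
HALF_LIFE_DAYS = 180.0


def freshness_multiplier(age_days: float, floor: float = 0.0) -> float:
    """Exponential freshness decay, clamped from below by the freshness floor."""
    if not 0.0 <= floor <= 1.0:
        raise ValueError("floor must be within 0-1")
    # 1.0 for brand-new content, halving every HALF_LIFE_DAYS.
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # floor = 0 leaves decay unclamped; floor = 1 pins the multiplier at 1 (no decay).
    return max(floor, decay)
```

With this hypothetical half-life, a 1,095-day-old document gets a multiplier of about 0.015 unclamped and exactly 0.1 with a floor of 0.1.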
HNSW ef_search
Query-time recall vs latency trade-off for vector search.
ef_search is a pgvector HNSW parameter applied at query time — it controls how many neighbor candidates the ANN index explores before returning a result. It has no effect on the index on disk and no effect on ingestion.
| Value | Effect |
|---|---|
| `40` (default, pgvector default) | Fast, with good recall on most corpora. |
| `80`–`200` | Higher recall: catches semantic matches that a 40-candidate search can miss. Noticeably slower on large indexes. |
| `500`+ | Near-exhaustive. Use only for offline analysis or very small corpora where latency is not a concern. |
Default: 40. Allowed range: 10 – 1000.
:::info Why the default is 40 when the max is 1000
40 is pgvector's own upstream default — see the pgvector README. It is deliberately low because ef_search is a recall/latency knob, not a "set it as high as possible" dial: recall gains plateau quickly past a few hundred, while query latency keeps climbing. Production tuning almost always lands between 40 and 200. The 1000 ceiling is generous headroom for offline analysis or ground-truth recall measurement on small fixtures — not a value you should be running under traffic.
:::
:::note How it is applied
CoreCube sets hnsw.ef_search as a transaction-local Postgres parameter on every vector-touching query — it never leaks across connections or requests. Changing the value here takes effect on the next query; no restart or reindex required.
:::
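A sketch of a transaction-local setting, assuming a DB-API style cursor (such as psycopg's) on a connection that is not in autocommit mode, so both statements run in one transaction. `set_config(name, value, true)` is a parameterizable equivalent of `SET LOCAL hnsw.ef_search = 100`; the `chunks` table, `embedding` column, and `vector_search` function are hypothetical, not CoreCube's schema.

```python
from typing import Any, Sequence


def vector_search(cur: Any, query_vector: Sequence[float], ef_search: int, limit: int = 10) -> list:
    """Run one ANN query with a transaction-local hnsw.ef_search.

    Assumes `cur` belongs to a connection that is not in autocommit mode,
    so both statements below share one transaction.
    """
    # is_local = true scopes the setting to the current transaction; it is
    # discarded at COMMIT or ROLLBACK, so it cannot leak to the next request.
    cur.execute("SELECT set_config('hnsw.ef_search', %s, true)", (str(int(ef_search)),))
    # pgvector text literal, e.g. "[0.1,0.2]"; `<=>` is cosine distance (smaller = closer).
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    cur.execute(
        "SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
        (vector_literal, limit),
    )
    return cur.fetchall()
```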
:::tip When to raise it
Raise ef_search when you see semantically obvious matches missing from results despite having good embeddings. Start at 100, measure p95 query latency, and stop when recall stops improving.
:::
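One way to make "stop when recall stops improving" concrete: measure recall@k against exact (non-index) ground truth on a fixture at each candidate ef_search value. The helper names and the 0.005 minimum-gain threshold are hypothetical; p95 latency still needs to be checked separately at each step.

```python
def recall_at_k(ann_ids: list[str], exact_ids: list[str], k: int = 10) -> float:
    """Fraction of the true top-k neighbors that the ANN search returned in its top k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0
    return len(truth & set(ann_ids[:k])) / len(truth)


def pick_ef_search(measurements: list[tuple[int, float]], min_gain: float = 0.005) -> int:
    """Given (ef_search, mean recall) pairs sorted by ef_search, return the first
    value after which raising ef_search improves recall by less than min_gain."""
    for (ef, recall), (_, next_recall) in zip(measurements, measurements[1:]):
        if next_recall - recall < min_gain:
            return ef
    # Recall was still improving at the largest value measured.
    return measurements[-1][0]
```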
How modes interact with the rest of the pipeline
The Retrieval tab governs the retrieval stage (plus the freshness floor, noted below). The same query passes through these stages in order:

1. Query expansion: the Query tab may rewrite the user's query into multiple variants before search runs.
2. Retrieval: this page. Returns a candidate list.
3. Reranking and freshness: candidates are freshness-weighted and optionally cross-encoder reranked.
4. Result selection: the Result Selection tab may ask an LLM to pick the most relevant documents from the candidate list.

Minimum relevance score, hybrid fusion, and ef_search apply at stage 2. The freshness floor is configured on this tab but takes effect at stage 3, where the freshness decay is applied to candidates.
See Retrieval Pipeline for the end-to-end flow.
Setting reference
| Setting key | Type | Default | Range |
|---|---|---|---|
| `retrieval_default_search_mode` | `'hybrid' \| 'vector' \| 'fts'` | `'hybrid'` | n/a |
| `retrieval_min_score` | float | `0.3` | 0 – 1 |
| `retrieval_vector_weight` | float | `1.0` | 0 – 5 |
| `retrieval_fts_weight` | float | `1.0` | 0 – 5 |
| `retrieval_rrf_k` | integer | `60` | 1 – 500 |
| `retrieval_freshness_floor` | float | `0` | 0 – 1 |
| `retrieval_hnsw_ef_search` | integer | `40` | 10 – 1000 |
When settings are read, out-of-range values are clamped to the bounds and a warning is logged once per process. The typed `PUT /api/retrieval/config` endpoint does not clamp: it rejects out-of-range values outright at the validation boundary.
Audit
Every change to retrieval config is recorded in the audit log as a `retrieval_config_updated` event with a nested diff (e.g. `retrieval.vectorWeight: 1.0 → 1.5`). Chunking changes are logged as `embedding_config_updated`.
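The diff line format can be produced by flattening a nested config into dotted paths. This is a hypothetical sketch: the function name and the nested config shape (`{"retrieval": {"vectorWeight": 1.0}}`) are assumptions; only the event names and the `path: old → new` format come from this page.

```python
from typing import Any


def config_diff(old: dict[str, Any], new: dict[str, Any], prefix: str = "") -> list[str]:
    """Describe every changed leaf of two nested config dicts as 'dotted.path: old → new'."""
    lines: list[str] = []
    for key in sorted(old.keys() | new.keys()):
        path = f"{prefix}{key}"
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            # Recurse into nested sections, extending the dotted path.
            lines.extend(config_diff(before, after, prefix=f"{path}."))
        elif before != after:
            lines.append(f"{path}: {before} → {after}")
    return lines
```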
Related
- Embedding & Chunking — the ingestion-side counterpart to this page.
- Retrieval Pipeline — how retrieval composes with reranking, context assembly, and LLM generation.