HNSW vector index
CoreCube uses HNSW (Hierarchical Navigable Small World) as its approximate-nearest-neighbor (ANN) index for dense vector search, via the pgvector extension on Postgres.
HNSW is a graph index: each vector is a node, and on every layer of the hierarchy that a node belongs to, edges connect it to up to M of its near neighbors. A search starts at the sparse top layer, greedily walks toward the query, drops to the next, denser layer, and repeats until it reaches the bottom layer. Recall and latency are both functions of how dense the graph is and how aggressively you walk it.
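The layered walk can be sketched on a toy graph. This is a minimal illustration, not pgvector's implementation: it is purely greedy on every layer, whereas real HNSW keeps a candidate list (sized by efSearch) on the bottom layer. The graph, the vectors, and all function names are made up for the example.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_walk(layer, vectors, start, query):
    """Step to the neighbor closest to the query until no neighbor improves."""
    current = start
    while True:
        neighbors = layer.get(current, [])
        best = min(neighbors, key=lambda n: dist(vectors[n], query), default=current)
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current
        current = best

def layered_search(layers, vectors, entry_point, query):
    """Walk each layer in order, from the sparsest (first) to the densest (last)."""
    current = entry_point
    for layer in layers:
        current = greedy_walk(layer, vectors, current, query)
    return current

# Toy data: five points on a line. The top layer only links nodes 0 and 4.
vectors = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,), 4: (4.0,)}
layers = [
    {0: [4], 4: [0]},                                   # sparse top layer
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},  # dense bottom layer
]
nearest = layered_search(layers, vectors, entry_point=0, query=(2.9,))
print(nearest)  # 3
```

The top layer lets the search jump from node 0 to node 4 in one hop; the bottom layer then refines the answer to node 3.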
The three knobs on this page govern those dimensions:
- M: graph degree at build time. Sets the upper bound on recall the index can ever achieve.
- efConstruction: search effort during inserts. Refines the graph quality.
- efSearch: search effort per query. Trades latency for recall on every read.
The first two shape the index on disk — changing them requires a rebuild. The third is per-query and can be retuned freely without touching disk.
The M default of 16 is pgvector's own default. The efConstruction and efSearch defaults listed on this page (100 and 64) are higher than pgvector's upstream defaults of 64 and 40. Change a value here only when you have measured retrieval quality on your own corpus and identified the index as the bottleneck. Premature HNSW tuning is one of the easiest ways to make retrieval slower without making it better.
M
The maximum number of bidirectional connections per node in the HNSW graph (excluding the bottom layer, which uses 2 × M).
M is the structural quality of the index. Higher M means a denser graph with more redundant paths to any node, which means higher achievable recall. It also means more memory and a longer build.
| Value | Effect |
|---|---|
| 8–12 | Lean graph. Sufficient for small corpora (< 100k chunks) where most queries hit obvious nearest neighbors. Not recommended at scale. |
| 16 (pgvector default) | Healthy default for general-purpose retrieval up to a few million chunks. |
| 24–32 | High-quality retrieval. Worth the memory cost when you have measured recall ceilings at M = 16. |
| 48–64 | Accuracy-critical workloads (legal e-discovery, regulatory search). Memory cost grows linearly; build time grows superlinearly. |
| > 64 | Diminishing returns and very large memory cost. Almost never the right answer. |
Memory cost intuition: the index holds roughly M × num_chunks × 8 bytes of edge data on top of the vectors themselves. For 10M chunks at M = 32, that is ~2.5 GB of graph metadata — usually fine; just budget for it.
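The arithmetic behind that estimate, as a small sketch. The 8 bytes per edge is the rule-of-thumb figure from the paragraph above, not a measured pgvector constant, and the function name is illustrative.

```python
def estimate_edge_bytes(m: int, num_chunks: int, bytes_per_edge: int = 8) -> int:
    """Rough size of HNSW edge data: M x num_chunks x bytes per edge."""
    return m * num_chunks * bytes_per_edge

edge_bytes = estimate_edge_bytes(m=32, num_chunks=10_000_000)
edge_gb = edge_bytes / 1e9  # decimal gigabytes
print(f"{edge_gb:.2f} GB")  # 2.56 GB
```

Because the estimate is linear in M, doubling M to 64 on the same corpus doubles the figure to roughly 5 GB.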
Changing M requires a full vector index rebuild. CoreCube runs the rebuild as a background job; the previous index continues to serve queries until the new one is ready. No downtime, just a transition window.
efConstruction
The dynamic candidate list size used during insert. Controls how thoroughly each new vector explores the existing graph before being connected.
Build-time effort. Higher values produce a higher-quality graph (better recall ceiling at any efSearch) at the cost of slower ingestion.
| Value | Effect |
|---|---|
| 40–80 | Fast ingestion, lower-quality graph. Use only for throwaway indexes or when ingest speed dominates and you can re-tune later. |
| 100 (default) | Standard balance. Suitable for most workloads. |
| 200 | Higher-quality graph. Roughly 2× ingestion cost vs 100. Pays off when retrieval recall is critical. |
| 400+ | Diminishing returns. Worth measuring before committing. |
efConstruction only affects newly inserted chunks. Changing the value does not retroactively improve already-indexed vectors — those keep whatever connections they got at insert time. To apply a higher value to existing chunks, trigger an index rebuild.
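Constraint: efConstruction ≥ M. The form enforces this implicitly via the field minimums. pgvector itself is stricter and rejects an index build where ef_construction is less than 2 × m, so in practice keep efConstruction at least twice M.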
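For orientation, this is roughly what the two build-time values look like when they reach pgvector. The sketch below only builds the `CREATE INDEX` statement; the table name `chunks`, the column name `embedding`, and the choice of `vector_cosine_ops` are assumptions for illustration, not CoreCube's actual schema. The check mirrors pgvector's rule that `ef_construction` must be at least `2 * m`.

```python
def hnsw_index_sql(table: str, column: str, m: int, ef_construction: int,
                   opclass: str = "vector_cosine_ops") -> str:
    """Build a pgvector HNSW CREATE INDEX statement after basic validation."""
    for name in (table, column, opclass):
        # Identifiers are interpolated into SQL, so accept only plain names.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    if ef_construction < 2 * m:
        raise ValueError("pgvector requires ef_construction >= 2 * m")
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )

print(hnsw_index_sql("chunks", "embedding", m=16, ef_construction=100))
```

Both values are fixed in the index definition, which is why changing either one for existing data means building a new index.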
efSearch
The dynamic candidate list size used during search. Controls how many graph nodes the query explores before returning a result.
Per-query knob. Doesn't touch disk, doesn't require a rebuild, takes effect immediately. This is the only HNSW value you should expect to tune in production traffic.
| Value | Effect |
|---|---|
| 20–40 | Fast queries, lower recall. Use only when latency is the dominant constraint and you have measured recall as acceptable. |
| 64 (default) | Strong recall on most corpora; sub-10 ms typical query latency for indexes up to a few million vectors. |
| 128 | Higher recall: the wider candidate list surfaces close matches that the default search can miss. ~2× query latency. |
| 256–512 | Near-exhaustive. Use when measured recall at 64 is leaving relevant chunks behind. Latency grows roughly linearly above ~128. |
| > 512 | Approaching brute force. Use only for offline analysis or ground-truth recall measurement on small fixtures, never under traffic. |
When to raise it: you observe semantically obvious matches missing from results despite known-good embeddings. Start at 128, measure p95 query latency, and stop when recall stops improving.
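One way to make "stop when recall stops improving" concrete is to compare ANN results against exact (brute-force) results for a fixed query set and pick the smallest efSearch past which the gain is negligible. The sketch below assumes you already have both result lists; the function names, the 0.01 threshold, and the sample numbers are illustrative placeholders, not measurements.

```python
def recall_at_k(ann_ids, exact_ids, k):
    """Fraction of the exact top-k that the ANN result also returned."""
    truth = set(exact_ids[:k])
    return len(truth & set(ann_ids[:k])) / len(truth)

def pick_ef_search(measurements, min_gain=0.01):
    """Return the last ef_search whose recall gain over the previous kept step
    was at least min_gain. `measurements` is a list of (ef_search, recall)
    pairs sorted by ef_search ascending."""
    chosen_ef, chosen_recall = measurements[0]
    for ef, recall in measurements[1:]:
        if recall - chosen_recall < min_gain:
            break
        chosen_ef, chosen_recall = ef, recall
    return chosen_ef

# Placeholder values only; substitute recall measured on your own corpus.
sweep = [(64, 0.91), (128, 0.96), (256, 0.965), (512, 0.966)]
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))  # 0.75
print(pick_ef_search(sweep))                          # 128
```

Pair each recall figure with the p95 latency measured at the same efSearch, so the chosen value reflects both sides of the trade-off.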
How it is applied: CoreCube sets hnsw.ef_search as a transaction-local Postgres parameter on every vector-touching query — it never leaks across connections or requests. Changing the value here is reflected on the next query.
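A transaction-local setting can be applied with Postgres's `set_config(name, value, is_local)` function, which with `is_local = true` behaves like `SET LOCAL` and accepts bind parameters. The sketch below is an assumption about the general pattern, not CoreCube's code: it expects a DB-API style cursor using `%s` placeholders (as psycopg does) inside an open transaction, and uses a recording stub in place of a real connection. The `chunks` table and `embedding` column are hypothetical.

```python
class RecordingCursor:
    """Stand-in for a DB-API cursor; records statements instead of running them."""
    def __init__(self):
        self.executed = []

    def execute(self, sql, params=()):
        self.executed.append((sql, params))

    def fetchall(self):
        return []

def run_vector_query(cursor, ef_search, query_sql, params):
    """Set hnsw.ef_search for the current transaction only, then run the query."""
    # is_local = true: the value reverts when the transaction ends.
    cursor.execute(
        "SELECT set_config('hnsw.ef_search', %s, true)",
        (str(int(ef_search)),),
    )
    cursor.execute(query_sql, params)
    return cursor.fetchall()

cursor = RecordingCursor()
rows = run_vector_query(
    cursor,
    128,
    "SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
    ("[0.1,0.2,0.3]", 10),
)
```

Because the setting is scoped to the transaction, a pooled connection returns to its default once the transaction commits or rolls back.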
Setting reference
When defaults differ, values are listed as default-fast / default-balanced / default-accurate.
| Setting key | Type | Default | Range |
|---|---|---|---|
| retrieval_hnsw_m | integer | 16 / 32 / 48 | 2 – 128 |
| retrieval_hnsw_ef_construction | integer | 100 / 200 / 400 | 4 – 2000 |
| retrieval_hnsw_ef_search | integer | 64 / 128 / 256 | 10 – 1000 |
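These are recommended operating bounds. The build-time parameters (M, efConstruction) are clamped to their bounds when the index is built; efSearch is applied per query as supplied. The preset update endpoint does not reject out-of-range values, so stay within these ranges unless you have measured a reason not to. Note that upstream pgvector accepts m from 2 to 100 and ef_construction from 4 to 1000, so the upper ends of the M and efConstruction ranges above exceed what pgvector itself accepts.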
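The clamping behavior described above can be pictured as follows. This is a sketch of the stated rule, not CoreCube's implementation; the bounds are copied from the table and the function name is hypothetical.

```python
# Bounds for the build-time settings, taken from the setting reference table.
BUILD_BOUNDS = {
    "retrieval_hnsw_m": (2, 128),
    "retrieval_hnsw_ef_construction": (4, 2000),
}

def clamp_build_params(settings: dict) -> dict:
    """Clamp M and efConstruction to their bounds; leave efSearch untouched."""
    clamped = dict(settings)
    for key, (low, high) in BUILD_BOUNDS.items():
        if key in clamped:
            clamped[key] = max(low, min(high, clamped[key]))
    return clamped

result = clamp_build_params({
    "retrieval_hnsw_m": 256,
    "retrieval_hnsw_ef_construction": 2,
    "retrieval_hnsw_ef_search": 5000,  # passed through as supplied
})
print(result)
```

The out-of-range efSearch surviving unchanged is the reason to validate that value yourself before saving a preset.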
When to rebuild vs when to ship
| Change | Effect on existing data | Apply on next query? |
|---|---|---|
| efSearch only | None; purely query-time. | Yes, instantly. |
| efConstruction only | Existing chunks unaffected; only new inserts use the new value. | New inserts: yes. Existing chunks: only on rebuild. |
| M | Index structure must be rebuilt; the old graph is invalid for the new M. | Only after rebuild completes (background job). |
| Embedding model or dimensions | Every vector must be regenerated and the index rebuilt. | Only after re-embed + rebuild completes. |
The rebuild job is monitorable from the Admin Console; queries continue to serve from the previous index until the new one is ready, so there is no read downtime.
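The table reduces to a simple precedence rule, sketched here as a helper. The `retrieval_hnsw_*` keys come from the setting reference; `embedding_model` and `embedding_dimensions` are hypothetical key names used only for illustration.

```python
def required_action(changed_keys) -> str:
    """Map a set of changed settings to the heaviest action they require."""
    changed = set(changed_keys)
    if changed & {"embedding_model", "embedding_dimensions"}:
        return "re-embed and rebuild"
    if "retrieval_hnsw_m" in changed:
        return "rebuild"
    if "retrieval_hnsw_ef_construction" in changed:
        return "rebuild only if existing chunks should benefit"
    if "retrieval_hnsw_ef_search" in changed:
        return "none; applies on next query"
    return "none"

print(required_action({"retrieval_hnsw_ef_search"}))
print(required_action({"retrieval_hnsw_ef_search", "retrieval_hnsw_m"}))
```

When several settings change at once, the most disruptive one decides, so batching an M change with an efConstruction change costs a single rebuild.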
Related
- Embedding — produces the vectors this index holds.
- Retrieval — the dense top-K query that runs against this index.
- pgvector documentation on HNSW — upstream reference.