HNSW vector index

CoreCube uses HNSW (Hierarchical Navigable Small World) as its approximate-nearest-neighbor (ANN) index for dense vector search, via the pgvector extension on Postgres.

HNSW is a graph index: each vector is a node, and edges connect each node to up to M of its near neighbors on every layer of the hierarchy in which it appears. Upper layers hold progressively fewer nodes. A search starts at the top layer, greedily walks toward the query, drops to the next denser layer, and repeats until it reaches the bottom layer. Recall and latency are both functions of how dense the graph is and how aggressively you walk it.
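The layered walk can be illustrated with a toy sketch. This is not pgvector's implementation: the `points` and `layers` structures, the 1-D vectors, and the single-candidate greedy walk (equivalent to a candidate list of size 1) are all simplifying assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy data: 1-D "vectors" keyed by node id.
points: Dict[int, float] = {0: 0.0, 1: 2.0, 2: 4.0, 3: 6.0, 4: 8.0, 5: 10.0}

# layers[0] is the sparse top layer; the last entry is the dense bottom layer.
# Each layer maps a node id to the node ids it is linked to on that layer.
layers: List[Dict[int, List[int]]] = [
    {0: [5], 5: [0]},
    {0: [2], 2: [0, 5], 5: [2]},
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]},
]


def distance(node: int, query: float) -> float:
    """Absolute difference stands in for a real vector distance."""
    return abs(points[node] - query)


def greedy_descend(query: float, entry: int = 0) -> int:
    """Walk each layer toward the query, then drop to the next denser layer."""
    current = entry
    for graph in layers:
        improved = True
        while improved:
            improved = False
            for neighbor in graph.get(current, []):
                # Move only when a neighbor is strictly closer to the query.
                if distance(neighbor, query) < distance(current, query):
                    current = neighbor
                    improved = True
    return current
```

A real index keeps a candidate list larger than one (that is what efSearch controls), which is why it can recover from local minima that this single-candidate walk would get stuck in.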

The three knobs on this page govern those dimensions:

  • M — graph degree at build time. Sets the upper bound on recall the index can ever achieve.
  • efConstruction — search effort during inserts. Refines the graph quality.
  • efSearch — search effort per query. Trades latency for recall on every read.

The first two shape the index on disk — changing them requires a rebuild. The third is per-query and can be retuned freely without touching disk.
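For orientation, the two build-time knobs correspond to pgvector's `m` and `ef_construction` index options. The sketch below only builds the DDL string; the table name `chunks`, the column name `embedding`, the index name, and the cosine operator class are assumptions, and the way CoreCube actually issues its DDL is not documented here.

```python
def hnsw_index_ddl(m: int = 16, ef_construction: int = 100,
                   table: str = "chunks", column: str = "embedding") -> str:
    """Return a pgvector CREATE INDEX statement for the build-time knobs.

    `table`, `column`, the index name, and vector_cosine_ops are
    hypothetical choices made for this example.
    """
    m, ef_construction = int(m), int(ef_construction)
    return (
        f"CREATE INDEX {table}_{column}_hnsw_idx ON {table} "
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

Because both values are baked into the index at creation time, changing either one for existing data means creating the index again, which matches the rebuild requirement described above.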

You probably don't need to touch this page

The defaults are conservative starting points: M = 16 matches pgvector's own default, while efConstruction = 100 and efSearch = 64 sit above pgvector's defaults of 64 and 40. Change a value here only when you have measured retrieval quality on your own corpus and identified the index as the bottleneck. Premature HNSW tuning is one of the easiest ways to make retrieval slower without making it better.


M

The maximum number of bidirectional connections per node in the HNSW graph (excluding the bottom layer, which uses 2 × M).

M determines the structural quality of the index. Higher M means a denser graph with more redundant paths to any node, which means higher achievable recall. It also means more memory and a longer build.

| Value | Effect |
|---|---|
| 8–12 | Lean graph. Sufficient for small corpora (< 100k chunks) where most queries hit obvious nearest neighbors. Not recommended at scale. |
| 16 (pgvector default) | Healthy default for general-purpose retrieval up to a few million chunks. |
| 24–32 | High-quality retrieval. Worth the memory cost when you have measured recall ceilings at M = 16. |
| 48–64 | Accuracy-critical workloads (legal e-discovery, regulatory search). Memory cost grows linearly; build time grows superlinearly. |
| > 64 | Diminishing returns and very large memory cost. Almost never the right answer. |

Memory cost intuition: the index holds roughly M × num_chunks × 8 bytes of edge data on top of the vectors themselves. For 10M chunks at M = 32, that is 32 × 10,000,000 × 8 bytes ≈ 2.56 GB of graph metadata. That is usually fine; just budget for it.
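The rule of thumb above is easy to turn into a quick budget check. This is a rough sketch of that estimate only; the function name is hypothetical, and the 8 bytes per edge is the figure quoted on this page rather than a measured value.

```python
def hnsw_edge_bytes(m: int, num_chunks: int, bytes_per_edge: int = 8) -> int:
    """Approximate graph metadata size: M x num_chunks x bytes per edge."""
    return m * num_chunks * bytes_per_edge


estimate = hnsw_edge_bytes(m=32, num_chunks=10_000_000)
print(f"{estimate / 1e9:.2f} GB")  # prints "2.56 GB"
```

The estimate scales linearly in both M and corpus size, so doubling either one doubles the graph overhead.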

Changing M requires a full vector index rebuild. CoreCube runs the rebuild as a background job; the previous index continues to serve queries until the new one is ready. No downtime, just a transition window.


efConstruction

The dynamic candidate list size used during insert. Controls how thoroughly each new vector explores the existing graph before being connected.

Build-time effort. Higher values produce a higher-quality graph (better recall ceiling at any efSearch) at the cost of slower ingestion.

| Value | Effect |
|---|---|
| 40–80 | Fast ingestion, lower-quality graph. Use only for throwaway indexes or when ingest speed dominates and you can re-tune later. |
| 100 (default) | Standard balance. Suitable for most workloads. |
| 200 | Higher-quality graph. Roughly 2× ingestion cost vs 100. Pays off when retrieval recall is critical. |
| 400+ | Diminishing returns. Worth measuring before committing. |

efConstruction only affects newly inserted chunks. Changing the value does not retroactively improve already-indexed vectors — those keep whatever connections they got at insert time. To apply a higher value to existing chunks, trigger an index rebuild.

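Constraint: efConstruction ≥ M. The form enforces this implicitly via the field minimums. pgvector itself is stricter: it rejects an index build unless ef_construction ≥ 2 × m, so in practice keep efConstruction at least twice M.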
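A pre-flight check for this constraint might look like the sketch below. The function name and the `factor` parameter are hypothetical; `factor=1` expresses the constraint as stated on this page, while `factor=2` mirrors the stricter check pgvector applies at build time (ef_construction ≥ 2 × m).

```python
def check_build_params(m: int, ef_construction: int, factor: int = 1) -> None:
    """Raise ValueError unless ef_construction >= factor * m."""
    if ef_construction < factor * m:
        raise ValueError(
            f"ef_construction={ef_construction} must be >= {factor} * m "
            f"(m={m})"
        )


# The default pair (M = 16, efConstruction = 100) passes both variants.
check_build_params(16, 100)
check_build_params(16, 100, factor=2)
```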


efSearch

The dynamic candidate list size used during search. Controls how many graph nodes the query explores before returning a result.

Per-query knob. Doesn't touch disk, doesn't require a rebuild, takes effect immediately. This is the only HNSW value you should expect to tune under production traffic.

| Value | Effect |
|---|---|
| 20–40 | Fast queries, lower recall. Use only when latency is the dominant constraint and you have measured recall as acceptable. |
| 64 (default) | Strong recall on most corpora; sub-10 ms typical query latency for indexes up to a few million vectors. |
| 128 | Higher recall: catches semantic matches ranked 65–128 that the default would miss. ~2× query latency. |
| 256–512 | Near-exhaustive. Use when measured recall at 64 is leaving relevant chunks behind. Latency grows roughly linearly above ~128. |
| > 512 | Approaching brute force. Use only for offline analysis or ground-truth recall measurement on small fixtures, never under traffic. |

When to raise it: you observe semantically obvious matches missing from results despite known-good embeddings. Start at 128, measure p95 query latency, and stop when recall stops improving.
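The "stop when recall stops improving" rule can be made concrete with a small offline harness. The sketch below assumes you already have labeled ground truth (a set of relevant chunk ids per query) and recall measurements per efSearch value; the function names and the `min_gain` threshold are hypothetical. p95 latency should be recorded alongside each measurement, but is left out here.

```python
from typing import Dict, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the known-relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def pick_ef_search(recall_by_ef: Dict[int, float], min_gain: float = 0.01) -> int:
    """Return the last efSearch value whose step still gained >= min_gain recall."""
    candidates = sorted(recall_by_ef)
    chosen = candidates[0]
    for previous, current in zip(candidates, candidates[1:]):
        if recall_by_ef[current] - recall_by_ef[previous] < min_gain:
            break  # recall has plateaued; larger values only add latency
        chosen = current
    return chosen
```

For example, with measured recall of 0.90 at 64, 0.95 at 128, and 0.955 at 256, the plateau rule settles on 128 at the default threshold.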

How it is applied: CoreCube sets hnsw.ef_search as a transaction-local Postgres parameter on every vector-touching query — it never leaks across connections or requests. Changing the value here is reflected on the next query.
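A transaction-local setting in Postgres is expressed with `SET LOCAL`, which reverts when the transaction ends. The sketch below only assembles the statement sequence and is an illustration of the mechanism, not CoreCube's actual code; the function name and the example query are assumptions.

```python
from typing import List


def ef_search_transaction(query_sql: str, ef_search: int) -> List[str]:
    """Wrap a query so hnsw.ef_search applies only inside its transaction."""
    # The value is interpolated into the statement text, so coerce it to int.
    ef_search = int(ef_search)
    if ef_search < 1:
        raise ValueError("ef_search must be a positive integer")
    return [
        "BEGIN",
        f"SET LOCAL hnsw.ef_search = {ef_search}",
        query_sql,
        "COMMIT",
    ]


# Hypothetical query; <=> is pgvector's cosine distance operator.
statements = ef_search_transaction(
    "SELECT id FROM chunks ORDER BY embedding <=> %s LIMIT 10", 128
)
```

Because the setting is scoped to the transaction, a pooled connection returns to its previous `hnsw.ef_search` value after `COMMIT` or `ROLLBACK`.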


Setting reference

When defaults differ, values are listed as default-fast / default-balanced / default-accurate.

| Setting key | Type | Default | Range |
|---|---|---|---|
| retrieval_hnsw_m | integer | 16 / 32 / 48 | 2–128 |
| retrieval_hnsw_ef_construction | integer | 100 / 200 / 400 | 4–2000 |
| retrieval_hnsw_ef_search | integer | 64 / 128 / 256 | 10–1000 |

These are recommended operating bounds. The build-time parameters (M, efConstruction) are clamped to their bounds when the index is built; efSearch is applied per query as supplied. The preset update endpoint does not reject out-of-range values, so stay within these ranges unless you have measured a reason not to.
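Since the endpoint does not reject out-of-range values, a client may want to clamp build-time settings itself before submitting them. The sketch below is one way to do that; the bounds are read from the Range column as 2 to 128 and 4 to 2000, which is an interpretation of the table, and the helper name is hypothetical.

```python
from typing import Dict, Tuple

# Assumed operating bounds for the build-time settings (low, high).
BOUNDS: Dict[str, Tuple[int, int]] = {
    "retrieval_hnsw_m": (2, 128),
    "retrieval_hnsw_ef_construction": (4, 2000),
}


def clamp_build_setting(key: str, value: int) -> int:
    """Clamp a build-time setting into its assumed operating range."""
    low, high = BOUNDS[key]
    return max(low, min(high, value))
```

retrieval_hnsw_ef_search is deliberately absent from `BOUNDS`, because the page states that efSearch is applied per query as supplied rather than clamped.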


When to rebuild vs when to ship

| Change | Effect on existing data | Apply on next query? |
|---|---|---|
| efSearch only | None; purely query-time. | Yes, instantly. |
| efConstruction only | Existing chunks unaffected; only new inserts use the new value. | New inserts: yes. Existing chunks: only on rebuild. |
| M | Index structure must be rebuilt; the old graph is invalid for the new M. | Only after rebuild completes (background job). |
| Embedding model or dimensions | Every vector must be regenerated and the index rebuilt. | Only after re-embed + rebuild completes. |

The rebuild job is monitorable from the Admin Console; queries continue to serve from the previous index until the new one is ready, so there is no read downtime.
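The decision table can be encoded as a small lookup for use in deployment checks. This is a sketch only: the `retrieval_hnsw_*` keys come from the setting reference, while `embedding_model` and `embedding_dimensions` are hypothetical key names standing in for the embedding settings.

```python
from typing import Iterable


def apply_mode(changed: Iterable[str]) -> str:
    """Return the heaviest action required by a set of changed setting keys."""
    keys = set(changed)
    # Hypothetical key names for the embedding settings.
    if keys & {"embedding_model", "embedding_dimensions"}:
        return "re-embed and rebuild"
    if "retrieval_hnsw_m" in keys:
        return "rebuild"
    if "retrieval_hnsw_ef_construction" in keys:
        return "new inserts only; rebuild to apply to existing chunks"
    return "next query"
```

When several settings change at once, the function reports the most expensive action, since a rebuild also picks up any efConstruction change.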

