Embedding
The embedding stage turns every chunk and every query into a dense vector. The choices here decide what "similar" means inside CoreCube — the model defines the semantic space, the dimensions define the geometry, and the encoding mode defines how queries and documents relate to each other.
Changing any field on this page typically requires re-embedding existing content. Switching to a different model invalidates every vector in the index; CoreCube re-embeds in the background, and the old vectors continue to serve queries until the new ones are ready.
Model
The embedding model used to encode chunks at ingest time and queries at search time.
The same model must be used for both sides of the comparison — you cannot embed documents with nomic-embed-text and queries with text-embedding-3-large. Switching the model is therefore the single most expensive change in CoreCube: every chunk in chunk_embeddings becomes invalid and must be regenerated.
Choosing a model
| Class | Examples | When to use |
|---|---|---|
| Local | nomic-embed-text (768d), bge-m3 (1024d), mxbai-embed-large (1024d) | Air-gapped deployments, GDPR-strict environments, cost control. No outbound calls. Throughput depends on your own hardware and is often lower than a hosted API. |
| Cloud (OpenAI) | text-embedding-3-small (1536d, Matryoshka), text-embedding-3-large (3072d, Matryoshka) | Best-in-class general retrieval. Matryoshka — you can request a smaller vector at a small recall cost (see Dimensions). |
| Cloud (Voyage) | voyage-3 (1024d), voyage-code-3 (1024d, code-tuned) | Strong on long-context, code, and asymmetric query/doc encoding (see Input mode). |
| Cloud (Cohere) | embed-english-v3.0 (1024d), embed-multilingual-v3.0 (1024d) | Multilingual corpora; good asymmetric query/doc encoding. |
What happens on change
- CoreCube freezes the active index for new writes against the old model.
- A background re-embed job walks every evidence_chunks row and produces fresh vectors against the new model.
- The HNSW index is rebuilt as the new vectors arrive.
- Once the rebuild completes, the new model becomes the active index and old vectors are dropped.
Queries continue to be served against the previous model until the rebuild completes — there is no downtime, just a lag during which the new model is not yet available.
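The switch-over described above can be pictured as a shadow index that fills in the background and replaces the active one only when it is complete. The following is a minimal sketch of that idea, not CoreCube's implementation: `Index`, `ReembedJob`, and `fake_embed` are hypothetical names, and the "embedding" is a deterministic placeholder rather than a model call.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Hypothetical in-memory stand-in for one model's vector index."""
    model: str
    vectors: dict = field(default_factory=dict)  # chunk_id -> vector


def fake_embed(model: str, text: str) -> list:
    # Placeholder only: NOT a real model call, just deterministic output.
    return [float(len(model)), float(len(text))]


class ReembedJob:
    """Fills a shadow index chunk by chunk, then swaps it in at the end."""

    def __init__(self, chunks: dict, active: Index, new_model: str):
        self.chunks = chunks
        self.active = active            # keeps serving queries during the rebuild
        self.shadow = Index(new_model)  # filled in the background
        self.pending = list(chunks)     # chunk ids still to re-embed

    def step(self) -> None:
        """Re-embed one chunk; swap indexes once nothing is pending."""
        if self.pending:
            chunk_id = self.pending.pop(0)
            text = self.chunks[chunk_id]
            self.shadow.vectors[chunk_id] = fake_embed(self.shadow.model, text)
        if not self.pending and self.active is not self.shadow:
            self.active = self.shadow   # the old index is no longer referenced

    def serving_model(self) -> str:
        return self.active.model


chunks = {1: "alpha", 2: "beta"}
old = Index("old-model", {cid: fake_embed("old-model", t) for cid, t in chunks.items()})
job = ReembedJob(chunks, old, "new-model")

job.step()
after_first = job.serving_model()   # one of two chunks done: old model still serves
job.step()
after_second = job.serving_model()  # all chunks done: new model is active
print(after_first, after_second)
```

The point of the design is that queries never see a half-built index: the active reference changes in one step, after the last chunk has been re-embedded.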
Dimensions
The size of each vector emitted by the embedding model.
Dimensions are usually locked to the model's native output. A model like nomic-embed-text always emits 768d, so there is no choice to make. The form disables this field for fixed-size models and shows the native value read-only.
For Matryoshka models (text-embedding-3-small, text-embedding-3-large, voyage-3-lite, nomic-embed-v2), the model is trained to produce useful vectors at multiple truncated lengths. You can request a smaller vector at index time:
| Model | Native | Useful truncation points | Recall cost vs native |
|---|---|---|---|
| text-embedding-3-small | 1536 | 256, 512, 768 | ~1–3% drop at 768 |
| text-embedding-3-large | 3072 | 256, 512, 1024, 2048 | ~2–5% drop at 1024 |
| nomic-embed-v2 | 768 | 128, 256, 512 | ~1–2% drop at 512 |
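Mathematically, using a Matryoshka vector at a shorter length amounts to keeping its leading components and rescaling the result to unit length. The sketch below illustrates that operation on a toy vector. It assumes prefix truncation followed by re-normalization; whether CoreCube truncates client-side or asks the provider for a shorter vector is not stated on this page, and `truncate_and_renormalize` is a hypothetical helper name.

```python
import math


def truncate_and_renormalize(vector, dims):
    """Keep the first `dims` components, then rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and the native length")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


native = [0.5, 0.5, 0.5, 0.5]  # toy 4d unit vector standing in for a native embedding
short = truncate_and_renormalize(native, 2)
print(len(short), short)
```

The truncated vector lives in a different space from the native one, so it can only be compared against other vectors truncated to the same length.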
Why truncate? Storage and HNSW memory footprint scale roughly linearly with dimensions. At float32, the raw vectors of a 1024d index on 10M chunks take ~40 GB; the same corpus at 256d takes ~10 GB. For corpora that fit in RAM at a smaller dimension but spill to disk at the native one, the recall trade is usually worth it.
Truncated vectors are not interchangeable with native ones. Switching truncation level invalidates the index just like switching models — every chunk must be re-embedded. Pick a value, measure recall on a query set, then commit.
Input mode
Whether to send a query/document role hint to the embedding API.
Some embedding models are trained with asymmetric encoding — the model produces a different vector for the same text depending on whether it is being indexed as a document or used as a query. When the model supports this and you flag the role, retrieval improves measurably (typically 3–8% on standard IR benchmarks).
| Mode | Behavior | When to use |
|---|---|---|
| auto | Use the provider's default: typically unified for local models, separate for asymmetric cloud models. | Safe default. Switch only if you know your model behaves better with a specific mode. |
| unified | Send both queries and documents through the same encoder path with no role hint. | Required for nomic-embed-text (v1), BGE-M3, mxbai-embed-large and most local models. |
| separate | Send a role token (query: / passage: or the provider's equivalent) so the model encodes them asymmetrically. | Voyage, Cohere, Jina, and nomic-embed-v2 all support, and benefit from, separate query vs document mode. |
If you select separate against a model that does not support it, most providers silently ignore the role hint. Nothing breaks, but nothing improves either.
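The mode resolution in the table can be sketched as a small function that decides whether a role hint is attached to the text. This is an illustration under assumptions: the `query: ` / `passage: ` prefixes are the ones named above, real providers may take the role as an API parameter instead of a text prefix, and `resolve_mode` and `prepare_input` are hypothetical names.

```python
ROLE_PREFIXES = {"query": "query: ", "document": "passage: "}  # assumed prefixes


def resolve_mode(mode, model_is_asymmetric):
    """Turn 'auto' into a concrete mode; pass explicit modes through."""
    if mode == "auto":
        return "separate" if model_is_asymmetric else "unified"
    if mode in ("unified", "separate"):
        return mode
    raise ValueError(f"unknown input mode: {mode!r}")


def prepare_input(text, role, mode, model_is_asymmetric):
    """Return the text to embed, with a role hint only when it can take effect."""
    effective = resolve_mode(mode, model_is_asymmetric)
    if effective == "separate" and model_is_asymmetric:
        return ROLE_PREFIXES[role] + text
    # 'unified', or 'separate' on a model without role support: no hint applied.
    return text


print(prepare_input("reset a password", "query", "auto", True))
print(prepare_input("reset a password", "query", "separate", False))
```

The second call models the silent-ignore case: the mode is accepted, but the text reaches the encoder unchanged.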
Compression
The on-disk storage format for vectors. Coming soon — value is stored but ignored by the runtime today.
Vector storage is the dominant cost at scale. A 10M-chunk index at 1024d in full float32 is ~40 GB; the same index in 1-bit binary is ~1.3 GB. CoreCube will support several compression formats once the storage paths land:
| Format | Bits per dim | Footprint vs none | Recall cost |
|---|---|---|---|
| none | 32 (float32) | 100% (baseline) | None; exact vectors. |
| halfvec | 16 (float16) | ~50% | Negligible; typically <0.5% recall drop on standard benchmarks. |
| sq8 | 8 (int8) | ~25% | Small; typically 1–3% recall drop, depends on model. |
| binary | 1 | ~3% | Significant; 5–15% recall drop. Pair with reranking. |
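The footprint column follows directly from bits per dimension. As a rough check, the sketch below computes raw vector storage for the 10M-chunk, 1024d example; it counts only the vector payload and ignores HNSW graph and row overhead. `raw_vector_bytes` is a hypothetical helper, not a CoreCube function.

```python
BITS_PER_DIM = {"none": 32, "halfvec": 16, "sq8": 8, "binary": 1}


def raw_vector_bytes(num_chunks, dims, fmt):
    """Bytes needed for the vector payload alone, without index overhead."""
    return num_chunks * dims * BITS_PER_DIM[fmt] // 8


baseline = raw_vector_bytes(10_000_000, 1024, "none")
for fmt in BITS_PER_DIM:
    size = raw_vector_bytes(10_000_000, 1024, fmt)
    print(f"{fmt:8s} {size / 1e9:6.2f} GB  {100 * size / baseline:5.1f}% of baseline")
```

For this example the payload comes to about 41 GB at float32, 20 GB at float16, 10 GB at int8, and 1.3 GB at 1 bit per dimension.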
In the interim, the runtime chooses the on-disk index format independently of this field: models whose vectors exceed pgvector's HNSW size limit are routed to a halfvec sidecar, and everything else stays vector (float32). The preset field itself has no single default. The seeded presets set it per tier (the fast preset seeds halfvec; the balanced and accurate presets seed none), and the runtime ignores it today regardless.
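The routing rule can be sketched as a dimension check. The limits used here are the ones pgvector documents for HNSW indexes (2,000 dimensions for `vector`, 4,000 for `halfvec`); treating them as the limits the runtime applies is an assumption, and `storage_type` is a hypothetical name.

```python
VECTOR_HNSW_MAX_DIMS = 2000    # pgvector HNSW limit for the `vector` type (assumed to apply)
HALFVEC_HNSW_MAX_DIMS = 4000   # pgvector HNSW limit for the `halfvec` type (assumed to apply)


def storage_type(dims):
    """Pick the column type a vector of `dims` dimensions could be indexed in."""
    if dims <= 0:
        raise ValueError("dims must be positive")
    if dims <= VECTOR_HNSW_MAX_DIMS:
        return "vector"
    if dims <= HALFVEC_HNSW_MAX_DIMS:
        return "halfvec"
    raise ValueError(f"{dims}d exceeds both HNSW limits")


for model, dims in [("nomic-embed-text", 768), ("text-embedding-3-large", 3072)]:
    print(model, dims, storage_type(dims))
```

Under these assumptions, text-embedding-3-large at its native 3072d is the kind of model that lands in the halfvec sidecar, while truncating it to 1024d would keep it in vector.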
Normalize
L2-normalize embeddings before they are stored.
When on, each vector is rescaled so its Euclidean length is 1. This is required for cosine distance to behave correctly when the model does not normalize internally — without it, a long document and a short document with the same direction would score differently against the same query.
| Model class | Recommended setting | Why |
|---|---|---|
| OpenAI text-embedding-* | On | OpenAI does not guarantee unit-length output. |
| Voyage, Cohere, Jina | On | Same reason: keeps cosine consistent across providers. |
| nomic-embed-text, BGE | Either | These models emit unit-length vectors by construction; normalizing again is a no-op but harmless. |
| Mistral embed | On | Outputs are not unit-length. |
Default: on. Leaving it on for an already-normalized model costs nothing. Turning it off for a model that needs it silently degrades retrieval quality — leave on unless you have a measured reason.
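The long-document versus short-document effect is easy to reproduce on toy vectors. The sketch below assumes scores are computed as a plain dot product, which is an assumption about the scoring path rather than something this page states; `l2_normalize` and `dot` are hypothetical helpers.

```python
import math


def l2_normalize(vector):
    """Rescale a vector to Euclidean length 1 (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0.0 else [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = l2_normalize([1.0, 0.0])
short_doc = [3.0, 4.0]    # length 5
long_doc = [30.0, 40.0]   # same direction, length 50

raw_scores = (dot(query, short_doc), dot(query, long_doc))
unit_scores = (dot(query, l2_normalize(short_doc)), dot(query, l2_normalize(long_doc)))
print(raw_scores)   # differ: vector length leaks into the score
print(unit_scores)  # agree: only direction matters
```

Normalizing a vector that is already unit length divides it by 1, which is why leaving the setting on for such models costs nothing.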
Setting reference
| Setting key | Type | Default | Notes |
|---|---|---|---|
| embedding_model | string (catalog id) | active preset default | See Choosing a model. |
| embedding_dimensions | integer | model native | Locked unless the model is Matryoshka. |
| embedding_input_mode | 'auto' \| 'unified' \| 'separate' | 'auto' | Asymmetric mode is silently ignored by models that do not support it. |
| embedding_compression | 'none' \| 'halfvec' \| 'sq8' \| 'binary' | 'halfvec' (fast), else 'none' | Coming soon; value is stored but not honored by the runtime today. |
| embedding_normalize | boolean | true | Required for cosine distance with non-normalizing models. |
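For readers who script against these settings, the table can be mirrored as a typed record with basic validation. This is a hypothetical sketch: the keys and allowed values come from the table, but `EmbeddingSettings` and its `validate` method are illustrative and not part of CoreCube. `None` stands in for "model native" dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional

INPUT_MODES = ("auto", "unified", "separate")
COMPRESSION_FORMATS = ("none", "halfvec", "sq8", "binary")


@dataclass
class EmbeddingSettings:
    embedding_model: str
    embedding_dimensions: Optional[int] = None  # None = use the model's native size
    embedding_input_mode: str = "auto"
    embedding_compression: str = "none"          # stored but ignored by the runtime today
    embedding_normalize: bool = True

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means valid."""
        problems = []
        if not self.embedding_model:
            problems.append("embedding_model must be a non-empty catalog id")
        if self.embedding_dimensions is not None and self.embedding_dimensions <= 0:
            problems.append("embedding_dimensions must be a positive integer")
        if self.embedding_input_mode not in INPUT_MODES:
            problems.append(f"embedding_input_mode must be one of {INPUT_MODES}")
        if self.embedding_compression not in COMPRESSION_FORMATS:
            problems.append(f"embedding_compression must be one of {COMPRESSION_FORMATS}")
        return problems


settings = EmbeddingSettings("text-embedding-3-large", embedding_dimensions=1024)
print(settings.validate())
```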
Related
- Chunking — what gets fed into the embedding model.
- HNSW vector index — how the vectors get indexed for fast nearest-neighbor search.
- Retrieval — how the embedded query meets the embedded chunks at query time.