# Embedding & Chunking
The Embedding → Config tab in the Admin Console controls how documents are split into chunks before being embedded, and surfaces read-only information about the vector index that serves retrieval.
Chunking is the input side of embedding: every chunk the splitter emits becomes exactly one row in `chunk_embeddings`. Chunk boundaries therefore decide what the retriever can and cannot recall. Too-small chunks fragment meaning across retrieval calls; too-large chunks dilute similarity signal with unrelated prose.
:::tip Where to find this page Admin Console → Embedding → Config tab. The other two tabs on the same page control the embedding worker ("Process") and the active model ("Model"). :::
## When changes take effect
Chunking settings apply to newly ingested content only. Existing chunks are left untouched — CoreCube never silently re-chunks behind your back.
To re-chunk existing documents against the new settings, use Re-chunk all documents (shipping in a later release). The action runs as an explicit background job and rebuilds `evidence_chunks` + `chunk_embeddings` per document.
## Chunking

### Strategy
How CoreCube picks chunk boundaries.
| Strategy | Boundary | Best for |
|---|---|---|
| Heading-aware | Markdown headings (#, ##, ###) | Structured docs — Confluence pages, runbooks, README files. |
| Paragraph | Blank-line paragraph breaks | PDFs, long-form articles, policy documents. |
| Fixed size | Rolling window of chunkSize tokens | Plain text, transcripts, unstructured logs. |
Default: heading-aware. Heading-aware falls back to paragraph boundaries when a document has no headings, so it is a safe default even for mixed corpora.
:::note Semantic chunking Semantic (embedding-similarity) chunking is intentionally not offered. It is substantially more expensive to compute and produces marginal retrieval gains on business documents. Heading-aware chunking captures the same structural signal at a fraction of the cost. :::
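The heading-aware strategy with its paragraph fallback can be sketched in a few lines. This is an illustrative TypeScript sketch, not CoreCube's actual splitter; `splitByHeadings` and the `Section` shape are invented for the example, and a real implementation would operate on tokenizer output rather than raw strings.

```typescript
interface Section {
  headingPath: string[]; // e.g. ["Deployment", "Staging"]
  text: string;
}

// Split Markdown on #/##/### headings, tracking the heading hierarchy.
// Falls back to blank-line paragraph splitting when no headings exist.
function splitByHeadings(markdown: string): Section[] {
  const lines = markdown.split("\n");
  const sections: Section[] = [];
  const path: string[] = [];
  let buffer: string[] = [];

  const flush = () => {
    const text = buffer.join("\n").trim();
    if (text) sections.push({ headingPath: path.filter((h) => h !== undefined), text });
    buffer = [];
  };

  for (const line of lines) {
    const m = /^(#{1,3})\s+(.*)$/.exec(line);
    if (m) {
      flush();
      const level = m[1].length;
      path.length = level - 1; // drop any deeper headings from the path
      path[level - 1] = m[2].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();

  // Fallback: no headings found, so split on blank-line paragraph breaks.
  if (sections.length <= 1 && !/^#{1,3}\s/m.test(markdown)) {
    return markdown
      .split(/\n{2,}/)
      .map((p) => p.trim())
      .filter(Boolean)
      .map((text) => ({ headingPath: [], text }));
  }
  return sections;
}
```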
### Chunk size
Target size, in tokens, for a single chunk.
| Range | Behavior |
|---|---|
| Small (128–256) | Precise recall, but meaning is often split across adjacent chunks. |
| Medium (384–768, default 512) | Balanced — enough context for a paragraph to stand on its own. |
| Large (1024–2048) | More context per chunk, less ranking granularity, more duplication on reranking. |
Default: 512 tokens. Allowed range: 64 – 2048.
Worked example. A 4,000-token document chunked at chunkSize = 512 with chunkOverlap = 50 produces approximately ceil(4000 / (512 - 50)) = 9 chunks.
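The same estimate as a one-liner, for sanity-checking other configurations (the helper name is hypothetical):

```typescript
// Back-of-the-envelope chunk count for a fixed-size window with overlap.
// Each chunk past the first contributes (chunkSize - overlap) tokens of new text.
function estimateChunkCount(docTokens: number, chunkSize: number, overlap: number): number {
  const stride = chunkSize - overlap;
  return Math.max(1, Math.ceil(docTokens / stride));
}

estimateChunkCount(4000, 512, 50); // → 9, matching the worked example
```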
### Overlap
Tokens carried over from the end of one chunk into the start of the next.
Overlap prevents meaning from being lost at chunk boundaries — a sentence that would otherwise be split in half appears intact in at least one chunk. Overlap also makes chunk N-1 / N+1 neighbor expansion at retrieval time less necessary.
Default: 50 tokens. Allowed range: 0 – 256.
:::tip Rule of thumb
chunkOverlap ≈ chunkSize × 0.1 is a good starting point. Higher overlap improves boundary recall at the cost of more rows in the index.
:::
### Minimum chunk size
Smallest chunk the splitter will emit. Chunks smaller than this are merged into the next one.
Prevents tiny trailing chunks ("the end." or a single bullet) that waste an embedding and rarely contribute signal.
Default: 64 tokens. Allowed range: 16 – 256.
### Maximum chunk size
Hard ceiling on chunk length. Chunks are force-split at this length even if the strategy wanted to keep them together (e.g. a single very long paragraph).
Acts as a safety valve against pathological inputs — a 50,000-token blob without paragraph breaks will not become one giant chunk.
Default: 1024 tokens. Allowed range: 256 – 4096.
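Minimum and maximum act as a post-pass over whatever the strategy produced: sub-minimum chunks are merged forward, and anything over the ceiling is force-split. A minimal sketch with invented names; chunks are modeled as token arrays for simplicity.

```typescript
// Apply min/max bounds to raw chunks. Each chunk is an array of tokens.
function enforceBounds(
  chunks: string[][],
  minTokens: number,
  maxTokens: number,
): string[][] {
  // Pass 1: merge chunks smaller than the minimum into the next one.
  const merged: string[][] = [];
  let carry: string[] = [];
  for (const chunk of chunks) {
    const candidate = [...carry, ...chunk];
    if (candidate.length < minTokens) {
      carry = candidate; // too small: roll into the next chunk
    } else {
      merged.push(candidate);
      carry = [];
    }
  }
  if (carry.length) {
    // Trailing remainder: append to the last chunk rather than emit a runt.
    if (merged.length) merged[merged.length - 1].push(...carry);
    else merged.push(carry);
  }

  // Pass 2: force-split anything over the hard ceiling.
  const bounded: string[][] = [];
  for (const chunk of merged) {
    for (let i = 0; i < chunk.length; i += maxTokens) {
      bounded.push(chunk.slice(i, i + maxTokens));
    }
  }
  return bounded;
}
```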
### Cross-field rules
The following interlocks are enforced by the server (Zod validation) and echoed by the admin UI:
- `minChunkSize < chunkSize ≤ maxChunkSize`
- `chunkOverlap < chunkSize`
Saves that violate these rules are rejected with a clear error — the admin UI surfaces the rule inline next to the offending field.
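The interlocks amount to a small validator. The server enforces them via Zod; this is a dependency-free sketch with illustrative names, not the actual schema.

```typescript
interface ChunkingConfig {
  chunkSize: number;
  chunkOverlap: number;
  minChunkSize: number;
  maxChunkSize: number;
}

// Returns one message per violated cross-field rule; empty means valid.
function validateChunking(c: ChunkingConfig): string[] {
  const errors: string[] = [];
  if (!(c.minChunkSize < c.chunkSize)) errors.push("minChunkSize must be < chunkSize");
  if (!(c.chunkSize <= c.maxChunkSize)) errors.push("chunkSize must be <= maxChunkSize");
  if (!(c.chunkOverlap < c.chunkSize)) errors.push("chunkOverlap must be < chunkSize");
  return errors;
}
```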
## Include section headings in embeddings
Switch. When on, the heading hierarchy of a chunk (e.g. Deployment > Staging > Prerequisites) is prepended to the chunk text before it is sent to the embedding model.
| Setting | When to turn it on |
|---|---|
| On (default) | Structured docs — Confluence, Markdown runbooks. Heading context improves retrieval on "How do I configure X for Y?" queries. |
| Off | Unstructured content — transcripts, logs, scraped HTML without reliable headings. |
:::note Only the embedding input changes The heading preamble is added to the text sent to the embedding model — it does not appear in the chunk body shown in search results or passed to the LLM. Chunk content is untouched. :::
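The split between embedding input and stored chunk body might look like this. Names here are illustrative, not CoreCube's internal API; the point is that the heading preamble exists only on the string handed to the embedding model.

```typescript
interface Chunk {
  headingPath: string[]; // e.g. ["Deployment", "Staging", "Prerequisites"]
  body: string;          // what search results and the LLM see, always unchanged
}

// Build the text sent to the embedding model. The stored body is untouched.
function embeddingInput(chunk: Chunk, includeHeadings: boolean): string {
  if (!includeHeadings || chunk.headingPath.length === 0) return chunk.body;
  return `${chunk.headingPath.join(" > ")}\n\n${chunk.body}`;
}
```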
## Active index (read-only)
The Active index card shows the vector index currently serving retrieval. Structure is managed automatically — there are no editable controls here, but understanding what the panel means helps diagnose retrieval issues.
| Row | Meaning |
|---|---|
| Index family | Always HNSW. CoreCube uses HNSW for approximate-nearest-neighbor vector search via pgvector. |
| Storage | vector or halfvec (sidecar for high-dimensional models). Auto-selected based on the active model's embedding dimensions. |
| Model · dimensions | The embedding model in use and its output dimensionality (e.g. nomic-embed-text · 768d). |
| Status | ready / building / pending / failed / skipped — the live state of the HNSW index for the active model. |
### Storage: vector vs halfvec
CoreCube automatically picks the best storage kind for the active model:
- `vector` (default) — full-precision 32-bit floats. Used for embedding models up to ~2000 dimensions.
- `halfvec` (sidecar) — half-precision 16-bit floats. Used for high-dimensional models (> 2000 dimensions), where a full-precision HNSW index would exceed pgvector's per-index size limit. A `halfvec` sidecar column is maintained alongside the primary `vector` column.
You cannot switch storage manually — it is a function of the active embedding model's dimensionality. To change it, switch models on the Model tab. Switching models triggers a re-embedding pass; the new index status appears here.
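The decision reduces to a dimensionality check. A sketch with illustrative names; the ~2000-dimension cutoff matches pgvector's limit on full-precision HNSW index entries, but the exact threshold CoreCube uses is internal.

```typescript
type StorageKind = "vector" | "halfvec";

// Pick storage purely from the active model's output dimensionality.
function storageForModel(dimensions: number): StorageKind {
  const FULL_PRECISION_MAX_DIMS = 2000; // pgvector HNSW limit for full-precision vectors
  return dimensions <= FULL_PRECISION_MAX_DIMS ? "vector" : "halfvec";
}

storageForModel(768);  // → "vector", e.g. nomic-embed-text at 768d
storageForModel(3072); // → "halfvec", high-dimensional sidecar
```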
### Status values
| Status | Meaning |
|---|---|
| `ready` | Index is built and serving queries. |
| `building` | Index is being constructed (typically after a model switch or on a fresh instance). |
| `pending` | Embeddings are still being generated — the index will be built once enough rows exist. |
| `failed` | Index build hit an error. Check the Embedding → Process tab for worker logs. |
| `skipped` | Index creation was bypassed (rare — only seen on very small fixtures or explicit admin override). |
## Setting reference
All keys live in the shared settings table and are updated atomically by Save. Absent keys resolve to the documented defaults.
| Setting key | Type | Default | Range |
|---|---|---|---|
| `chunking_strategy` | `'heading'` \| `'paragraph'` \| `'fixed'` | `'heading'` | — |
| `chunking_chunk_size` | integer (tokens) | 512 | 64 – 2048 |
| `chunking_chunk_overlap` | integer (tokens) | 50 | 0 – 256 |
| `chunking_min_chunk_size` | integer (tokens) | 64 | 16 – 256 |
| `chunking_max_chunk_size` | integer (tokens) | 1024 | 256 – 4096 |
| `chunk_context_mode` | `'metadata'` \| `'none'` | `'metadata'` | — |
Out-of-range values are clamped to the bounds on read (with a warning logged once per process) — the typed `PUT /api/embedding/config` endpoint rejects them outright at the validation boundary.
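The clamp-on-read behavior can be sketched as follows. This is a hypothetical helper, not the actual settings layer; the once-per-process bookkeeping is just a module-level set.

```typescript
// Keys we have already warned about in this process.
const warned = new Set<string>();

// Clamp a stored value into [min, max], warning once per key per process.
function readClamped(key: string, value: number, min: number, max: number): number {
  const clamped = Math.min(max, Math.max(min, value));
  if (clamped !== value && !warned.has(key)) {
    warned.add(key);
    console.warn(`${key}=${value} out of range [${min}, ${max}]; clamped to ${clamped}`);
  }
  return clamped;
}
```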
## Related
- Retrieval Config — the query-time counterpart to this page.
- Retrieval Pipeline — how chunks flow from ingestion to the LLM.