# Embedding & Chunking
The Embedding → Config tab in the Admin Console controls how documents are split into chunks before being embedded, and surfaces read-only information about the vector index that serves retrieval.
Chunking is the input side of embedding: every chunk the splitter emits becomes exactly one row in `chunk_embeddings`. Chunk boundaries therefore decide what the retriever can and cannot recall. Too-small chunks fragment meaning across retrieval calls; too-large chunks dilute similarity signal with unrelated prose.
:::tip Where to find this page Admin Console → Embedding → Config tab. The other two tabs on the same page control the embedding worker ("Process") and the active model ("Model"). :::
## When changes take effect
Chunking settings apply to newly ingested content only. Existing chunks are left untouched — CoreCube never silently re-chunks behind your back.
To re-chunk existing documents against the new settings, use Re-chunk all documents (shipping in a later release). The action runs as an explicit background job and rebuilds `evidence_chunks` + `chunk_embeddings` per document.
## Chunking

### Strategy
How CoreCube picks chunk boundaries.
| Strategy | Boundary | Best for |
|---|---|---|
| Heading-aware | Markdown headings (#, ##, ###) | Structured docs — Confluence pages, runbooks, README files. |
| Paragraph | Blank-line paragraph breaks | PDFs, long-form articles, policy documents. |
| Fixed size | Rolling window of chunkSize tokens | Plain text, transcripts, unstructured logs. |
Default: heading-aware. Heading-aware falls back to paragraph boundaries when a document has no headings, so it is a safe default even for mixed corpora.
:::note Semantic chunking Semantic (embedding-similarity) chunking is intentionally not offered. It is substantially more expensive to compute and produces marginal retrieval gains on business documents. Heading-aware chunking captures the same structural signal at a fraction of the cost. :::
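The heading-aware strategy with its paragraph fallback can be sketched in a few lines. This is an illustrative TypeScript sketch, not CoreCube's actual splitter; `splitByHeadings` and the `Section` shape are invented for the example, and a real implementation would operate on tokenizer output rather than raw strings.

```typescript
interface Section {
  headingPath: string[]; // e.g. ["Deployment", "Staging"]
  text: string;
}

// Split Markdown on #/##/### headings, tracking the heading hierarchy.
// Falls back to blank-line paragraph splitting when no headings exist.
function splitByHeadings(markdown: string): Section[] {
  const lines = markdown.split("\n");
  const sections: Section[] = [];
  const path: string[] = [];
  let buffer: string[] = [];

  const flush = () => {
    const text = buffer.join("\n").trim();
    if (text) sections.push({ headingPath: path.filter((h) => h !== undefined), text });
    buffer = [];
  };

  for (const line of lines) {
    const m = /^(#{1,3})\s+(.*)$/.exec(line);
    if (m) {
      flush();
      const level = m[1].length;
      path.length = level - 1; // drop any deeper headings from the path
      path[level - 1] = m[2].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();

  // Fallback: no headings found, so split on blank-line paragraph breaks.
  if (sections.length <= 1 && !/^#{1,3}\s/m.test(markdown)) {
    return markdown
      .split(/\n{2,}/)
      .map((p) => p.trim())
      .filter(Boolean)
      .map((text) => ({ headingPath: [], text }));
  }
  return sections;
}
```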
### Chunk size
Target size, in tokens, for a single chunk.
| Range | Behavior |
|---|---|
| Small (128–256) | Precise recall, but meaning is often split across adjacent chunks. |
| Medium (384–768, default 512) | Balanced — enough context for a paragraph to stand on its own. |
| Large (1024–2048) | More context per chunk, less ranking granularity, more duplication on reranking. |
Default: 512 tokens. Allowed range: 64 – 2048.
Worked example. A 4,000-token document chunked at chunkSize = 512 with chunkOverlap = 50 produces approximately ceil(4000 / (512 - 50)) = 9 chunks.
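The same estimate as a one-liner, for sanity-checking other configurations (the helper name is hypothetical):

```typescript
// Back-of-the-envelope chunk count for a fixed-size window with overlap.
// Each chunk past the first contributes (chunkSize - overlap) tokens of new text.
function estimateChunkCount(docTokens: number, chunkSize: number, overlap: number): number {
  const stride = chunkSize - overlap;
  return Math.max(1, Math.ceil(docTokens / stride));
}

estimateChunkCount(4000, 512, 50); // → 9, matching the worked example
```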
### Overlap
Tokens carried over from the end of one chunk into the start of the next.
Overlap prevents meaning from being lost at chunk boundaries — a sentence that would otherwise be split in half appears intact in at least one chunk. Overlap also makes chunk N-1 / N+1 neighbor expansion at retrieval time less necessary.
Default: 50 tokens. Allowed range: 0 – 256.
:::tip Rule of thumb
chunkOverlap ≈ chunkSize × 0.1 is a good starting point. Higher overlap improves boundary recall at the cost of more rows in the index.
:::
### Minimum chunk size
Smallest chunk the splitter will emit. Chunks smaller than this are merged into the next one.
Prevents tiny trailing chunks ("the end." or a single bullet) that waste an embedding and rarely contribute signal.
Default: 64 tokens. Allowed range: 16 – 256.
### Maximum chunk size
Hard ceiling on chunk length. Chunks are force-split at this length even if the strategy wanted to keep them together (e.g. a single very long paragraph).
Acts as a safety valve against pathological inputs — a 50,000-token blob without paragraph breaks will not become one giant chunk.
Default: 1024 tokens. Allowed range: 256 – 4096.
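Minimum and maximum act as a post-pass over whatever the strategy produced: sub-minimum chunks are merged forward, and anything over the ceiling is force-split. A minimal sketch with invented names; chunks are modeled as token arrays for simplicity.

```typescript
// Apply min/max bounds to raw chunks. Each chunk is an array of tokens.
function enforceBounds(
  chunks: string[][],
  minTokens: number,
  maxTokens: number,
): string[][] {
  // Pass 1: merge chunks smaller than the minimum into the next one.
  const merged: string[][] = [];
  let carry: string[] = [];
  for (const chunk of chunks) {
    const candidate = [...carry, ...chunk];
    if (candidate.length < minTokens) {
      carry = candidate; // too small: roll into the next chunk
    } else {
      merged.push(candidate);
      carry = [];
    }
  }
  if (carry.length) {
    // Trailing remainder: append to the last chunk rather than emit a runt.
    if (merged.length) merged[merged.length - 1].push(...carry);
    else merged.push(carry);
  }

  // Pass 2: force-split anything over the hard ceiling.
  const bounded: string[][] = [];
  for (const chunk of merged) {
    for (let i = 0; i < chunk.length; i += maxTokens) {
      bounded.push(chunk.slice(i, i + maxTokens));
    }
  }
  return bounded;
}
```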
### Cross-field rules
The following interlocks are enforced by the server (Zod validation) and echoed by the admin UI:
- `minChunkSize < chunkSize ≤ maxChunkSize`
- `chunkOverlap < chunkSize`
Saves that violate these rules are rejected with a clear error — the admin UI surfaces the rule inline next to the offending field.
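The interlocks amount to a small validator. The server enforces them via Zod; this is a dependency-free sketch with illustrative names, not the actual schema.

```typescript
interface ChunkingConfig {
  chunkSize: number;
  chunkOverlap: number;
  minChunkSize: number;
  maxChunkSize: number;
}

// Returns one message per violated cross-field rule; empty means valid.
function validateChunking(c: ChunkingConfig): string[] {
  const errors: string[] = [];
  if (!(c.minChunkSize < c.chunkSize)) errors.push("minChunkSize must be < chunkSize");
  if (!(c.chunkSize <= c.maxChunkSize)) errors.push("chunkSize must be <= maxChunkSize");
  if (!(c.chunkOverlap < c.chunkSize)) errors.push("chunkOverlap must be < chunkSize");
  return errors;
}
```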
## Include section headings in embeddings
Switch. When on, the heading hierarchy of a chunk (e.g. Deployment > Staging > Prerequisites) is prepended to the chunk text before it is sent to the embedding model.
| Setting | When to turn it on |
|---|---|
| On (default) | Structured docs — Confluence, Markdown runbooks. Heading context improves retrieval on "How do I configure X for Y?" queries. |
| Off | Unstructured content — transcripts, logs, scraped HTML without reliable headings. |
:::note Only the embedding input changes The heading preamble is added to the text sent to the embedding model — it does not appear in the chunk body shown in search results or passed to the LLM. Chunk content is untouched. :::
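The split between embedding input and stored chunk body might look like this. Names here are illustrative, not CoreCube's internal API; the point is that the heading preamble exists only on the string handed to the embedding model.

```typescript
interface Chunk {
  headingPath: string[]; // e.g. ["Deployment", "Staging", "Prerequisites"]
  body: string;          // what search results and the LLM see, always unchanged
}

// Build the text sent to the embedding model. The stored body is untouched.
function embeddingInput(chunk: Chunk, includeHeadings: boolean): string {
  if (!includeHeadings || chunk.headingPath.length === 0) return chunk.body;
  return `${chunk.headingPath.join(" > ")}\n\n${chunk.body}`;
}
```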
## Active index (read-only)
The Active index card shows the vector index currently serving retrieval. Structure is managed automatically — there are no editable controls here, but understanding what the panel means helps diagnose retrieval issues.
| Row | Meaning |
|---|---|
| Index family | Always HNSW. CoreCube uses HNSW for approximate-nearest-neighbor vector search via pgvector. |
| Storage | vector or halfvec (sidecar for high-dimensional models). Auto-selected based on the active model's embedding dimensions. |
| Model · dimensions | The embedding model in use and its output dimensionality (e.g. nomic-embed-text · 768d). |
| Status | ready / building / pending / failed / skipped — the live state of the HNSW index for the active model. |
### Storage: vector vs halfvec
CoreCube automatically picks the best storage kind for the active model:
- `vector` (default) — full-precision 32-bit floats. Used for embedding models up to ~2000 dimensions.
- `halfvec` (sidecar) — half-precision 16-bit floats. Used for high-dimensional models (> 2000 dimensions), where a full-precision HNSW index would exceed pgvector's per-index size limit. A `halfvec` sidecar column is maintained alongside the primary `vector` column.
You cannot switch storage manually — it is a function of the active embedding model's dimensionality. To change it, switch models on the Model tab. Switching models triggers a re-embedding pass; the new index status appears here.
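The decision reduces to a dimensionality check. A sketch with illustrative names; the ~2000-dimension cutoff matches pgvector's limit on full-precision HNSW index entries, but the exact threshold CoreCube uses is internal.

```typescript
type StorageKind = "vector" | "halfvec";

// Pick storage purely from the active model's output dimensionality.
function storageForModel(dimensions: number): StorageKind {
  const FULL_PRECISION_MAX_DIMS = 2000; // pgvector HNSW limit for full-precision vectors
  return dimensions <= FULL_PRECISION_MAX_DIMS ? "vector" : "halfvec";
}

storageForModel(768);  // → "vector", e.g. nomic-embed-text at 768d
storageForModel(3072); // → "halfvec", high-dimensional sidecar
```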
### Status values
| Status | Meaning |
|---|---|
| `ready` | Index is built and serving queries. |
| `building` | Index is being constructed (typically after a model switch or on a fresh instance). |
| `pending` | Embeddings are still being generated — the index will be built once enough rows exist. |
| `failed` | Index build hit an error. Check the Embedding → Process tab for worker logs. |
| `skipped` | Index creation was bypassed (rare — only seen on very small fixtures or explicit admin override). |
## Setting reference
All keys live in the shared settings table and are updated atomically by Save. Absent keys resolve to the documented defaults.
| Setting key | Type | Default | Range |
|---|---|---|---|
| `chunking_strategy` | `'heading'` \| `'paragraph'` \| `'fixed'` | `'heading'` | — |
| `chunking_chunk_size` | integer (tokens) | 512 | 64 – 2048 |
| `chunking_chunk_overlap` | integer (tokens) | 50 | 0 – 256 |
| `chunking_min_chunk_size` | integer (tokens) | 64 | 16 – 256 |
| `chunking_max_chunk_size` | integer (tokens) | 1024 | 256 – 4096 |
| `chunk_context_mode` | `'metadata'` \| `'none'` | `'metadata'` | — |
Out-of-range values are clamped to the bounds on read (with a warning logged once per process) — the typed `PUT /api/embedding/config` endpoint rejects them outright at the validation boundary.
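The clamp-on-read behavior can be sketched as follows. This is a hypothetical helper, not the actual settings layer; the once-per-process bookkeeping is just a module-level set.

```typescript
// Keys we have already warned about in this process.
const warned = new Set<string>();

// Clamp a stored value into [min, max], warning once per key per process.
function readClamped(key: string, value: number, min: number, max: number): number {
  const clamped = Math.min(max, Math.max(min, value));
  if (clamped !== value && !warned.has(key)) {
    warned.add(key);
    console.warn(`${key}=${value} out of range [${min}, ${max}]; clamped to ${clamped}`);
  }
  return clamped;
}
```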
## Related
- Retrieval Config — the query-time counterpart to this page.
- Retrieval Pipeline — how chunks flow from ingestion to the LLM.