# Embedding & Chunking

The Embedding → Config tab in the Admin Console controls how documents are split into chunks before being embedded, and surfaces read-only information about the vector index that serves retrieval.

Chunking is the input side of embedding: every chunk the splitter emits becomes exactly one row in chunk_embeddings. Chunk boundaries therefore decide what the retriever can and cannot recall. Too-small chunks fragment meaning across retrieval calls; too-large chunks dilute similarity signal with unrelated prose.

:::tip Where to find this page
Admin Console → Embedding → Config tab. The other two tabs on the same page control the embedding worker ("Process") and the active model ("Model").
:::

## When changes take effect

Chunking settings apply to newly ingested content only. Existing chunks are left untouched — CoreCube never silently re-chunks behind your back.

To re-chunk existing documents against the new settings, use Re-chunk all documents (shipping in a later release). The action runs as an explicit background job and rebuilds evidence_chunks + chunk_embeddings per document.


## Chunking

### Strategy

How CoreCube picks chunk boundaries.

| Strategy | Boundary | Best for |
| --- | --- | --- |
| Heading-aware | Markdown headings (`#`, `##`, `###`) | Structured docs — Confluence pages, runbooks, README files. |
| Paragraph | Blank-line paragraph breaks | PDFs, long-form articles, policy documents. |
| Fixed size | Rolling window of `chunkSize` tokens | Plain text, transcripts, unstructured logs. |

Default: heading-aware. Heading-aware falls back to paragraph boundaries when a document has no headings, so it is a safe default even for mixed corpora.
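The fallback described above can be sketched as follows — a minimal illustration, assuming a simple regex check for Markdown headings (the function name and check are illustrative, not CoreCube's actual implementation):

```typescript
type Strategy = "heading" | "paragraph" | "fixed";

// Illustrative fallback: heading-aware chunking only applies when the
// document actually contains Markdown headings; otherwise the splitter
// falls back to paragraph boundaries.
function effectiveStrategy(doc: string, configured: Strategy): Strategy {
  const hasHeadings = /^#{1,3}\s/m.test(doc);
  if (configured === "heading" && !hasHeadings) return "paragraph";
  return configured;
}
```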

:::note Semantic chunking
Semantic (embedding-similarity) chunking is intentionally not offered. It is substantially more expensive to compute and produces marginal retrieval gains on business documents. Heading-aware chunking captures the same structural signal at a fraction of the cost.
:::

### Chunk size

Target size, in tokens, for a single chunk.

| Range | Behavior |
| --- | --- |
| Small (128–256) | Precise recall, but meaning is often split across adjacent chunks. |
| Medium (384–768, default 512) | Balanced — enough context for a paragraph to stand on its own. |
| Large (1024–2048) | More context per chunk, less ranking granularity, more duplication on reranking. |

Default: 512 tokens. Allowed range: 64–2048.

Worked example. A 4,000-token document chunked at chunkSize = 512 with chunkOverlap = 50 produces approximately ceil(4000 / (512 - 50)) = 9 chunks.
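The arithmetic above as a sketch (the function name is illustrative):

```typescript
// Approximate chunk count for a fixed-size window: each chunk advances
// the window by (chunkSize - chunkOverlap) tokens, so a document of
// docTokens tokens yields roughly docTokens / (chunkSize - chunkOverlap)
// chunks, rounded up.
function approxChunkCount(
  docTokens: number,
  chunkSize: number,
  chunkOverlap: number
): number {
  return Math.ceil(docTokens / (chunkSize - chunkOverlap));
}
```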

### Overlap

Tokens carried over from the end of one chunk into the start of the next.

Overlap prevents meaning from being lost at chunk boundaries — a sentence that would otherwise be split in half appears intact in at least one chunk. Overlap also makes chunk N-1 / N+1 neighbor expansion at retrieval time less necessary.

Default: 50 tokens. Allowed range: 0–256.

:::tip Rule of thumb
chunkOverlap ≈ chunkSize × 0.1 is a good starting point. Higher overlap improves boundary recall at the cost of more rows in the index.
:::
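A token-level sketch of fixed-size splitting with overlap and tail merging — illustrative only, assuming tokens arrive as an array (the function name and the exact merge behavior are assumptions, not CoreCube's actual splitter):

```typescript
// Fixed-size chunking with overlap: each window starts
// (chunkSize - chunkOverlap) tokens after the previous one, so the last
// chunkOverlap tokens of chunk N reappear at the start of chunk N+1.
// A trailing chunk shorter than minChunkSize is merged into the previous one.
function splitFixed(
  tokens: string[],
  chunkSize: number,
  chunkOverlap: number,
  minChunkSize: number
): string[][] {
  if (chunkOverlap >= chunkSize) throw new Error("chunkOverlap must be < chunkSize");
  const step = chunkSize - chunkOverlap;
  const chunks: string[][] = [];
  for (let start = 0; ; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  const last = chunks[chunks.length - 1];
  if (chunks.length > 1 && last.length < minChunkSize) {
    // Merge the short tail into the preceding chunk.
    chunks.pop();
    chunks[chunks.length - 1] = chunks[chunks.length - 1].concat(last);
  }
  return chunks;
}
```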

### Minimum chunk size

Smallest chunk the splitter will emit. Chunks smaller than this are merged into the next one.

Prevents tiny trailing chunks ("the end." or a single bullet) that waste an embedding and rarely contribute signal.

Default: 64 tokens. Allowed range: 16–256.

### Maximum chunk size

Hard ceiling on chunk length. Chunks are force-split at this length even if the strategy wanted to keep them together (e.g. a single very long paragraph).

Acts as a safety valve against pathological inputs — a 50,000-token blob without paragraph breaks will not become one giant chunk.

Default: 1024 tokens. Allowed range: 256–4096.

### Cross-field rules

The following interlocks are enforced by the server (Zod validation) and echoed by the admin UI:

- `minChunkSize < chunkSize ≤ maxChunkSize`
- `chunkOverlap < chunkSize`

Saves that violate these rules are rejected with a clear error — the admin UI surfaces the rule inline next to the offending field.
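The server enforces these interlocks with Zod; a dependency-free sketch of the same checks (interface and function names are illustrative):

```typescript
interface ChunkingConfig {
  chunkSize: number;
  chunkOverlap: number;
  minChunkSize: number;
  maxChunkSize: number;
}

// Returns the list of violated interlocks (empty when the config is valid).
function interlockErrors(c: ChunkingConfig): string[] {
  const errors: string[] = [];
  if (!(c.minChunkSize < c.chunkSize)) {
    errors.push("minChunkSize must be < chunkSize");
  }
  if (!(c.chunkSize <= c.maxChunkSize)) {
    errors.push("chunkSize must be <= maxChunkSize");
  }
  if (!(c.chunkOverlap < c.chunkSize)) {
    errors.push("chunkOverlap must be < chunkSize");
  }
  return errors;
}
```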

### Include section headings in embeddings

Switch. When on, the heading hierarchy of a chunk (e.g. Deployment > Staging > Prerequisites) is prepended to the chunk text before it is sent to the embedding model.

| Setting | When to turn it on |
| --- | --- |
| On (default) | Structured docs — Confluence, Markdown runbooks. Heading context improves retrieval on "How do I configure X for Y?" queries. |
| Off | Unstructured content — transcripts, logs, scraped HTML without reliable headings. |

:::note Only the embedding input changes
The heading preamble is added to the text sent to the embedding model — it does not appear in the chunk body shown in search results or passed to the LLM. Chunk content is untouched.
:::
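A sketch of how the embedding input might be assembled when the switch is on — the function name and separator are assumptions for illustration:

```typescript
// Prepend the heading hierarchy (e.g. Deployment > Staging > Prerequisites)
// to the text sent to the embedding model. The stored chunk body stays
// untouched; only the embedding input changes.
function embeddingInput(headingPath: string[], chunkBody: string): string {
  if (headingPath.length === 0) return chunkBody;
  return `${headingPath.join(" > ")}\n\n${chunkBody}`;
}
```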


## Active index (read-only)

The Active index card shows the vector index currently serving retrieval. Structure is managed automatically — there are no editable controls here, but understanding what the panel means helps diagnose retrieval issues.

| Row | Meaning |
| --- | --- |
| Index family | Always HNSW. CoreCube uses HNSW for approximate-nearest-neighbor vector search via pgvector. |
| Storage | `vector` or `halfvec` (sidecar for high-dimensional models). Auto-selected based on the active model's embedding dimensions. |
| Model · dimensions | The embedding model in use and its output dimensionality (e.g. nomic-embed-text · 768d). |
| Status | `ready` / `building` / `pending` / `failed` / `skipped` — the live state of the HNSW index for the active model. |

### Storage: vector vs halfvec

CoreCube automatically picks the best storage kind for the active model:

- `vector` (default) — full-precision 32-bit floats. Used for embedding models up to ~2000 dimensions.
- `halfvec` (sidecar) — half-precision 16-bit floats. Used for high-dimensional models (> 2000 dimensions) where a full-precision HNSW index would exceed pgvector's per-index size limit. A halfvec sidecar column is maintained alongside the primary vector column.
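The selection rule reduces to a single threshold check, per the description above (the function name is illustrative):

```typescript
type StorageKind = "vector" | "halfvec";

// halfvec is chosen for high-dimensional models, where a full-precision
// HNSW index would exceed pgvector's per-index size limit.
function storageKindFor(dimensions: number): StorageKind {
  return dimensions > 2000 ? "halfvec" : "vector";
}
```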

You cannot switch storage manually — it is a function of the active embedding model's dimensionality. To change it, switch models on the Model tab. Switching models triggers a re-embedding pass; the new index status appears here.

### Status values

| Status | Meaning |
| --- | --- |
| `ready` | Index is built and serving queries. |
| `building` | Index is being constructed (typically after a model switch or on a fresh instance). |
| `pending` | Embeddings are still being generated — the index will be built once enough rows exist. |
| `failed` | Index build hit an error. Check the Embedding → Process tab for worker logs. |
| `skipped` | Index creation was bypassed (rare — only seen on very small fixtures or explicit admin override). |

## Setting reference

All keys live in the shared settings table and are updated atomically by Save. Absent keys resolve to the documented defaults.

| Setting key | Type | Default | Range |
| --- | --- | --- | --- |
| `chunking_strategy` | 'heading' \| 'paragraph' \| 'fixed' | 'heading' | — |
| `chunking_chunk_size` | integer (tokens) | 512 | 64–2048 |
| `chunking_chunk_overlap` | integer (tokens) | 50 | 0–256 |
| `chunking_min_chunk_size` | integer (tokens) | 64 | 16–256 |
| `chunking_max_chunk_size` | integer (tokens) | 1024 | 256–4096 |
| `chunk_context_mode` | 'metadata' \| 'none' | 'metadata' | — |

Out-of-range values are clamped to the bounds on read (with a warning logged once per process) — the typed PUT /api/embedding/config endpoint rejects them outright at the validation boundary.
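Clamp-on-read as a sketch (the helper name is illustrative):

```typescript
// Out-of-range values are clamped to the documented bounds when read;
// in-range values pass through unchanged.
function clampSetting(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}
```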

