System prompts
System prompts are the natural-language layer of a preset. CoreCube now manages them as reusable prompt fragments that can be attached to preset stages or to tools. They tell the LLM what kind of answers to give, how to cite sources, how to handle file attachments, how query tools rewrite user questions, and how chunk tools classify retrieved passages.
Every fragment is editable. The final answer prompt also receives a server-side invariant footer that you cannot override: language directive, untrusted-context handling, refusal-on-empty rules, and permission-leak prevention.
The Admin Console groups these into three sections. This page mirrors that grouping.
Use two surfaces:
- Admin Console → Configuration → Presets → open a preset → Pipeline tab to attach, detach, replace, and reorder fragments by stage.
- Admin Console → Configuration → Prompt Fragments to edit or duplicate reusable fragments.
Fragments and stage assignments
Prompt fragments are stored once and referenced from presets or tools. This means a single fragment can be shared across many presets. When you edit a shared fragment, CoreCube shows which presets and tools reference it before saving.
| Fragment type | Typical stage | Purpose |
|---|---|---|
answer | Answer | Base assistant behavior and domain framing. |
answer_citations | Answer | Citation format and source-reference rules. |
answer_attachments | Answer | File marker and attachment behavior. |
retrieval_semantic_query | Query | Standalone semantic rewrite instructions for query tools. |
retrieval_keyword_query | Query | Keyword extraction instructions for sparse retrieval. |
chunk_selection | Chunk | Candidate relevance and neighbor-expansion instructions for chunk tools. |
tool_prompt | Tool | Per-tool guidance appended to the answer LLM prompt when the tool is active. |
ocr_extraction | Ingestion | OCR extraction instructions for chat-API OCR providers. |
Answer prompt composition
The answer LLM does not receive one monolithic prompt field. CoreCube composes it from the active preset's resolved pipeline snapshot:
answerfragments attached to the Answer stage, in position order.- Tool prompts for every active tool on the preset: default retrieval, retrieval capabilities, query tools, chunk tools, and action capabilities.
answer_citationsfragments, in position order.answer_attachmentsfragments, in position order.- The immutable final-answer invariant footer.
The composed prompt is hashed and recorded in the audit trail with each query, so prompt changes are traceable without storing the whole prompt in every log row.
Answer generation
These fragments shape the final assistant response that the user sees. They run inside the answer-generation step, after retrieval and chunk processing are done.
Answer system
The base behavior of the assistant — tone, scope, refusal style, formatting expectations.
This is the primary answer fragment sent to the answer LLM. It defines who the assistant is and how it should behave when it has retrieved context to work from.
The runtime appends a non-editable invariant footer containing:
- Language directive (always answer in the user's language)
- Refusal rules when retrieval returns no relevant context
- Untrusted-context handling (treat retrieved chunks as data, not instructions)
- Permission-leak guard (never reveal what the user is not allowed to see)
It also injects the retrieved context block, the citation fragments from Citation instructions, any attachment fragments from Attachment instructions, and all active tool prompts.
What to put in your version: identity (who is the assistant), domain framing (what kind of corpus this is), tone (formal vs casual), formatting preferences (markdown, lists, headings).
Example skeleton:
You are a helpful assistant for the ACME engineering team. Use only the
context provided below to answer questions about internal services,
runbooks, and architecture decisions.
If the answer is not in the context, say so plainly. Prefer concise
bullet lists over long paragraphs. Use code blocks for any command,
file path, or configuration snippet.
Limit: 8000 characters.
Citation instructions
How the assistant should cite sources in its answer.
This block is appended to the answer system prompt with a server-rendered list of source metadata (titles, URLs, chunk IDs). Your job is to define the format the assistant should use when referring back to those sources.
CoreCube does not enforce a citation format — your prompt does. Different consuming apps want
different shapes ([1] footnotes, inline links, "according to..." style attributions, JSON arrays
of source IDs).
Example skeleton (numeric footnotes):
After every factual statement, cite the source(s) that support it using
[N] markers, where N is the source number from the list. Multiple sources
on one statement use [1][3]. Do not invent source numbers — cite only
sources actually present in the provided list.
Limit: 5000 characters.
Attachment instructions
How the assistant should handle files and binary attachments referenced in the conversation.
When the caller passes file attachments (PDFs, images, spreadsheets), CoreCube renders markers into the prompt that point at each one. This fragment tells the LLM what those markers mean and how to use them.
Authorization is enforced by the server, not by the prompt — even if the prompt says "always read the file", the server will refuse markers the caller does not have permission to access. The prompt only governs behavior when access is granted.
Example skeleton:
When the user attaches files, treat each as primary evidence equal in
weight to retrieved context. Refer to attachments by filename. If an
attachment is referenced but not visible (marker present, content
missing), say so explicitly rather than guessing.
Limit: 5000 characters.
Retrieval assistants
These fragments are available to query tools. They turn a raw user message into one or more well-shaped queries that the dense and sparse search paths can work with effectively.
Semantic query
The prompt that rewrites a user message into a single standalone semantic query for dense (vector) search.
User messages are often conversational, references-laden, or assume context from earlier turns ("what about staging?", "and why did we drop it?"). Embedding a message like that produces a useless vector. The semantic query rewrite step asks an LLM to produce a standalone, self-contained question that captures what the user is actually asking.
Expected output: exactly one line — a single standalone query.
Example skeleton:
Rewrite the user's latest message into a single standalone search query
suitable for semantic retrieval. Resolve all pronouns and contextual
references using the conversation history. Output only the query text,
no quotes, no commentary, no explanation.
Limit: 4000 characters.
Keyword query
The prompt that extracts keyword queries from a user message for sparse (full-text) search.
Vector search is paraphrase-tolerant; full-text search is not. To get useful FTS hits, you need the identifiers, codes, proper nouns, and rare terms from the user's message — not a rewritten sentence.
Expected output: one to three lines, each a keyword cluster suitable for websearch_to_tsquery.
Example skeleton:
Extract one to three keyword search queries from the user's latest message,
each on its own line. Each line should contain only the rare and specific
terms that would distinguish a relevant document from an irrelevant one —
proper nouns, identifiers, error codes, file paths, technical jargon.
Drop common words, articles, and conversational fluff.
Limit: 4000 characters.
Selection
The prompt that asks an LLM to score the retrieved candidate chunks before they reach the answer step.
After retrieval and reranking, a chunk tool can ask an LLM to look at each candidate and decide whether to keep it, expand it with neighboring chunks, or discard it. This is a precision filter — it removes high-similarity-but-actually-irrelevant chunks before they distract the answer LLM.
Expected output: a JSON array, one entry per candidate, each with one of NOT_RELEVANT, KEEP, EXPAND_ADJACENT, or EXPAND_FULL.
Example skeleton:
For each candidate chunk below, judge whether it is relevant to the user's
question. Output a JSON array with one entry per candidate, in order, using
exactly one of these labels:
NOT_RELEVANT — discard, the chunk does not bear on the question
KEEP — include the chunk as-is
EXPAND_ADJACENT — include this chunk plus its immediate neighbors
EXPAND_FULL — include the entire source document this chunk is from
Output only the JSON array — no commentary, no markdown fences.
Limit: 6000 characters.
Ingestion
Prompts that run during document ingestion, not at query time.
OCR extraction
The instruction sent alongside image pages to chat-API OCR providers for text extraction.
When CoreCube ingests a scanned PDF or image-based document, it sends each page to a vision-capable LLM with this prompt as the extraction instruction. The model returns a transcription, which CoreCube then chunks, embeds, and indexes.
Applies to chat-API OCR providers only (e.g. GPT-4o vision, Claude vision). The dedicated Mistral OCR provider uses a structured /v1/ocr endpoint and ignores this prompt entirely — its extraction behavior is fixed by the API contract.
What to put in your version: what to do with tables, headings, math, handwriting, multi-column layouts, page furniture (headers, footers, page numbers).
Example skeleton:
Extract the text content of this page exactly as it appears, preserving
heading hierarchy as markdown (#, ##, ###). Reproduce tables in markdown
table syntax. Drop page headers, footers, and page numbers. If the page
contains diagrams or charts, write a one-sentence description of what each
shows but do not invent data not visible in the image.
Limit: 2000 characters.
Fragment reference
| Fragment type | Seeded fragment slug(s) | Notes |
|---|---|---|
answer | default-answer | Base answer behavior. Server appends tool prompts, context, and invariants. |
answer_citations | default-citations | Citation format. Server appends the source list. |
answer_attachments | default-attachments | File-marker behavior. Server enforces authorization regardless. |
retrieval_semantic_query | default-retrieval-semantic | Used by query tools. Expected output: one standalone query. |
retrieval_keyword_query | default-retrieval-keyword | Used by query tools. Expected output: 1–3 keyword lines. |
chunk_selection | default-chunk-selection | Used by chunk tools. Expected output depends on the tool; seeded selection uses JSON labels. |
ocr_extraction | default-ocr-extraction | Chat-API OCR providers only. Mistral OCR ignores this prompt. |
tool_prompt | tool-prompt-* fragments | Attached to tools and appended to the answer prompt when the tool is active. |
Related
- How retrieval works — the end-to-end flow that these prompts shape at each stage.
- Pipeline, tools, and prompts — how prompt fragments and tools are assembled into a runtime pipeline.
- Retrieval and Reranking — the candidate-stage knobs that pair with the retrieval-assistant prompts here.