Library — File Upload

The Library is CoreCube's manual ingestion path. Upload documents directly to make them immediately searchable alongside content from automated connectors.

Supported formats

| Format | Extension | Notes |
| --- | --- | --- |
| PDF | .pdf | Text extraction + OCR for scanned pages |
| Markdown | .md, .mdx | Heading-aware chunking |
| Word | .docx | Heading-aware chunking |
| Excel | .xlsx | Table extraction |
| PowerPoint | .pptx | Slide text extraction |
| Plain text | .txt | Fixed-size chunking |
| HTML | .html, .htm | Converted to markdown before chunking |
| JSON | .json | Record-based chunking (each top-level key or array item) |
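
To make the JSON row concrete, here is a minimal Python sketch of record-based chunking: one chunk per top-level key for objects, one chunk per item for arrays. The function name `json_records` and the choice to serialize each record back to JSON text are assumptions for illustration, not CoreCube's actual implementation.

```python
import json
from typing import Any


def json_records(raw: str) -> list[str]:
    """Split a JSON document into one text chunk per top-level record."""
    data: Any = json.loads(raw)
    if isinstance(data, dict):
        # One record per top-level key; the key is kept for context.
        records = [{key: value} for key, value in data.items()]
    elif isinstance(data, list):
        # One record per array item.
        records = list(data)
    else:
        # A bare scalar or string is treated as a single record.
        records = [data]
    # Serialize each record back to text so it can be embedded.
    return [json.dumps(record, sort_keys=True) for record in records]
```

For example, `json_records('{"a": 1, "b": [2, 3]}')` yields two chunks, one for `a` and one for `b`.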

Maximum file size: 50 MB per file.
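
A client-side pre-check can catch unsupported or oversized files before they are uploaded. The sketch below mirrors the supported formats and the 50 MB limit stated above; the function name `check_upload` is hypothetical, and treating 50 MB as 50 × 1024 × 1024 bytes is an assumption, since the exact byte threshold is not specified here.

```python
from pathlib import Path

# Assumption: "50 MB" is interpreted as 50 MiB.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024

# Extension -> processing note, taken from the supported formats list.
SUPPORTED_FORMATS = {
    ".pdf": "Text extraction + OCR for scanned pages",
    ".md": "Heading-aware chunking",
    ".mdx": "Heading-aware chunking",
    ".docx": "Heading-aware chunking",
    ".xlsx": "Table extraction",
    ".pptx": "Slide text extraction",
    ".txt": "Fixed-size chunking",
    ".html": "Converted to markdown before chunking",
    ".htm": "Converted to markdown before chunking",
    ".json": "Record-based chunking",
}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the processing note for a file, or raise ValueError."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 50 MB limit")
    return SUPPORTED_FORMATS[ext]
```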

Uploading documents

  1. Navigate to Admin Console → Library
  2. Click Upload or drag and drop files into the upload area
  3. Select a compartment and sensitivity level
  4. Click Upload

Documents are processed asynchronously. The Library shows processing status: extracting → chunking → embedding → indexed.
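
The status progression can be modeled as a small linear state machine. This is an illustrative sketch only: the `Status` enum and `next_status` helper are hypothetical names, and treating `failed` as a terminal state is an assumption based on the status filter values listed for the Library view.

```python
from enum import Enum


class Status(Enum):
    EXTRACTING = "extracting"
    CHUNKING = "chunking"
    EMBEDDING = "embedding"
    INDEXED = "indexed"
    FAILED = "failed"  # assumption: terminal error state


# The happy path, in the order the Library displays it.
PIPELINE = [Status.EXTRACTING, Status.CHUNKING, Status.EMBEDDING, Status.INDEXED]


def next_status(current: Status) -> Status:
    """Advance one step along the pipeline; terminal states stay put."""
    if current in (Status.INDEXED, Status.FAILED):
        return current
    return PIPELINE[PIPELINE.index(current) + 1]
```

A document is searchable only once it reaches `indexed`, so a client waiting on an upload would poll until it sees either `indexed` or `failed`.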

Compartments and access

Library documents are treated identically to connector documents for access control purposes. A document uploaded to the hr/confidential compartment is only visible to users whose scope includes hr with at least confidential sensitivity.
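
The visibility rule can be sketched as a check on compartment plus a minimum sensitivity level. Everything named here is hypothetical: the `Scope` type, the `can_view` function, and in particular the ordering of sensitivity levels, of which only `confidential` appears in this page.

```python
from dataclasses import dataclass

# Assumption: an illustrative ordering from least to most sensitive.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class Scope:
    compartment: str
    max_sensitivity: str


def can_view(scopes: list[Scope], compartment: str, sensitivity: str) -> bool:
    """True if any scope covers the compartment at or above the given level."""
    needed = SENSITIVITY_ORDER.index(sensitivity)
    return any(
        scope.compartment == compartment
        and SENSITIVITY_ORDER.index(scope.max_sensitivity) >= needed
        for scope in scopes
    )
```

Under these assumptions, a user scoped to `hr` at `internal` would not see a document in `hr/confidential`, while a user scoped to `hr` at `confidential` or higher would.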

:::info No re-upload needed
Unlike tools that require per-client document upload, documents in CoreCube's Library are available to all connected AI interfaces (OpenWebUI, Claude Desktop, your API clients) without re-uploading.
:::

OCR for scanned PDFs

PDFs with little or no extractable text are automatically detected and processed through OCR:

  1. PDF pages are rendered to PNG at high resolution via mupdf (WASM, no native dependency)
  2. Each page image is passed to the configured OCR model (local via CoreCube Inference, or a cloud vision API)
  3. Extracted text enters the standard chunking and embedding pipeline
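
The detection step ("little or no extractable text") could be approximated with a simple characters-per-page heuristic. The sketch below is an assumption about how such a check might look, not CoreCube's actual detector: the `needs_ocr` name and the threshold of 20 characters per page are hypothetical.

```python
# Assumption: hypothetical threshold for "little or no extractable text".
MIN_CHARS_PER_PAGE = 20


def needs_ocr(page_texts: list[str], min_chars_per_page: int = MIN_CHARS_PER_PAGE) -> bool:
    """Decide whether a PDF should be routed through OCR.

    page_texts holds the text extracted from each page by the normal
    (non-OCR) extractor; whitespace-only pages count as empty.
    """
    if not page_texts:
        # No pages means there is nothing to render or recognize.
        return False
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page
```

Averaging across pages keeps a single cover image from sending an otherwise text-rich PDF through OCR.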

Configure OCR in Admin Console → Settings → OCR Model.

To see OCR coverage across your Library:

  • Navigate to Admin Console → Library → Process
  • The PDF Coverage tab shows how many PDFs have extracted text and how many are still empty
  • Click Re-scan to find and re-queue empty PDFs

Filtering and browsing

The Library view supports filtering by:

  • Ingestion path: view all library uploads, or filter to connector / checkin
  • Compartment: filter by organizational boundary
  • Status: indexed, processing, failed
  • Upload date: sort or filter by when files were added

Replacing documents

Re-uploading a file with the same name to the same compartment replaces the previous version. The old chunks are removed and new chunks are generated from the updated content.
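
The replacement behavior amounts to keying stored chunks on the pair (compartment, file name). The following in-memory sketch illustrates that semantics; `ChunkIndex` and its methods are hypothetical and stand in for whatever store CoreCube actually uses.

```python
class ChunkIndex:
    """Toy index keyed on (compartment, filename)."""

    def __init__(self) -> None:
        self._chunks: dict[tuple[str, str], list[str]] = {}

    def upload(self, compartment: str, filename: str, chunks: list[str]) -> bool:
        """Store chunks for a file; return True if an older version was replaced."""
        key = (compartment, filename)
        replaced = key in self._chunks
        # Assigning to the key drops the old chunks and stores the new ones.
        self._chunks[key] = list(chunks)
        return replaced

    def chunks_for(self, compartment: str, filename: str) -> list[str]:
        """Return a copy of the chunks currently stored for a file."""
        return list(self._chunks.get((compartment, filename), []))
```

Note that the same file name uploaded to a different compartment is a separate document in this model and does not replace anything.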

Check-ins vs Library

The Library is for structured documents (PDFs, Markdown files, DOCX). Use Check-ins for informal human knowledge — decisions, observations, context that doesn't exist in a file.

| | Library | Check-in |
| --- | --- | --- |
| Input | File upload | Short text via any channel |
| Best for | Documents, reports, runbooks | Decisions, observations, context |
| Structured | Yes (file-based) | No (free-form text) |
| Channels | Admin Console | Web form, Slack, Teams, Email, API |
