Library — File Upload

The Library is CoreCube's manual ingestion path. Upload documents directly and, once processing completes, they are searchable alongside content from automated connectors.

Supported formats

Format	Extension	Notes
PDF	`.pdf`	Text extraction + OCR for scanned pages
Markdown	`.md`, `.mdx`	Heading-aware chunking
Word	`.docx`	Heading-aware chunking
Excel	`.xlsx`	Table extraction
PowerPoint	`.pptx`	Slide text extraction
Plain text	`.txt`	Fixed-size chunking
HTML	`.html`, `.htm`	Converted to markdown before chunking
JSON	`.json`	Record-based chunking (each top-level key or array item)
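
As an illustration of record-based chunking for JSON, the sketch below splits a document into one record per top-level key or array item. It is only a minimal sketch of the idea described in the table: the function name `json_records` is hypothetical, and CoreCube's actual chunker may handle nesting, ordering, and size limits differently.

```python
import json


def json_records(text: str) -> list[str]:
    """Split a JSON document into one chunk per top-level key or array item.

    Hypothetical helper for illustration; not CoreCube's implementation.
    """
    data = json.loads(text)
    if isinstance(data, dict):
        # One record per top-level key, keeping the key for context.
        return [json.dumps({key: value}) for key, value in data.items()]
    if isinstance(data, list):
        # One record per array item.
        return [json.dumps(item) for item in data]
    # A bare scalar (string, number, boolean, null) becomes a single record.
    return [json.dumps(data)]
```

Under this sketch, a file containing an array of 200 objects would yield 200 records, while a single object with five top-level keys would yield five.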

Maximum file size: 50 MB per file.
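
If you script uploads, it can help to check files against the format table and size limit before sending them. The following is a sketch under stated assumptions: `check_upload` is a hypothetical helper, and it assumes 50 MB means 50 × 1024 × 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Extensions taken from the supported formats table.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".md", ".mdx", ".docx", ".xlsx",
    ".pptx", ".txt", ".html", ".htm", ".json",
}

# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 50 MB")
    return problems
```

The extension comparison is case-insensitive here, so `report.PDF` passes; whether the Library itself treats extensions case-insensitively is not stated.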

Uploading documents

1. Navigate to Admin Console → Library
2. Click Upload or drag and drop files into the upload area
3. Select a compartment and sensitivity level
4. Click Upload

Documents are processed asynchronously. The Library shows processing status: extracting → chunking → embedding → indexed.

Compartments and access

Library documents are treated identically to connector documents for access control purposes. A document uploaded to the hr/confidential compartment is only visible to users whose scope includes hr with at least confidential sensitivity.
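
The visibility rule above can be sketched as a simple comparison. This is an illustration only: the sensitivity level names other than `confidential`, their ordering, and the shape of `user_scope` are all assumptions, not CoreCube's actual data model.

```python
from dataclasses import dataclass

# Assumed ordering, lowest to highest. Only "confidential" appears in the
# documentation; the other level names are placeholders.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class Document:
    compartment: str
    sensitivity: str


def can_view(user_scope: dict[str, str], doc: Document) -> bool:
    """user_scope maps a compartment name to the highest sensitivity granted."""
    granted = user_scope.get(doc.compartment)
    if granted is None:
        # The user's scope does not include this compartment at all.
        return False
    return SENSITIVITY_ORDER.index(granted) >= SENSITIVITY_ORDER.index(doc.sensitivity)
```

For a document in `hr/confidential`, a scope of `{"hr": "confidential"}` passes, `{"hr": "internal"}` does not, and a scope without `hr` does not regardless of level.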

:::info No re-upload needed
Unlike tools that require per-client document upload, documents in CoreCube's Library are available to all connected AI interfaces (OpenWebUI, Claude Desktop, your API clients) without re-uploading.
:::

OCR for scanned PDFs

PDFs with little or no extractable text are automatically detected and processed through OCR:

1. PDF pages are rendered to PNG at high resolution via mupdf (WASM, no native dependency)
2. Each page image is passed to the configured OCR model (local via CoreCube Inference, or a cloud vision API)
3. Extracted text enters the standard chunking and embedding pipeline
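
The documentation does not say how "little or no extractable text" is measured. One plausible heuristic, shown here purely as a sketch, is to compare the average number of extracted characters per page against a threshold. Both `needs_ocr` and the threshold value are hypothetical.

```python
# Hypothetical threshold; the real detection rule is not documented.
MIN_CHARS_PER_PAGE = 20


def needs_ocr(page_texts: list[str], min_chars_per_page: int = MIN_CHARS_PER_PAGE) -> bool:
    """Decide whether a PDF should go through OCR.

    page_texts holds the text extracted from each page, in page order.
    """
    if not page_texts:
        # A PDF with no pages has nothing to OCR.
        return False
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page
```

Averaging across pages means a mostly scanned PDF with one text-heavy cover page could be missed; a per-page check would catch that case at the cost of more OCR calls.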

Configure OCR in Admin Console → Settings → OCR Model.

To see OCR coverage across your Library:

1. Navigate to Admin Console → Library → Process
2. The PDF Coverage tab shows how many PDFs have extracted text and how many are still empty
3. Click Re-scan to find and re-queue empty PDFs

Filtering and browsing

The Library view supports filtering by:

- Ingestion path: view all library uploads, or filter to connector / checkin
- Compartment: filter by organizational boundary
- Status: indexed, processing, failed
- Upload date: sort or filter by when files were added
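
The filters above combine naturally as optional criteria. The sketch below models that on an in-memory list; `LibraryItem` and `filter_items` are hypothetical names, and the field values simply mirror the filter options listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LibraryItem:
    name: str
    ingestion_path: str  # "library", "connector", or "checkin"
    compartment: str
    status: str          # "indexed", "processing", or "failed"
    uploaded: date


def filter_items(
    items: list[LibraryItem],
    ingestion_path: Optional[str] = None,
    compartment: Optional[str] = None,
    status: Optional[str] = None,
    uploaded_since: Optional[date] = None,
) -> list[LibraryItem]:
    """Apply every filter that is set, then sort newest first."""
    matches = [
        item for item in items
        if (ingestion_path is None or item.ingestion_path == ingestion_path)
        and (compartment is None or item.compartment == compartment)
        and (status is None or item.status == status)
        and (uploaded_since is None or item.uploaded >= uploaded_since)
    ]
    return sorted(matches, key=lambda item: item.uploaded, reverse=True)
```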

Replacing documents

Re-uploading a file with the same name to the same compartment replaces the previous version. The old chunks are removed and new chunks are generated from the updated content.
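
The replacement rule amounts to keying documents by compartment and file name. A minimal in-memory sketch of that behavior follows; `ChunkIndex` is a hypothetical class for illustration and says nothing about how CoreCube stores chunks internally.

```python
class ChunkIndex:
    """Toy index keyed by (compartment, filename)."""

    def __init__(self) -> None:
        self._chunks: dict[tuple[str, str], list[str]] = {}

    def upload(self, compartment: str, filename: str, chunks: list[str]) -> bool:
        """Store chunks for a file; return True if an earlier version was replaced."""
        key = (compartment, filename)
        replaced = key in self._chunks
        # Assigning the new list drops every chunk from the old version.
        self._chunks[key] = list(chunks)
        return replaced

    def chunks_for(self, compartment: str, filename: str) -> list[str]:
        return list(self._chunks.get((compartment, filename), []))
```

Because the key includes the compartment, uploading `policy.pdf` to a different compartment creates a second document instead of replacing the first.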

Check-ins vs Library

The Library is for structured documents (PDFs, Markdown files, DOCX). Use Check-ins for informal human knowledge — decisions, observations, context that doesn't exist in a file.

	Library	Check-in
Input	File upload	Short text via any channel
Best for	Documents, reports, runbooks	Decisions, observations, context
Structured	Yes (file-based)	No (free-form text)
Channels	Admin Console	Web form, Slack, Teams, Email, API
