Library — File Upload
The Library is CoreCube's manual ingestion path. Upload documents directly and they become searchable alongside content from automated connectors as soon as processing finishes.
Supported formats
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text extraction + OCR for scanned pages |
| Markdown | .md, .mdx | Heading-aware chunking |
| Word | .docx | Heading-aware chunking |
| Excel | .xlsx | Table extraction |
| PowerPoint | .pptx | Slide text extraction |
| Plain text | .txt | Fixed-size chunking |
| HTML | .html, .htm | Converted to markdown before chunking |
| JSON | .json | Record-based chunking (each top-level key or array item) |
Maximum file size: 50 MB per file.
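The format table and size limit can be mirrored in a client-side check before uploading. This is a minimal sketch, not CoreCube's own validation: `check_upload` and the constant names are hypothetical, the 50 MB limit is assumed to mean binary megabytes, and extension matching is assumed to be case-insensitive.

```python
from pathlib import Path

# 50 MB per file; assumes binary megabytes (the docs do not say which).
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024

# Extension -> processing note, copied from the supported formats table.
PROCESSING_BY_EXTENSION = {
    ".pdf": "text extraction + OCR for scanned pages",
    ".md": "heading-aware chunking",
    ".mdx": "heading-aware chunking",
    ".docx": "heading-aware chunking",
    ".xlsx": "table extraction",
    ".pptx": "slide text extraction",
    ".txt": "fixed-size chunking",
    ".html": "converted to markdown before chunking",
    ".htm": "converted to markdown before chunking",
    ".json": "record-based chunking",
}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the processing note for a file, or raise ValueError if it would not qualify."""
    ext = Path(filename).suffix.lower()  # assumption: case-insensitive match
    if ext not in PROCESSING_BY_EXTENSION:
        raise ValueError(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{filename} exceeds the 50 MB limit")
    return PROCESSING_BY_EXTENSION[ext]
```

For example, `check_upload("Runbook.MD", 1024)` returns `"heading-aware chunking"`, while a `.zip` file or a file over the limit raises `ValueError`.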
Uploading documents
- Navigate to Admin Console → Library
- Click Upload or drag and drop files into the upload area
- Select a compartment and sensitivity level
- Click Upload
Documents are processed asynchronously. The Library shows processing status: extracting → chunking → embedding → indexed.
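Because processing is asynchronous, a script that uploads a file may need to wait before querying it. The sketch below shows one way to poll for completion. This page does not describe a status API, so `get_status` is a hypothetical callable standing in for however you read the status; the polling interval and limit are arbitrary, and `failed` is taken from the Library's status filter.

```python
import time
from typing import Callable

# Stage names as shown in the Library, in processing order.
STAGES = ("extracting", "chunking", "embedding", "indexed")


def wait_until_indexed(
    get_status: Callable[[], str],  # hypothetical: returns the current status string
    poll_seconds: float = 2.0,
    max_polls: int = 150,
) -> bool:
    """Poll until the document is indexed.

    Returns True once the status is "indexed", and False if the status
    becomes "failed" or the poll limit is reached first.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "indexed":
            return True
        if status == "failed":
            return False
        time.sleep(poll_seconds)
    return False
```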
Compartments and access
Library documents are treated identically to connector documents for access control purposes. A document uploaded to the hr/confidential compartment is only visible to users whose scope includes hr with at least confidential sensitivity.
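The visibility rule can be read as a compartment match plus a sensitivity threshold. The following sketch illustrates that reading only. The `Scope` type, the `can_view` function, and the four-level sensitivity ladder are assumptions made for the example; CoreCube's actual levels and data model may differ.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ladder, lowest to highest. Only "confidential"
# appears in the docs; the other level names are placeholders.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class Scope:
    compartment: str
    max_sensitivity: str  # highest level this scope grants


def can_view(user_scopes: list[Scope], doc_compartment: str, doc_sensitivity: str) -> bool:
    """True if any scope covers the document's compartment at its sensitivity or higher."""
    needed = SENSITIVITY_ORDER.index(doc_sensitivity)
    return any(
        scope.compartment == doc_compartment
        and SENSITIVITY_ORDER.index(scope.max_sensitivity) >= needed
        for scope in user_scopes
    )
```

Under these assumptions, a user with `Scope("hr", "confidential")` can see an hr/confidential document, while a user with only `Scope("hr", "internal")` cannot.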
:::info No re-upload needed
Unlike tools that require per-client document upload, documents in CoreCube's Library are available to all connected AI interfaces (OpenWebUI, Claude Desktop, your API clients) without re-uploading.
:::
OCR for scanned PDFs
PDFs with little or no extractable text are automatically detected and processed through OCR:
- PDF pages are rendered to PNG at high resolution via mupdf (WASM, no native dependency)
- Each page image is passed to the configured OCR model (local via CoreCube Inference, or a cloud vision API)
- Extracted text enters the standard chunking and embedding pipeline
Configure OCR in Admin Console → Settings → OCR Model.
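The detection step ("little or no extractable text") can be approximated with a simple per-page character count. This is a sketch of one plausible heuristic, not the rule CoreCube applies: `needs_ocr` and the 50-character threshold are assumptions, and the function expects text that has already been extracted per page by whatever PDF library you use.

```python
def needs_ocr(page_texts: list[str], min_chars_per_page: int = 50) -> bool:
    """Return True if a PDF's extracted text is sparse enough to warrant OCR.

    page_texts holds the text extracted from each page. The threshold is a
    hypothetical default, not a documented CoreCube setting.
    """
    if not page_texts:
        return True  # nothing extractable at all
    # Count non-whitespace characters so blank or padded pages score zero.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) < min_chars_per_page
```

A scanned document whose pages yield only empty strings or a stray page number falls below the threshold, while a normal text PDF with hundreds of characters per page does not.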
To see OCR coverage across your Library:
- Navigate to Admin Console → Library → Process
- The PDF Coverage tab shows how many PDFs have extracted text and how many are still empty
- Click Re-scan to find and re-queue empty PDFs
Filtering and browsing
The Library view supports filtering by:
- Ingestion path: view all `library` uploads, or filter to `connector` / `checkin`
- Compartment: filter by organizational boundary
- Status: `indexed`, `processing`, `failed`
- Upload date: sort or filter by when files were added
Replacing documents
Re-uploading a file with the same name to the same compartment replaces the previous version. The old chunks are removed and new chunks are generated from the updated content.
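The replacement rule amounts to keying documents by compartment and file name. The toy in-memory index below illustrates that behavior under stated assumptions: `ChunkIndex` is a hypothetical stand-in, not CoreCube's storage layer, and it assumes file names are compared exactly.

```python
class ChunkIndex:
    """Toy index keyed by (compartment, filename); illustrative only."""

    def __init__(self) -> None:
        self._chunks: dict[tuple[str, str], list[str]] = {}

    def upload(self, compartment: str, filename: str, chunks: list[str]) -> bool:
        """Store chunks for a document; return True if an earlier version was replaced."""
        key = (compartment, filename)
        replaced = key in self._chunks
        # Old chunks are dropped wholesale, never merged with the new ones.
        self._chunks[key] = list(chunks)
        return replaced

    def chunks_for(self, compartment: str, filename: str) -> list[str]:
        """Return a copy of the chunks currently stored for a document."""
        return list(self._chunks.get((compartment, filename), []))
```

Uploading `policy.pdf` to `hr` twice leaves only the second version's chunks, while the same file name in a different compartment is stored as a separate document.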
Check-ins vs Library
The Library is for structured documents (PDFs, Markdown files, DOCX). Use Check-ins for informal human knowledge — decisions, observations, context that doesn't exist in a file.
| | Library | Check-in |
|---|---|---|
| Input | File upload | Short text via any channel |
| Best for | Documents, reports, runbooks | Decisions, observations, context |
| Structured | Yes (file-based) | No (free-form text) |
| Channels | Admin Console | Web form, Slack, Teams, Email, API |