Connectors Overview
Connectors are CoreCube's automated ingestion path. They sync documents from external systems into the evidence layer through scheduled runs and webhooks.
The three ingestion paths
All knowledge enters CoreCube through one of three paths:
| Path | Description | Source tag |
|---|---|---|
| Connectors | Automated delta sync from external systems | connector |
| Library | Manual document uploads (PDF, Markdown, DOCX, etc.) | library |
| Check-ins | Human knowledge via Slack, Teams, email, web form, or API | checkin |
All three paths converge into the same evidence layer and receive identical retrieval treatment. The source_path tag records which path a document came from, which enables trust-aware ranking and scope filtering.
Connector tiers
First-class connectors
Purpose-built integrations with reliable incremental sync, deletion detection, and content normalization.
| Connector | Change detection | Webhooks |
|---|---|---|
| Confluence | Page version number | Yes |
| Jira | Issue updated timestamp | Yes |
| Notion | Last-edited timestamp | Yes |
| GitHub | Commit SHA | Yes |
| GitLab | Commit SHA | Yes |
| Bitbucket | Commit hash | Yes |
| Google Drive | File revision + modifiedTime | Yes |
| Microsoft 365 | Delta query API | Yes |
| Slack | Message timestamp | Yes |
| HubSpot | Updated timestamp | Yes |
| Nextcloud | ETag + getlastmodified | Yes (NC 28+) |
| OneTimePIM | Updated timestamp | No |
| abas ERP | Modified timestamp | No |
| Business Central | lastModifiedDateTime | Yes |
| inFlow | Updated timestamp | No |
All first-class connectors guarantee:
- Stable external document IDs
- Incremental sync (only fetch changed documents)
- Deletion detection (tombstone documents when upstream deletes)
- Content normalization (HTML, ADF, PDF, DOCX → clean markdown)
- Rate limit handling with configurable backoff
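The incremental sync and deletion detection guarantees above can be pictured as a comparison between upstream change markers and what is already indexed. This is a minimal sketch, not CoreCube's implementation: `SyncPlan` and `plan_sync` are hypothetical names, and the in-memory dictionaries stand in for whatever state store the connector actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Outcome of comparing upstream state with the local index (hypothetical type)."""
    to_fetch: list[str] = field(default_factory=list)      # new or changed documents
    to_skip: list[str] = field(default_factory=list)       # unchanged documents
    to_tombstone: list[str] = field(default_factory=list)  # deleted upstream


def plan_sync(upstream: dict[str, str], indexed: dict[str, str]) -> SyncPlan:
    """Compare change markers keyed by stable external document ID.

    Both arguments map external ID -> change marker, for example a page
    version number, an updated timestamp, or a commit SHA.
    """
    plan = SyncPlan()
    for doc_id, marker in upstream.items():
        if indexed.get(doc_id) == marker:
            plan.to_skip.append(doc_id)    # marker unchanged: no re-fetch
        else:
            plan.to_fetch.append(doc_id)   # new document or changed marker
    # Anything indexed but no longer present upstream gets a tombstone.
    plan.to_tombstone = [doc_id for doc_id in indexed if doc_id not in upstream]
    return plan


plan = plan_sync(
    upstream={"page-1": "v3", "page-2": "v1", "page-4": "v1"},
    indexed={"page-1": "v2", "page-2": "v1", "page-3": "v7"},
)
```

Here `page-1` and `page-4` would be fetched, `page-2` skipped, and `page-3` tombstoned. The sketch depends on stable external IDs: if an ID changed between runs, the same document would appear as one deletion plus one new document.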
Power-user connectors
Configurable adapters for sources without a dedicated connector. Labeled Advanced in the admin UI.
| Connector | Use case | Limitations |
|---|---|---|
| Generic REST | Any JSON API | No automatic change detection — full re-fetch on every sync |
| Generic Database | PostgreSQL, MySQL (read-only) | Full query on every sync |
| File Store | Local directory, S3, WebDAV | Change detection via file modification time |
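For the File Store row, change detection by modification time can be sketched for the local-directory case as follows. The function name `changed_since` is hypothetical, and S3 or WebDAV sources would rely on their own last-modified metadata rather than `os.stat`.

```python
import os
import tempfile
from pathlib import Path


def changed_since(root: Path, last_sync_epoch: float) -> list[Path]:
    """Return files under `root` modified after the last sync (epoch seconds)."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime > last_sync_epoch
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old, new = root / "old.md", root / "new.md"
    old.write_text("unchanged")
    new.write_text("edited")
    os.utime(old, (1_000, 1_000))  # pretend: last modified before the sync
    os.utime(new, (3_000, 3_000))  # pretend: modified after the sync
    changed = [p.name for p in changed_since(root, last_sync_epoch=2_000)]
```

A modification-time check alone cannot see deleted files, and it misses edits made by tools that preserve the original timestamp, which is one reason this approach appears in the Limitations column.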
Protocol connector
| Connector | Use case |
|---|---|
| MCP Client | Connect to any MCP-compatible server as a data source |
MCP ingestion quality depends entirely on the upstream server. Labeled Experimental.
Compartments and access control
Every connection belongs to exactly one compartment and has a sensitivity level. Together they form the connection's security label (e.g., hr/confidential).
Compartments
Admin-defined organizational boundaries — teams, departments, or functional areas:
executive rnd hr finance engineering legal all-staff
A connection's compartment cannot be changed after creation. To reclassify, delete and recreate the connection.
Sensitivity levels
| Level | Description | Example |
|---|---|---|
| public | Safe for anyone in the organization | Public docs, marketing |
| internal | General internal, not externally shareable | Team wikis, project docs |
| confidential | Sensitive, restricted access | Financial reports, HR records |
| restricted | Highly sensitive, need-to-know | M&A docs, legal matters |
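A security label combines one compartment with one of the four sensitivity levels, written as `compartment/sensitivity`. The sketch below models that pairing; the `SecurityLabel` class and its `parse` method are hypothetical and are not part of any CoreCube API. The frozen dataclass mirrors the rule that a connection's compartment cannot change after creation.

```python
from dataclasses import dataclass

SENSITIVITY_LEVELS = ("public", "internal", "confidential", "restricted")


@dataclass(frozen=True)  # immutable, like a connection's compartment
class SecurityLabel:
    compartment: str
    sensitivity: str

    def __post_init__(self) -> None:
        if self.sensitivity not in SENSITIVITY_LEVELS:
            raise ValueError(f"unknown sensitivity level: {self.sensitivity!r}")

    @classmethod
    def parse(cls, label: str) -> "SecurityLabel":
        """Parse a label of the form 'compartment/sensitivity'."""
        compartment, sep, sensitivity = label.partition("/")
        if not sep or not compartment:
            raise ValueError(f"expected 'compartment/sensitivity', got {label!r}")
        return cls(compartment, sensitivity)

    def __str__(self) -> str:
        return f"{self.compartment}/{self.sensitivity}"


label = SecurityLabel.parse("hr/confidential")
```

Validating the sensitivity level at construction time means a typo such as `hr/secret` fails at setup rather than silently producing an unrecognized label.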
Source filtering
First-class connectors support filtering at connection setup so you can create narrow, compartment-appropriate connections instead of one broad connection:
✓ "Confluence — Engineering Docs" compartment: engineering sensitivity: internal
Space keys: ENG, DEVOPS
✓ "Confluence — HR Policies" compartment: hr sensitivity: confidential
Space keys: HR
✓ "Confluence — Company Handbook" compartment: all-staff sensitivity: public
Space keys: HANDBOOK
The connection form surfaces these filters prominently with a guidance message: "Select which parts of this source to ingest. Create separate connections for content with different sensitivity levels."
Source trust levels
| Trust level | Description | Example sources |
|---|---|---|
| Authoritative | Official, maintained documentation | Runbooks, approved policies, official docs |
| Reference | Useful context that may be informal | Wiki pages, shared notes, meeting summaries |
| Volatile | Rapidly changing or unverified | Chat exports, draft documents, ticket comments |
Default: reference. Trust level is used as a tie-breaker in search ranking when chunk scores are equal.
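The tie-breaking rule can be expressed as a two-part sort key: score first, trust level second. This is a sketch under assumptions: the `Hit` type and `rank` function are hypothetical, and the ordering Authoritative before Reference before Volatile is inferred from the table rather than stated outright.

```python
from dataclasses import dataclass

# Lower rank sorts first. The relative order of levels is an assumption.
TRUST_RANK = {"authoritative": 0, "reference": 1, "volatile": 2}


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float
    trust: str = "reference"  # the documented default trust level


def rank(hits: list[Hit]) -> list[Hit]:
    """Sort by score descending; trust level only decides exact ties."""
    return sorted(hits, key=lambda h: (-h.score, TRUST_RANK[h.trust]))


ranked = rank([
    Hit("chat-export", 0.82, "volatile"),
    Hit("runbook", 0.82, "authoritative"),
    Hit("wiki", 0.82),
    Hit("draft", 0.91, "volatile"),
])
order = [h.chunk_id for h in ranked]
```

The volatile `draft` still ranks first because its score is higher. Trust level reorders only the three hits that share a score of 0.82.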
Sync schedule
Each connection has a configurable sync interval (e.g., every 15 minutes, every hour, daily). Webhooks can trigger immediate re-sync when supported by the source.
Manual sync: Click Sync Now in the connection detail view to trigger an immediate sync.
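The interaction between the schedule, webhooks, and manual sync can be sketched as a single due-check. The function `is_sync_due` and its parameters are hypothetical; how CoreCube actually schedules runs is not described here.

```python
from datetime import datetime, timedelta, timezone


def is_sync_due(
    last_sync: datetime,
    interval: timedelta,
    now: datetime,
    webhook_pending: bool = False,
    manual_request: bool = False,
) -> bool:
    """Return True when a connection should start a sync now."""
    if webhook_pending or manual_request:
        return True  # immediate sync, regardless of the schedule
    return now - last_sync >= interval


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
every_15_min = timedelta(minutes=15)

due_early = is_sync_due(last, every_15_min, now=last + timedelta(minutes=10))
due_on_time = is_sync_due(last, every_15_min, now=last + timedelta(minutes=15))
due_webhook = is_sync_due(
    last, every_15_min, now=last + timedelta(minutes=1), webhook_pending=True
)
```

In this sketch a webhook or a Sync Now click bypasses the interval entirely, while a scheduled sync waits until the full interval has elapsed.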
Connection health
The Admin Console shows real-time connection health:
| Status | Meaning |
|---|---|
| Healthy | Last sync completed within the expected window |
| Degraded | Last sync had partial failures or is overdue |
| Offline | Cannot reach the source or authentication failed |
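The three statuses can be read as a small decision procedure. The sketch below is one plausible interpretation, not the Admin Console's actual logic: `classify_health` is a hypothetical function, "partial failures" is approximated by a non-zero failed-document count, and the expected window is passed in because its definition is not given here.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    OFFLINE = "Offline"


def classify_health(
    reachable: bool,
    authenticated: bool,
    last_sync_finished: datetime,
    expected_window: timedelta,
    documents_failed: int,
    now: datetime,
) -> Health:
    """Map connection state to a health status (assumed rules, see lead-in)."""
    if not (reachable and authenticated):
        return Health.OFFLINE
    overdue = now - last_sync_finished > expected_window
    if overdue or documents_failed > 0:
        return Health.DEGRADED
    return Health.HEALTHY


now = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
window = timedelta(minutes=30)

fresh = classify_health(True, True, now - timedelta(minutes=5), window, 0, now)
stale = classify_health(True, True, now - timedelta(hours=2), window, 0, now)
bad_auth = classify_health(True, False, now - timedelta(minutes=5), window, 0, now)
```

Offline is checked first so that an unreachable source is never reported as merely overdue.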
Connector metrics
Per-connection metrics available in the connection detail view:
| Metric | Description |
|---|---|
| documents_scanned | Total documents found during the last scan |
| documents_processed | Documents that passed through the ingestion pipeline |
| documents_skipped | Unchanged documents that were not re-processed |
| documents_failed | Documents that failed extraction |
| rate_limit_hits | Rate limit responses from the source API |
| sync_duration_ms | Total sync run duration in milliseconds |
| last_error | Most recent error message and timestamp |
Resource limits
| Resource | Limit | Behavior when exceeded |
|---|---|---|
| Document extraction time | 60 seconds | Document marked as failed, sync continues |
| Document content size | 50 MB raw | Document skipped with warning |
| Concurrent syncs (global) | 3 | Additional syncs queued |
| Sync run duration | 4 hours | Terminated, partial results kept, logged as partial_ok |
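Two of these limits lend themselves to a short illustration: the per-document checks and the global cap of three concurrent syncs. The sketch below uses hypothetical names throughout (`document_outcome`, `run_all`), treats 50 MB as binary megabytes (an assumption), and uses a short sleep as a stand-in for real sync work.

```python
import asyncio

MAX_CONTENT_BYTES = 50 * 1024 * 1024  # 50 MB raw; binary megabytes assumed
MAX_EXTRACTION_SECONDS = 60
MAX_CONCURRENT_SYNCS = 3


def document_outcome(raw_bytes: int, extraction_seconds: float) -> str:
    """Apply the per-document limits from the table."""
    if raw_bytes > MAX_CONTENT_BYTES:
        return "skipped"   # too large: skipped with a warning
    if extraction_seconds > MAX_EXTRACTION_SECONDS:
        return "failed"    # too slow: marked failed, sync continues
    return "processed"


async def run_all(connection_ids: list[str]) -> tuple[list[str], int]:
    """Run syncs with at most MAX_CONCURRENT_SYNCS active; extra syncs wait."""
    slots = asyncio.Semaphore(MAX_CONCURRENT_SYNCS)
    running = 0
    peak = 0

    async def run_sync(connection_id: str) -> str:
        nonlocal running, peak
        async with slots:              # queued here when all slots are busy
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for the actual sync
            running -= 1
        return connection_id

    done = await asyncio.gather(*(run_sync(c) for c in connection_ids))
    return list(done), peak


completed, peak = asyncio.run(run_all(["a", "b", "c", "d", "e"]))
```

With five connections requested, all five complete, but no more than three run at once. The semaphore plays the role of the queue described in the table.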