Connectors Overview

Connectors are CoreCube's automated ingestion path. They sync documents from external systems into the evidence layer through scheduled runs and webhooks.

The three ingestion paths

All knowledge enters CoreCube through one of three paths:

Path	Description	Source tag
Connectors	Automated delta sync from external systems	`connector`
Library	Manual document uploads (PDF, Markdown, DOCX, etc.)	`library`
Check-ins	Human knowledge via Slack, Teams, email, web form, or API	`checkin`

All three paths converge into the same evidence layer and are retrieved the same way. Each document carries a `source_path` tag (`connector`, `library`, or `checkin`), which enables trust-aware ranking and scope filtering.

Connector tiers

First-class connectors

Purpose-built integrations with reliable incremental sync, deletion detection, and content normalization.

Connector	Change detection	Webhooks
Confluence	Page version number	Yes
Jira	Issue updated timestamp	Yes
Notion	Last-edited timestamp	Yes
GitHub	Commit SHA	Yes
GitLab	Commit SHA	Yes
Bitbucket	Commit hash	Yes
Google Drive	File revision + modifiedTime	Yes
Microsoft 365	Delta query API	Yes
Slack	Message timestamp	Yes
HubSpot	Updated timestamp	Yes
Nextcloud	ETag + getlastmodified	Yes (NC 28+)
OneTimePIM	Updated timestamp	No
abas ERP	Modified timestamp	No
Business Central	lastModifiedDateTime	Yes
inFlow	Updated timestamp	No

All first-class connectors guarantee:

Stable external document IDs
Incremental sync (only fetch changed documents)
Deletion detection (tombstone documents when upstream deletes)
Content normalization (HTML, ADF, PDF, DOCX → clean markdown)
Rate limit handling with configurable backoff
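
The first three guarantees above can be sketched as a single delta-sync planning pass. This is a minimal illustration under stated assumptions, not CoreCube's implementation: `SyncPlan`, `plan_sync`, and the version-token dictionaries are hypothetical names, and the version token stands in for whichever change-detection signal the connector uses (page version, commit SHA, ETag, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Result of comparing upstream state with what is already indexed."""
    to_fetch: list = field(default_factory=list)      # new or changed documents
    to_tombstone: list = field(default_factory=list)  # deleted upstream
    skipped: list = field(default_factory=list)       # unchanged, not re-fetched


def plan_sync(indexed: dict, upstream: dict) -> SyncPlan:
    """Both arguments map a stable external document ID to a version token."""
    plan = SyncPlan()
    for doc_id, version in upstream.items():
        if indexed.get(doc_id) == version:
            plan.skipped.append(doc_id)    # same version: nothing to do
        else:
            plan.to_fetch.append(doc_id)   # unseen ID or different version
    # Anything indexed but no longer present upstream is tombstoned.
    plan.to_tombstone = [doc_id for doc_id in indexed if doc_id not in upstream]
    return plan


# Hypothetical state: page-2 changed, page-3 was deleted, page-4 is new.
indexed = {"page-1": "v3", "page-2": "v1", "page-3": "v7"}
upstream = {"page-1": "v3", "page-2": "v2", "page-4": "v1"}
plan = plan_sync(indexed, upstream)
print(plan)
```

The sketch relies on stable external document IDs: if IDs changed between runs, every document would look both new and deleted.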

Power-user connectors

Configurable adapters for sources without a dedicated connector. These are labeled Advanced in the admin UI.

Connector	Use case	Limitations
Generic REST	Any JSON API	No automatic change detection — full re-fetch on every sync
Generic Database	PostgreSQL, MySQL (read-only)	Full query on every sync
File Store	Local directory, S3, WebDAV	Change detection via file modification time
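
Change detection by file modification time, as listed for the File Store connector, might look like the following for a local directory. This is a sketch only: `changed_files` and the `last_seen` dictionary are hypothetical, and S3 or WebDAV sources would need their own listing calls.

```python
import os
import tempfile
from pathlib import Path


def changed_files(root: Path, last_seen: dict) -> list:
    """Return relative paths whose modification time differs from last_seen.

    last_seen maps relative path -> mtime and is updated in place.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        mtime = path.stat().st_mtime
        if last_seen.get(rel) != mtime:
            changed.append(rel)
            last_seen[rel] = mtime
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("alpha")
    (root / "b.md").write_text("beta")

    seen = {}
    first = changed_files(root, seen)    # both files are new
    second = changed_files(root, seen)   # nothing changed since the first scan

    # Bump b.md's modification time explicitly so the example does not
    # depend on filesystem timestamp resolution.
    os.utime(root / "b.md", (0, seen["b.md"] + 10))
    third = changed_files(root, seen)    # only b.md is reported
```

Modification time is a weak signal: a file restored from backup with its old timestamp would be missed, which is one reason this tier sits below the first-class connectors.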

Protocol connector

Connector	Use case
MCP Client	Connect to any MCP-compatible server as a data source

The MCP Client connector is labeled Experimental. Its ingestion quality depends entirely on the upstream server.

Compartments and access control

Every connection belongs to exactly one compartment and has a sensitivity level. Together they form the connection's security label (e.g., hr/confidential).

Compartments

Admin-defined organizational boundaries — teams, departments, or functional areas:

executive    rnd    hr    finance    engineering    legal    all-staff

A connection's compartment cannot be changed after creation. To reclassify, delete and recreate the connection.

Sensitivity levels

Level	Description	Example
`public`	Safe for anyone in the organization	Public docs, marketing
`internal`	General internal, not externally shareable	Team wikis, project docs
`confidential`	Sensitive, restricted access	Financial reports, HR records
`restricted`	Highly sensitive, need-to-know	M&A docs, legal matters
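
The labeling rules in this section can be modelled roughly as follows. The `Connection` class is hypothetical, and freezing the whole record is a simplification: the text above only states that the compartment is fixed after creation.

```python
from dataclasses import dataclass, FrozenInstanceError

# The four sensitivity levels from the table above.
SENSITIVITY_LEVELS = ("public", "internal", "confidential", "restricted")


@dataclass(frozen=True)
class Connection:
    name: str
    compartment: str   # exactly one compartment per connection
    sensitivity: str

    def __post_init__(self) -> None:
        if self.sensitivity not in SENSITIVITY_LEVELS:
            raise ValueError(f"unknown sensitivity level: {self.sensitivity!r}")

    @property
    def security_label(self) -> str:
        """Compartment and sensitivity joined as in the hr/confidential example."""
        return f"{self.compartment}/{self.sensitivity}"


conn = Connection("Confluence HR Policies", "hr", "confidential")
print(conn.security_label)  # hr/confidential

try:
    conn.compartment = "legal"  # reclassifying in place is rejected
except FrozenInstanceError:
    print("compartment is fixed; delete and recreate the connection instead")
```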

Source filtering

First-class connectors support filtering at connection setup so you can create narrow, compartment-appropriate connections instead of one broad connection:

✓ "Confluence — Engineering Docs"   compartment: engineering  sensitivity: internal
  Space keys: ENG, DEVOPS

✓ "Confluence — HR Policies"        compartment: hr           sensitivity: confidential
  Space keys: HR

✓ "Confluence — Company Handbook"   compartment: all-staff    sensitivity: public
  Space keys: HANDBOOK

The connection form surfaces these filters prominently with a guidance message: "Select which parts of this source to ingest. Create separate connections for content with different sensitivity levels."
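
Expressed as data, the three Confluence connections above partition the source by space key. The sketch below is illustrative only; `ConfluenceConnection` and `label_for_space` are hypothetical names, not part of any CoreCube API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfluenceConnection:
    name: str
    compartment: str
    sensitivity: str
    space_keys: frozenset


CONNECTIONS = [
    ConfluenceConnection("Engineering Docs", "engineering", "internal",
                         frozenset({"ENG", "DEVOPS"})),
    ConfluenceConnection("HR Policies", "hr", "confidential",
                         frozenset({"HR"})),
    ConfluenceConnection("Company Handbook", "all-staff", "public",
                         frozenset({"HANDBOOK"})),
]


def label_for_space(space_key: str) -> Optional[str]:
    """Return the security label a page in this space would receive,
    or None if no connection ingests that space."""
    for conn in CONNECTIONS:
        if space_key in conn.space_keys:
            return f"{conn.compartment}/{conn.sensitivity}"
    return None


print(label_for_space("DEVOPS"))  # engineering/internal
print(label_for_space("SALES"))   # None: not covered by any connection
```

Because each space key appears in only one connection, every ingested page gets exactly one label, which is the point of creating narrow connections.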

Source trust levels

Trust level	Description	Example sources
Authoritative	Official, maintained documentation	Runbooks, approved policies, official docs
Reference	Useful context that may be informal	Wiki pages, shared notes, meeting summaries
Volatile	Rapidly changing or unverified	Chat exports, draft documents, ticket comments

The default trust level is Reference. Trust level acts as a tie-breaker in search ranking: when two chunks have equal scores, it decides their order.
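
A tie-breaker of this kind can be expressed as a two-part sort key. The sketch assumes that Authoritative outranks Reference, which outranks Volatile; the table implies this order but does not state it. The chunk dictionaries and `rank_chunks` are hypothetical.

```python
# Lower rank sorts first. The ordering itself is an assumption.
TRUST_RANK = {"authoritative": 0, "reference": 1, "volatile": 2}
DEFAULT_TRUST = "reference"


def rank_chunks(chunks: list) -> list:
    """Sort by score descending; among equal scores, prefer higher trust."""
    return sorted(
        chunks,
        key=lambda c: (-c["score"], TRUST_RANK[c.get("trust", DEFAULT_TRUST)]),
    )


chunks = [
    {"id": "chat-export", "score": 0.82, "trust": "volatile"},
    {"id": "runbook", "score": 0.82, "trust": "authoritative"},
    {"id": "wiki", "score": 0.82},  # no trust set, falls back to the default
    {"id": "draft", "score": 0.91, "trust": "volatile"},
]
ranked_ids = [c["id"] for c in rank_chunks(chunks)]
print(ranked_ids)  # ['draft', 'runbook', 'wiki', 'chat-export']
```

Note that the volatile draft still ranks first: trust only reorders chunks whose scores are equal and never overrides a higher score.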

Sync schedule

Each connection has a configurable sync interval (e.g., every 15 minutes, every hour, daily). Webhooks can trigger immediate re-sync when supported by the source.

Manual sync: Click Sync Now in the connection detail view to trigger an immediate sync.
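
The three triggers described above (interval, webhook, manual) reduce to one decision. This is a minimal sketch; `is_sync_due` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_sync_due(last_sync: datetime, interval: timedelta, now: datetime,
                triggered: bool = False) -> bool:
    """A sync runs when the interval has elapsed, or immediately when a
    webhook or a manual Sync Now request sets `triggered`."""
    return triggered or (now - last_sync) >= interval


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=15)

print(is_sync_due(last, interval, last + timedelta(minutes=5)))         # False
print(is_sync_due(last, interval, last + timedelta(minutes=15)))        # True
print(is_sync_due(last, interval, last + timedelta(minutes=5), True))   # True
```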

Connection health

The Admin Console shows real-time connection health:

Status	Meaning
Healthy	Last sync completed within the expected window
Degraded	Last sync had partial failures or is overdue
Offline	Cannot reach the source or authentication failed
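
The status table can be read as a small decision function. The sketch below follows the three definitions literally; the function name, its inputs, and the precedence of Offline over Degraded are assumptions.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    OFFLINE = "Offline"


def classify(reachable: bool, authenticated: bool,
             overdue: bool, partial_failures: bool) -> Health:
    # Offline: cannot reach the source or authentication failed.
    if not reachable or not authenticated:
        return Health.OFFLINE
    # Degraded: last sync had partial failures or is overdue.
    if overdue or partial_failures:
        return Health.DEGRADED
    # Healthy: last sync completed within the expected window.
    return Health.HEALTHY


print(classify(True, True, False, False).value)   # Healthy
print(classify(True, True, True, False).value)    # Degraded
print(classify(True, False, False, False).value)  # Offline
```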

Connector metrics

Per-connection metrics available in the connection detail view:

Metric	Description
`documents_scanned`	Total documents found during last scan
`documents_processed`	Documents that passed through the ingestion pipeline
`documents_skipped`	Unchanged documents not re-processed
`documents_failed`	Documents that failed extraction
`rate_limit_hits`	Rate limit responses from source API
`sync_duration_ms`	Total sync run duration
`last_error`	Most recent error message and timestamp
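
For monitoring scripts, the metrics above map naturally onto a small record type. The field names come from the table; the `SyncMetrics` class, the `failure_rate` helper, and the sample values are hypothetical, and no particular relationship between the counters is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncMetrics:
    documents_scanned: int
    documents_processed: int
    documents_skipped: int
    documents_failed: int
    rate_limit_hits: int
    sync_duration_ms: int
    last_error: Optional[str] = None

    def failure_rate(self) -> float:
        """Share of scanned documents that failed extraction (derived value)."""
        if self.documents_scanned == 0:
            return 0.0
        return self.documents_failed / self.documents_scanned


# Made-up numbers for illustration only.
metrics = SyncMetrics(
    documents_scanned=200,
    documents_processed=40,
    documents_skipped=155,
    documents_failed=5,
    rate_limit_hits=2,
    sync_duration_ms=48_000,
)
print(f"{metrics.failure_rate():.1%}")  # 2.5%
```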

Resource limits

Resource	Limit	Behavior when exceeded
Document extraction time	60 seconds	Document marked as `failed`, sync continues
Document content size	50 MB raw	Document skipped with warning
Concurrent syncs (global)	3	Additional syncs queued
Sync run duration	4 hours	Terminated, partial results kept, logged as `partial_ok`
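
The limits in the table can be summarised as three small checks. This sketch classifies outcomes after the fact rather than enforcing timeouts, and several details are assumptions: the constant and function names, reading 50 MB as binary megabytes, treating each limit as exclusive, and the `ok` status for a run that finishes in time (only `failed` and `partial_ok` appear in the table).

```python
MAX_EXTRACTION_SECONDS = 60
MAX_CONTENT_BYTES = 50 * 1024 * 1024   # "50 MB raw"; binary MB is an assumption
MAX_CONCURRENT_SYNCS = 3
MAX_SYNC_SECONDS = 4 * 60 * 60


def document_outcome(size_bytes: int, extraction_seconds: float) -> str:
    """Per-document result; the sync run continues in every case."""
    if size_bytes > MAX_CONTENT_BYTES:
        return "skipped"   # skipped with a warning
    if extraction_seconds > MAX_EXTRACTION_SECONDS:
        return "failed"
    return "processed"


def admit_syncs(requested: list) -> tuple:
    """Split requested syncs into those that start now and those queued."""
    return requested[:MAX_CONCURRENT_SYNCS], requested[MAX_CONCURRENT_SYNCS:]


def run_status(elapsed_seconds: float) -> str:
    """A run that exceeds the duration limit is terminated with partial results."""
    return "partial_ok" if elapsed_seconds > MAX_SYNC_SECONDS else "ok"


running, queued = admit_syncs(["confluence", "jira", "notion", "github"])
print(running, queued)  # ['confluence', 'jira', 'notion'] ['github']
print(document_outcome(60 * 1024 * 1024, 1.0))  # skipped
print(document_outcome(1024, 75.0))             # failed
print(run_status(5 * 60 * 60))                  # partial_ok
```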
