Connectors Overview

Connectors are CoreCube's automated ingestion path. They sync documents from external systems into the evidence layer through scheduled runs and webhooks.

The three ingestion paths

All knowledge enters CoreCube through one of three paths:

| Path | Description | Source tag |
|---|---|---|
| Connectors | Automated delta sync from external systems | connector |
| Library | Manual document uploads (PDF, Markdown, DOCX, etc.) | library |
| Check-ins | Human knowledge via Slack, Teams, email, web form, or API | checkin |

All three converge into the same evidence layer with identical retrieval treatment. The source_path tag enables trust-aware ranking and scope filtering.

Connector tiers

First-class connectors

Purpose-built integrations with reliable incremental sync, deletion detection, and content normalization.

| Connector | Change detection | Webhooks |
|---|---|---|
| Confluence | Page version number | Yes |
| Jira | Issue updated timestamp | Yes |
| Notion | Last-edited timestamp | Yes |
| GitHub | Commit SHA | Yes |
| GitLab | Commit SHA | Yes |
| Bitbucket | Commit hash | Yes |
| Google Drive | File revision + modifiedTime | Yes |
| Microsoft 365 | Delta query API | Yes |
| Slack | Message timestamp | Yes |
| HubSpot | Updated timestamp | Yes |
| Nextcloud | ETag + getlastmodified | Yes (NC 28+) |
| OneTimePIM | Updated timestamp | No |
| abas ERP | Modified timestamp | No |
| Business Central | lastModifiedDateTime | Yes |
| inFlow | Updated timestamp | No |

All first-class connectors guarantee:

  • Stable external document IDs
  • Incremental sync (only fetch changed documents)
  • Deletion detection (tombstone documents when upstream deletes)
  • Content normalization (HTML, ADF, PDF, DOCX → clean markdown)
  • Rate limit handling with configurable backoff
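
The incremental sync and deletion detection guarantees can be illustrated with a small planning step. This is a minimal sketch, not CoreCube's implementation: `RemoteDoc` and `plan_sync` are hypothetical names, and it assumes the source can return a full listing of IDs and change markers so that a missing ID can be treated as an upstream delete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteDoc:
    external_id: str  # stable ID assigned by the source system
    version: str      # change marker, e.g. page version, commit SHA, or ETag


def plan_sync(known: dict[str, str], remote: list[RemoteDoc]) -> dict[str, list[str]]:
    """Compare stored change markers with the upstream listing.

    `known` maps external_id -> the version that was last ingested.
    """
    remote_versions = {doc.external_id: doc.version for doc in remote}
    # New or changed documents: the stored marker is missing or different.
    fetch = sorted(
        doc_id for doc_id, version in remote_versions.items()
        if known.get(doc_id) != version
    )
    # Unchanged documents are skipped instead of re-processed.
    skip = sorted(
        doc_id for doc_id, version in remote_versions.items()
        if known.get(doc_id) == version
    )
    # Assumption: an ID we know about that upstream no longer lists was deleted.
    tombstone = sorted(doc_id for doc_id in known if doc_id not in remote_versions)
    return {"fetch": fetch, "skip": skip, "tombstone": tombstone}


plan = plan_sync(
    known={"page-1": "v3", "page-2": "v1", "page-3": "v7"},
    remote=[RemoteDoc("page-1", "v3"), RemoteDoc("page-2", "v2"), RemoteDoc("page-4", "v1")],
)
print(plan)
```

Here `page-2` (changed) and `page-4` (new) would be fetched, `page-1` skipped, and `page-3` tombstoned. Stable external IDs are what make the comparison possible in the first place.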

Power-user connectors

Configurable adapters for sources without a dedicated connector. They are labeled Advanced in the admin UI.

| Connector | Use case | Limitations |
|---|---|---|
| Generic REST | Any JSON API | No automatic change detection; full re-fetch on every sync |
| Generic Database | PostgreSQL, MySQL (read-only) | Full query on every sync |
| File Store | Local directory, S3, WebDAV | Change detection via file modification time |
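
To show what change detection via file modification time might look like for a local directory, here is a rough sketch. The function name `changed_files` and the snapshot format (relative path mapped to the last seen mtime) are assumptions made for illustration. They do not describe how the File Store connector stores its state.

```python
import os
import tempfile
from pathlib import Path


def changed_files(root: Path, last_seen: dict[str, float]) -> list[str]:
    """Return relative paths whose modification time differs from `last_seen`."""
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # A missing entry (new file) or a different mtime both count as changed.
        if last_seen.get(rel) != path.stat().st_mtime:
            changed.append(rel)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("alpha")
    (root / "b.md").write_text("beta")
    # Pin both files to a known mtime so the demo is deterministic.
    os.utime(root / "a.md", (1_700_000_000, 1_700_000_000))
    os.utime(root / "b.md", (1_700_000_000, 1_700_000_000))
    snapshot = {"a.md": 1_700_000_000.0, "b.md": 1_700_000_000.0}

    first = changed_files(root, snapshot)   # nothing modified since the snapshot
    os.utime(root / "b.md", (1_700_000_500, 1_700_000_500))
    second = changed_files(root, snapshot)  # b.md now carries a newer mtime

print(first, second)
```

Modification time is a coarse signal: an edit that preserves the timestamp would go unnoticed, which is one reason this sits in the Limitations column.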

Protocol connector

| Connector | Use case |
|---|---|
| MCP Client | Connect to any MCP-compatible server as a data source |

MCP ingestion quality depends entirely on the upstream server. This connector is labeled Experimental.

Compartments and access control

Every connection belongs to exactly one compartment and has a sensitivity level. Together they form the connection's security label (e.g., hr/confidential).

Compartments

Compartments are admin-defined organizational boundaries such as teams, departments, or functional areas. For example:

executive, rnd, hr, finance, engineering, legal, all-staff

A connection's compartment cannot be changed after creation. To reclassify, delete and recreate the connection.

Sensitivity levels

| Level | Description | Example |
|---|---|---|
| public | Safe for anyone in the organization | Public docs, marketing |
| internal | General internal, not externally shareable | Team wikis, project docs |
| confidential | Sensitive, restricted access | Financial reports, HR records |
| restricted | Highly sensitive, need-to-know | M&A docs, legal matters |
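
A security label such as hr/confidential can be modeled as a small value type that pairs a compartment with one of the four sensitivity levels. The sketch below is illustrative only: the `SecurityLabel` class and its validation rules are assumptions, and it makes the label immutable to mirror the rule that a connection's compartment cannot change after creation.

```python
from dataclasses import dataclass

# The four sensitivity levels listed above, least to most sensitive.
SENSITIVITY_LEVELS = ("public", "internal", "confidential", "restricted")


@dataclass(frozen=True)  # frozen: a label is fixed once created
class SecurityLabel:
    compartment: str
    sensitivity: str

    def __post_init__(self) -> None:
        if self.sensitivity not in SENSITIVITY_LEVELS:
            raise ValueError(f"unknown sensitivity level: {self.sensitivity!r}")
        # Assumption: compartment names are non-empty and contain no "/".
        if not self.compartment or "/" in self.compartment:
            raise ValueError(f"invalid compartment name: {self.compartment!r}")

    @classmethod
    def parse(cls, text: str) -> "SecurityLabel":
        """Parse the 'compartment/sensitivity' form, e.g. 'hr/confidential'."""
        compartment, _, sensitivity = text.partition("/")
        return cls(compartment, sensitivity)

    def __str__(self) -> str:
        return f"{self.compartment}/{self.sensitivity}"


label = SecurityLabel("hr", "confidential")
print(label)                                  # hr/confidential
print(SecurityLabel.parse("engineering/internal"))
```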

Source filtering

First-class connectors support filtering at connection setup so you can create narrow, compartment-appropriate connections instead of one broad connection:

✓ "Confluence — Engineering Docs" compartment: engineering sensitivity: internal
Space keys: ENG, DEVOPS

✓ "Confluence — HR Policies" compartment: hr sensitivity: confidential
Space keys: HR

✓ "Confluence — Company Handbook" compartment: all-staff sensitivity: public
Space keys: HANDBOOK

The connection form surfaces these filters prominently with a guidance message: "Select which parts of this source to ingest. Create separate connections for content with different sensitivity levels."

Source trust levels

| Trust level | Description | Example sources |
|---|---|---|
| Authoritative | Official, maintained documentation | Runbooks, approved policies, official docs |
| Reference | Useful context that may be informal | Wiki pages, shared notes, meeting summaries |
| Volatile | Rapidly changing or unverified | Chat exports, draft documents, ticket comments |

Default: reference. Trust level is used as a tie-breaker in search ranking when chunk scores are equal.
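
The tie-breaker behavior can be sketched as a two-part sort key: score first, trust level second. This is an illustration under assumptions, not the actual ranking code. The `Chunk` type and `rank` function are hypothetical, and the ordering Authoritative before Reference before Volatile is inferred from the descriptions above.

```python
from dataclasses import dataclass

# Assumed ordering: a lower rank wins when scores are equal.
TRUST_RANK = {"authoritative": 0, "reference": 1, "volatile": 2}


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float
    trust: str = "reference"  # the documented default trust level


def rank(chunks: list[Chunk]) -> list[Chunk]:
    """Sort by score (highest first); trust level only decides exact ties."""
    return sorted(chunks, key=lambda c: (-c.score, TRUST_RANK[c.trust]))


chunks = [
    Chunk("chat export", 0.82, "volatile"),
    Chunk("runbook", 0.82, "authoritative"),
    Chunk("wiki page", 0.91),
    Chunk("meeting notes", 0.82),
]
for chunk in rank(chunks):
    print(f"{chunk.score:.2f}  {chunk.trust:<13}  {chunk.text}")
```

Note that the wiki page still ranks first despite its lower trust level, because trust only matters when scores are equal.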

Sync schedule

Each connection has a configurable sync interval (e.g., every 15 minutes, every hour, daily). Webhooks can trigger immediate re-sync when supported by the source.

Manual sync: Click Sync Now in the connection detail view to trigger an immediate sync.

Connection health

The Admin Console shows real-time connection health:

| Status | Meaning |
|---|---|
| Healthy | Last sync completed within the expected window |
| Degraded | Last sync had partial failures or is overdue |
| Offline | Cannot reach the source or authentication failed |
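
One way to read the status table is as a small decision function. The sketch below is hypothetical: `classify_health` and its parameters are invented for illustration, and the "expected window" is assumed to be twice the sync interval, which the documentation does not specify.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    OFFLINE = "Offline"


def classify_health(
    *,
    reachable: bool,
    authenticated: bool,
    last_sync_finished: Optional[datetime],
    last_sync_had_failures: bool,
    sync_interval: timedelta,
    now: datetime,
    window_factor: float = 2.0,  # assumption: expected window = 2 x interval
) -> Health:
    # Offline takes priority: the source is unreachable or rejects credentials.
    if not reachable or not authenticated:
        return Health.OFFLINE
    # Assumption: a connection that has never completed a sync counts as overdue.
    if last_sync_finished is None:
        return Health.DEGRADED
    overdue = now - last_sync_finished > sync_interval * window_factor
    if last_sync_had_failures or overdue:
        return Health.DEGRADED
    return Health.HEALTHY


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
common = dict(reachable=True, authenticated=True, last_sync_had_failures=False,
              sync_interval=timedelta(minutes=15), now=now)

recent = classify_health(last_sync_finished=now - timedelta(minutes=10), **common)
stale = classify_health(last_sync_finished=now - timedelta(minutes=45), **common)
print(recent.value, stale.value)
```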

Connector metrics

Per-connection metrics available in the connection detail view:

| Metric | Description |
|---|---|
| documents_scanned | Total documents found during last scan |
| documents_processed | Documents through the ingestion pipeline |
| documents_skipped | Unchanged documents not re-processed |
| documents_failed | Documents that failed extraction |
| rate_limit_hits | Rate limit responses from source API |
| sync_duration_ms | Total sync run duration |
| last_error | Most recent error message and timestamp |

Resource limits

| Resource | Limit | Behavior when exceeded |
|---|---|---|
| Document extraction time | 60 seconds | Document marked as failed, sync continues |
| Document content size | 50 MB raw | Document skipped with warning |
| Concurrent syncs (global) | 3 | Additional syncs queued |
| Sync run duration | 4 hours | Terminated, partial results kept, logged as partial_ok |
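
The limits above can be expressed as a few constants and checks. This is a simplified sketch with hypothetical function names. It interprets 50 MB as 50 × 1024 × 1024 bytes, which is an assumption, and it checks extraction time after the fact, whereas a real pipeline would more likely enforce it as a timeout.

```python
MAX_EXTRACTION_SECONDS = 60
MAX_CONTENT_BYTES = 50 * 1024 * 1024   # assumption: "50 MB" read as 50 MiB
MAX_CONCURRENT_SYNCS = 3
MAX_SYNC_RUN_SECONDS = 4 * 60 * 60


def document_outcome(raw_bytes: int, extraction_seconds: float) -> str:
    """Classify one document against the per-document limits."""
    if raw_bytes > MAX_CONTENT_BYTES:
        return "skipped"    # oversized: skipped with a warning
    if extraction_seconds > MAX_EXTRACTION_SECONDS:
        return "failed"     # too slow: marked as failed, the sync continues
    return "processed"


def admit_syncs(requested: list[str], running: int = 0) -> tuple[list[str], list[str]]:
    """Split requested syncs into (start now, queued) under the global limit."""
    free_slots = max(MAX_CONCURRENT_SYNCS - running, 0)
    return requested[:free_slots], requested[free_slots:]


def run_limit_exceeded(elapsed_seconds: float) -> bool:
    """True once a sync run should be terminated and logged as partial_ok."""
    return elapsed_seconds > MAX_SYNC_RUN_SECONDS


print(document_outcome(60 * 1024 * 1024, 1.0))
print(document_outcome(1024, 75.0))
print(admit_syncs(["conf-eng", "conf-hr", "jira", "drive", "slack"]))
```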
