Confluence Connector
The Confluence connector ingests pages, comments, and attachments from Confluence Cloud and Confluence Data Center via the REST API.
What it ingests
- Pages — full page content including body text
- Comments — inline and page-level comments (configurable)
- Labels — page labels stored as metadata
- Page hierarchy — parent-child relationships preserved in heading path
- Attachments — PDFs, images, and other files (configurable, with OCR for images and scanned PDFs)
Authentication
Confluence Cloud
Use an API token:
- Go to id.atlassian.com/manage-profile/security/api-tokens
- Click Create API token
- Copy the token
In CoreCube:
- Server URL: https://your-domain.atlassian.net
- Email: your Atlassian account email
- API Token: the token you copied above
Confluence Data Center / Server
Use a Personal Access Token (Confluence 7.9+):
- Go to Profile → Personal Access Tokens → Create token
- Set an expiry and the required permissions (read access)
In CoreCube:
- Server URL: https://confluence.your-company.com
- Personal Access Token: the token you created above
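To check a credential outside CoreCube, it can help to know how each token is sent on the wire. The sketch below builds the standard `Authorization` header for both deployment types: HTTP Basic (email plus API token) for Cloud, and a Bearer token for Data Center. The function names are illustrative and are not part of CoreCube.

```python
import base64


def cloud_auth_header(email: str, api_token: str) -> dict:
    """Confluence Cloud: HTTP Basic auth, email as user, API token as password."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def data_center_auth_header(personal_access_token: str) -> dict:
    """Confluence Data Center: the Personal Access Token is sent as a Bearer token."""
    return {"Authorization": f"Bearer {personal_access_token}"}
```

Sending either header with a GET request to the space listing endpoint (`/wiki/rest/api/space` on Cloud, `/rest/api/space` on Data Center) should return HTTP 200 for a valid credential and 401 otherwise.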
Source filtering
Scope each connection with the filters below. To keep compartments separate, create one connection per compartment and restrict it to that compartment's space keys:
| Field | Description | Example |
|---|---|---|
| Space key(s) | Comma-separated list of Confluence space keys | ENG, DEVOPS, PLATFORM |
| Parent page ID | Only ingest pages under a specific parent (optional) | 123456 |
| Label filter | Only ingest pages with specific labels (optional) | documentation, runbook |
:::tip One connection per compartment
Create "Confluence — Engineering Docs" scoped to ENG, DEVOPS with compartment engineering, and "Confluence — HR Policies" scoped to HR with compartment hr. Each has its own sensitivity level and user scope.
:::
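One way to reason about how the three filters combine is to express them as a Confluence Query Language (CQL) expression. The sketch below is an illustration only: this page does not say whether CoreCube uses CQL internally, and `build_cql` is a hypothetical helper. The CQL fields it uses (`type`, `space`, `ancestor`, `label`) are standard.

```python
def build_cql(space_keys, parent_page_id=None, labels=None):
    """Build a CQL query string from the three filter fields."""
    def quoted(values):
        # Quote each value and escape embedded double quotes.
        return ", ".join('"' + v.strip().replace('"', '\\"') + '"' for v in values)

    clauses = ["type = page", f"space in ({quoted(space_keys)})"]
    if parent_page_id is not None:
        # `ancestor` matches every page below the given parent.
        clauses.append(f"ancestor = {int(parent_page_id)}")
    if labels:
        # `label in (...)` matches pages carrying any of the listed labels.
        clauses.append(f"label in ({quoted(labels)})")
    return " AND ".join(clauses)


print(build_cql(["ENG", "DEVOPS"], 123456, ["documentation", "runbook"]))
```

Pasting the resulting expression into Confluence's advanced search is a quick way to preview which pages a filter combination would select.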
Sync options
| Option | Default | Description |
|---|---|---|
| Include comments | Yes | Ingest page-level and inline comments |
| Include labels | Yes | Store page labels as chunk metadata |
| Include attachments | No | Ingest attached PDF, DOCX, and image files |
| Sync schedule | Every 1 hour | How often to check for updated pages |
Change detection
CoreCube uses Confluence's page version number to detect changes. A page is re-fetched only when its current version number is higher than the version recorded at the last sync, so an incremental sync processes only the pages that changed since the previous run, typically a handful.
Deleted and trashed pages are detected and removed from the evidence layer.
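The comparison described above can be sketched as a small planning function. This is a simplified illustration, not CoreCube's implementation: it assumes both the stored state and the current listing are available as plain mappings from page ID to version number, and the name `plan_incremental_sync` is made up for this example.

```python
def plan_incremental_sync(last_synced, current):
    """Compare stored page versions with the versions Confluence reports now.

    Both arguments map page ID -> version number.
    Returns (page IDs to re-fetch, page IDs to remove).
    """
    to_fetch = sorted(
        page_id
        for page_id, version in current.items()
        # Unknown pages default to 0, so new pages (version >= 1) are fetched.
        if version > last_synced.get(page_id, 0)
    )
    # Pages known from the last sync but absent now were deleted or trashed.
    to_remove = sorted(set(last_synced) - set(current))
    return to_fetch, to_remove


last = {"101": 3, "102": 5, "103": 1}
now = {"101": 3, "102": 6, "104": 1}
print(plan_incremental_sync(last, now))
```

In this example, page 102 was edited, page 104 is new, and page 103 no longer exists, so only two pages are fetched and one is removed.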
Webhook support
Confluence webhooks trigger an immediate sync when pages are created, updated, or deleted — instead of waiting for the next scheduled sync.
To configure a webhook in Confluence:
- Go to Confluence Admin → System → Webhooks
- Click Create webhook
- Set the URL to: https://corecube.your-domain.com/api/webhooks/{connection-id}
- Copy the webhook secret from the CoreCube connection detail view and enter it as the Secret
- Select events: page_created, page_updated, page_deleted, page_trashed
CoreCube validates the HMAC-SHA256 signature on every incoming webhook and rejects payloads older than 5 minutes.
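For readers who want to see what this kind of check involves, here is a minimal sketch of HMAC-SHA256 validation with a freshness window. It makes several assumptions that this page does not specify: that the signature arrives hex-encoded, that it covers the raw request body, and that the send time is available as a Unix timestamp. The function and parameter names are illustrative.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 5 * 60  # the 5-minute window described above


def verify_webhook(secret: bytes, body: bytes, signature_hex: str,
                   sent_at: float, now: Optional[float] = None) -> bool:
    """Return True only if the payload is fresh and the signature matches."""
    now = time.time() if now is None else now
    if now - sent_at > MAX_AGE_SECONDS:
        return False  # too old: treat as a possible replay
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected, signature_hex)
```

A request that fails either check should be rejected without triggering a sync.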
Content normalization
Confluence pages are stored in Atlassian Document Format (ADF). CoreCube converts ADF to clean markdown before chunking:
- Headings → Markdown headings (preserved in heading_path for weighted FTS)
- Tables → Markdown tables
- Code blocks → Fenced code blocks with language tag
- Images → Placeholder text (or OCR'd content if attachments are enabled)
- Macros → Extracted text where possible, metadata where not
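To make the mapping above concrete, here is a deliberately small sketch of an ADF-to-markdown walk. It handles only `doc`, `heading`, `paragraph`, `codeBlock`, and `text` nodes and falls back to plain text for everything else. CoreCube's actual converter is not shown on this page and certainly covers more (tables, images, macros); `adf_to_markdown` is a hypothetical name.

```python
FENCE = "`" * 3  # markdown code fence


def adf_to_markdown(node: dict) -> str:
    """Convert a small subset of ADF node types to markdown."""
    kind = node.get("type")
    children = node.get("content", [])
    if kind == "text":
        return node.get("text", "")
    if kind == "doc":
        # Top-level blocks are separated by blank lines.
        blocks = (adf_to_markdown(child) for child in children)
        return "\n\n".join(block for block in blocks if block)
    inner = "".join(adf_to_markdown(child) for child in children)
    if kind == "heading":
        level = node.get("attrs", {}).get("level", 1)
        return "#" * level + " " + inner
    if kind == "codeBlock":
        language = node.get("attrs", {}).get("language", "")
        return f"{FENCE}{language}\n{inner}\n{FENCE}"
    # Paragraphs and unrecognized nodes: keep their plain text.
    return inner


page = {"type": "doc", "version": 1, "content": [
    {"type": "heading", "attrs": {"level": 2},
     "content": [{"type": "text", "text": "Deploy"}]},
    {"type": "paragraph", "content": [{"type": "text", "text": "Run it."}]},
    {"type": "codeBlock", "attrs": {"language": "bash"},
     "content": [{"type": "text", "text": "make deploy"}]},
]}
print(adf_to_markdown(page))
```

The recursive structure mirrors ADF itself: every node has a `type`, optional `attrs`, and optional `content` children, so supporting another node type means adding one more branch.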
Troubleshooting
"Authentication failed"
- Verify your API token or Personal Access Token is still valid
- Ensure the account has at least read access to the configured spaces
- For Confluence Cloud, confirm the email matches your Atlassian account
"Space not found"
- Verify the space key is correct (case-sensitive)
- Confirm the authenticated user can access the space
Pages missing after sync
- If a label filter is configured, check that the missing pages carry at least one of the configured labels
- Verify the pages are not in a restricted space, or under page restrictions, that the authenticated user cannot access
- Check the Connection → Last sync log for skip or failure entries