Confluence Connector
The Confluence connector ingests pages, comments, and labels from Confluence Cloud and Confluence Data Center via the REST API.
What it ingests
- Pages — full page content including body text
- Comments — inline and page-level comments (configurable)
- Labels — page labels stored as metadata
- Page hierarchy — parent-child relationships preserved in heading path
Authentication
Confluence Cloud
Use an API token:
1. Go to id.atlassian.com/manage-profile/security/api-tokens
2. Click Create API token
3. Copy the token
In CoreCube:
- Server URL: https://your-domain.atlassian.net
- Email: your Atlassian account email
- API Token: the token from step 3
Confluence Data Center / Server
CoreCube authenticates to every Confluence endpoint with HTTP Basic auth (username:token, base64-encoded). The Bearer / Personal Access Token scheme is not used — even on Data Center, supply a username together with a token or password.
1. Use your Confluence account username
2. Create an API token, or use your account password, with at least read access to the target spaces
In CoreCube:
- Server URL: https://confluence.your-company.com
- Email / Username: your Confluence username
- API Token: the token (or password) from step 2
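Both deployment types end up sending the same kind of header. As a minimal sketch of the Basic scheme described above, the snippet below joins a username (or email) and a token with a colon and base64-encodes the result. `basic_auth_header` is a hypothetical helper written for illustration; it is not CoreCube's own code, and the credentials are placeholders.

```python
import base64


def basic_auth_header(username: str, secret: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header (hypothetical helper).

    `secret` is the API token, or the account password on Data Center.
    """
    raw = f"{username}:{secret}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder credentials, for illustration only.
headers = basic_auth_header("user", "pass")
print(headers["Authorization"])
```

Note that base64 is an encoding, not encryption, so the header should only ever travel over HTTPS.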
Source filtering
Create separate connections per compartment by filtering to specific space keys:
| Field | Description | Example |
|---|---|---|
| Space key(s) | Comma-separated list of Confluence space keys | ENG, DEVOPS, PLATFORM |
Example: create "Confluence — Engineering Docs" scoped to ENG, DEVOPS with compartment engineering, and "Confluence — HR Policies" scoped to HR with compartment hr. Each connection has its own sensitivity level and user scope.
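To illustrate how a comma-separated space key field might be interpreted, here is a small sketch. Trimming whitespace and dropping duplicates are assumptions, not documented CoreCube behavior; case is preserved because space keys are case-sensitive. The function name and the `space_keys` / `compartment` dictionary fields are hypothetical.

```python
def parse_space_keys(field: str) -> list[str]:
    """Split a comma-separated field into space keys (hypothetical helper).

    Assumed behavior: surrounding whitespace is trimmed, empty entries and
    duplicates are dropped, and case is left untouched.
    """
    keys: list[str] = []
    for part in field.split(","):
        key = part.strip()
        if key and key not in keys:
            keys.append(key)
    return keys


# Two connections, one per compartment (field names are illustrative).
connections = [
    {"space_keys": parse_space_keys("ENG, DEVOPS"), "compartment": "engineering"},
    {"space_keys": parse_space_keys("HR"), "compartment": "hr"},
]
```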
Sync options
| Option | Default | Description |
|---|---|---|
| Include comments | Yes | Ingest page-level and inline comments |
| Include labels | Yes | Store page labels as chunk metadata |
| Sync schedule | Manual | Manual by default; optional interval of 15, 30, 60, or 360 minutes |
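As a rough sketch of the schedule rule in the table, the check below accepts either no interval (manual sync) or one of the four listed intervals. The function and constant names are hypothetical, and representing "Manual" as `None` is an assumption made for the example.

```python
from typing import Optional

# Intervals listed in the Sync options table, in minutes.
ALLOWED_INTERVALS_MINUTES = (15, 30, 60, 360)


def describe_schedule(interval_minutes: Optional[int]) -> str:
    """Validate a sync interval; None stands for manual sync (assumed)."""
    if interval_minutes is None:
        return "manual"
    if interval_minutes not in ALLOWED_INTERVALS_MINUTES:
        raise ValueError(
            f"interval must be one of {ALLOWED_INTERVALS_MINUTES}, got {interval_minutes}"
        )
    return f"every {interval_minutes} minutes"
```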
Change detection
CoreCube uses Confluence's page version number to detect changes: only pages whose version number is higher than the last-synced version are re-fetched. Incremental syncs are therefore fast, because each run typically re-processes only a handful of pages.
Deleted and trashed pages are detected and removed from the evidence layer.
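The version comparison and deletion handling can be sketched as follows. This is an illustration under stated assumptions, not CoreCube's implementation: pages are represented as a mapping from page ID to version number, and `plan_sync` is a hypothetical name.

```python
def plan_sync(
    last_synced: dict[str, int], current: dict[str, int]
) -> tuple[list[str], list[str]]:
    """Return (page IDs to re-fetch, page IDs to remove).

    last_synced: page ID -> version stored at the previous sync
    current:     page ID -> version currently reported by Confluence
    """
    # New pages have no stored version, so they compare against 0.
    to_fetch = [
        page_id
        for page_id, version in current.items()
        if version > last_synced.get(page_id, 0)
    ]
    # Pages that were synced before but are no longer listed
    # (deleted or trashed) are removed.
    to_remove = [page_id for page_id in last_synced if page_id not in current]
    return to_fetch, to_remove


previous = {"101": 3, "102": 7, "103": 1}
latest = {"101": 4, "102": 7, "104": 1}
to_fetch, to_remove = plan_sync(previous, latest)
```

In this example page 101 was edited, 104 is new, 102 is unchanged and skipped, and 103 has disappeared and is removed.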
Content normalization
Confluence pages are delivered in storage format, Confluence's XHTML-based representation. CoreCube converts this markup to clean, Markdown-style text before chunking:
- Headings → Markdown headings (preserved in heading_path for weighted FTS)
- Tables → Markdown tables
- Code blocks → Fenced code blocks
- Images → Alt text kept as a placeholder (e.g. [Image: architecture diagram]); images without alt text are dropped
- Macros → Inner text extracted where present
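To make two of these rules concrete, the sketch below converts headings and images in plain HTML using Python's standard `html.parser`. It is deliberately simplified and is not CoreCube's converter: it assumes ordinary `<h1>`-`<h6>`, `<p>`, and `<img>` tags, and it ignores tables, code blocks, and Confluence-specific macro elements. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class SimpleNormalizer(HTMLParser):
    """Turn headings, paragraphs, and images into Markdown-style lines."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []   # finished output blocks
        self.buffer: list[str] = []  # text pieces of the current block
        self.prefix = ""             # Markdown prefix of the current block

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # h2 -> "## ", h3 -> "### ", and so on.
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # images without alt text are dropped
                self.buffer.append(f"[Image: {alt}]")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.buffer.append(text)

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.flush_block()

    def flush_block(self) -> None:
        if self.buffer:
            self.lines.append(self.prefix + " ".join(self.buffer))
        self.buffer = []
        self.prefix = ""


def normalize(html: str) -> str:
    parser = SimpleNormalizer()
    parser.feed(html)
    parser.close()
    parser.flush_block()  # emit any trailing text outside a block tag
    return "\n\n".join(parser.lines)


sample = (
    "<h2>Overview</h2>"
    '<p>See <img alt="architecture diagram" src="a.png" /> here.</p>'
    '<p><img src="b.png" /></p>'
)
result = normalize(sample)
```

Here the heading becomes `## Overview`, the first image is replaced by its alt-text placeholder, and the second image, which has no alt text, produces no output.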
Troubleshooting
"Authentication failed"
- Verify your API token (or account password) is still valid
- Ensure the account has at least read access to the configured spaces
- For Confluence Cloud, confirm the email matches your Atlassian account
"Space not found"
- Verify the space key is correct (case-sensitive)
- Confirm the authenticated user can access the space
Pages missing after sync
- Verify the pages are not in a restricted space that the user cannot access
- Confirm the pages belong to one of the configured space keys
- Check the Connection → Last sync log for skip or failure entries