Analytics (ClickHouse)
The Data Layer exposes ClickHouse as a first-class analytics backend
via kernel.data().analytics(). All module data is continuously
replicated from PostgreSQL to ClickHouse via the Debezium CDC pipeline.
ClickHouse delivers sub-second aggregation over millions of rows using
columnar storage and the MergeTree family of table engines.
Modules never connect to ClickHouse directly. All analytics calls
go through the Data Layer, which enforces tenant isolation, appends
FINAL for correctness, and returns lastSyncAt for consistency
transparency.
When to Use Analytics vs CRUD
| Scenario | API to use |
|---|---|
| Retrieve a single record by ID | data.retrieve() → PostgreSQL |
| List records with filters and pagination | data.list() → PostgreSQL |
| Count, sum, average, group by | data.analytics() → ClickHouse |
| Time-series trends, histogram, funnel | data.analytics() → ClickHouse |
| Full-table scan over > 1 M rows | data.analytics() → ClickHouse |
| Cross-model aggregation (reporting) | data.analytics() → ClickHouse |
PostgreSQL (retrieve, list) is the authoritative source. ClickHouse
is the analytics replica; it is eventually consistent, typically
lagging PostgreSQL by less than 5 seconds.
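The routing rule in the table above can be sketched as a small helper. Note that `QuerySpec` and `chooseBackend` are illustrative names for this sketch, not part of the SDK:

```typescript
// Illustrative sketch of the routing rule above — not SDK code.
type QuerySpec = {
  id?: string;        // single-record lookup
  aggregate?: string; // count / sum / avg / min / max
  groupBy?: string;
};

// Aggregations and group-bys go to ClickHouse via analytics();
// point reads and filtered lists stay on the authoritative
// PostgreSQL path (retrieve, list).
function chooseBackend(q: QuerySpec): 'postgresql' | 'clickhouse' {
  if (q.aggregate || q.groupBy) return 'clickhouse';
  return 'postgresql';
}
```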
Eventual Consistency
The CDC pipeline (PostgreSQL → Debezium → Kafka → ClickHouse) has a propagation latency of < 5 seconds under normal load.
Every analytics response includes meta.lastSyncAt — the ISO 8601
timestamp of the most recent CDC sync for the queried table. UIs can
display a "Data as of HH:MM:SS" badge when lastSyncAt is more
than a few seconds old.
{
  "data": [ ... ],
  "meta": {
    "lastSyncAt": "2026-04-15T10:29:55Z",
    "rowsScanned": 2400000,
    "durationMs": 42
  }
}
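The "Data as of" badge logic can be sketched from meta.lastSyncAt alone. The helper names and the 5-second threshold are illustrative choices, not SDK features:

```typescript
// Sketch: derive a "Data as of HH:MM:SS" badge from meta.lastSyncAt.
// The threshold mirrors the documented < 5 s CDC propagation latency.
const SHOW_BADGE_AFTER_S = 5;

function syncLagSeconds(lastSyncAt: string, now: Date = new Date()): number {
  return (now.getTime() - new Date(lastSyncAt).getTime()) / 1000;
}

function badgeText(lastSyncAt: string, now: Date = new Date()): string | null {
  if (syncLagSeconds(lastSyncAt, now) <= SHOW_BADGE_AFTER_S) return null;
  // HH:MM:SS taken from the ISO 8601 timestamp (UTC)
  return `Data as of ${lastSyncAt.slice(11, 19)}`;
}
```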
Aggregation Operations
| Operation | aggregate value | SQL equivalent |
|---|---|---|
| Count records | count | COUNT(*) |
| Sum a numeric field | sum | SUM(field) |
| Average a numeric field | avg | AVG(field) |
| Minimum value | min | MIN(field) |
| Maximum value | max | MAX(field) |
SDK
Count
import { kernel } from '@platform/sdk-core';
const result = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
});
// result.data: [{ count: 48200 }]
// result.meta.lastSyncAt: '2026-04-15T10:29:55Z'
Sum with filter
const revenue = await kernel.data().analytics({
  model: 'deals',
  aggregate: 'sum',
  field: 'amount',
  filters: { status: 'eq.won' },
});
// { data: [{ sum: 1250000 }], meta: { lastSyncAt: '...' } }
Group by field
const byStatus = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
  groupBy: 'status',
});
// {
//   data: [
//     { status: 'active', count: 38000 },
//     { status: 'inactive', count: 10200 }
//   ],
//   meta: { lastSyncAt: '...' }
// }
Time-series (group by date truncation)
const dailySignups = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
  groupBy: 'created_at',
  groupByInterval: 'day',
  filters: {
    created_at: 'gte.2026-01-01T00:00:00Z',
  },
  order: 'created_at.asc',
});
// {
//   data: [
//     { created_at: '2026-01-01', count: 143 },
//     { created_at: '2026-01-02', count: 221 },
//     ...
//   ]
// }
Multi-metric aggregation
const summary = await kernel.data().analytics({
  model: 'deals',
  metrics: [
    { aggregate: 'count' },
    { aggregate: 'sum', field: 'amount' },
    { aggregate: 'avg', field: 'amount' },
  ],
  filters: { stage: 'eq.closed_won' },
});
// {
//   data: [{ count: 412, sum: 8340000, avg: 20242 }],
//   meta: { lastSyncAt: '...' }
// }
REST Endpoint
POST https://api.septemcore.com/v1/data/crm/analytics
Authorization: Bearer <access_token>
Content-Type: application/json
{
  "model": "contacts",
  "aggregate": "count",
  "groupBy": "status",
  "filters": {
    "deleted_at": "is.null"
  }
}
Response 200 OK:
{
  "data": [
    { "status": "active", "count": 38000 },
    { "status": "inactive", "count": 10200 }
  ],
  "meta": {
    "lastSyncAt": "2026-04-15T10:29:55Z",
    "rowsScanned": 48200,
    "durationMs": 17
  }
}
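For clients without the SDK, the request above can be assembled directly. This is a sketch; the builder function is illustrative, while the URL shape, headers, and body mirror the example request:

```typescript
// Sketch of calling the analytics REST endpoint without the SDK.
// buildAnalyticsRequest is an illustrative helper, not platform code.
type AnalyticsBody = {
  model: string;
  aggregate: string;
  groupBy?: string;
  filters?: Record<string, string>;
};

function buildAnalyticsRequest(module: string, token: string, body: AnalyticsBody) {
  return {
    url: `https://api.septemcore.com/v1/data/${module}/analytics`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (assumes an access token is already available):
// const { url, init } = buildAnalyticsRequest('crm', accessToken,
//   { model: 'contacts', aggregate: 'count', groupBy: 'status' });
// const res = await fetch(url, init);
```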
Tenant Isolation (Dual-Layer)
ClickHouse does not have a native RLS system comparable to PostgreSQL. The Data Layer therefore enforces tenant isolation through two independent enforcement layers, application-side filtering and database row policies, backed by audit logging:
| Layer | Mechanism | When it fires |
|---|---|---|
| Application | Go Data Layer appends WHERE tenant_id = $tenantId to every query | Always — tenant_id from JWT, never from request params |
| Database | ClickHouse CREATE ROW POLICY tenant_filter USING tenant_id = {tenantId} | DB-level fence — fires even if Go code has a bug |
| Audit | All ClickHouse queries logged with tenantId + traceId | Forensic analysis, SOC2 compliance |
Row policies are created automatically when a new tenant is provisioned.
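A provisioning-time row policy might look like the following ClickHouse DDL. The table, policy, and role names here are illustrative; the concrete statement the platform runs is not shown in this document:

```sql
-- Illustrative only — the actual provisioning DDL may differ.
CREATE ROW POLICY tenant_filter ON crm_contacts
  USING tenant_id = '01j9p3kz5f00000000000000'
  TO analytics_reader;
```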
ReplacingMergeTree and FINAL
Analytics tables use the ReplacingMergeTree engine. CDC delivers
at-least-once, so duplicate rows can accumulate between background
merges. The Data Layer always appends the FINAL modifier to every
SELECT, forcing ClickHouse to deduplicate on read and return only
the latest version of each row:
-- Generated internally — modules never write this directly
SELECT status, count(*) AS count
FROM crm_contacts FINAL
WHERE tenant_id = '01j9p3kz5f00000000000000'
  AND deleted_at IS NULL
GROUP BY status
FINAL guarantees correct, deduplicated results. Its read-time cost
is acceptable for analytics workloads, where deduplication runs once
per query rather than once per row operation.
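The effect of FINAL can be illustrated with a small sketch: collapse duplicate rows to the latest version per key, as ReplacingMergeTree does during background merges. The row shape and version column here are illustrative, not the actual table schema:

```typescript
// Sketch of FINAL's effect: keep only the latest version of each row,
// as ReplacingMergeTree deduplication does. Illustrative row shape.
type Row = { id: string; status: string; _version: number };

function dedupeFinal(rows: Row[]): Row[] {
  const latest = new Map<string, Row>();
  for (const r of rows) {
    const seen = latest.get(r.id);
    if (!seen || r._version > seen._version) latest.set(r.id, r);
  }
  return [...latest.values()];
}
```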
Required Permissions
| Action | Permission |
|---|---|
| Any analytics query | {module}.{model}.read |
Analytics endpoints follow the same permission model as CRUD reads. No separate analytics permission is needed.
Error Reference
| Scenario | HTTP | type URI suffix |
|---|---|---|
| Unknown model | 404 | not-found |
| Unknown aggregate | 400 | validation-error |
| Unknown field in groupBy | 400 | validation-error (code: UNKNOWN_FIELD) |
| Insufficient permission | 403 | forbidden |
| ClickHouse query timeout (> 30 s) | 504 | gateway-timeout |
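Since errors carry an HTTP status plus a type URI suffix, a client can branch on them as sketched below. The ProblemDetails shape is inferred from the table above and may differ from the actual response body:

```typescript
// Inferred from the error table — the real error body may differ.
type ProblemDetails = { type: string; status: number; code?: string };

// Only the ClickHouse timeout is worth retrying; validation and
// permission errors will fail the same way on every attempt.
function isRetryable(p: ProblemDetails): boolean {
  return p.status === 504 && p.type.endsWith('gateway-timeout');
}
```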