
Analytics (ClickHouse)

The Data Layer exposes ClickHouse as a first-class analytics backend via kernel.data().analytics(). All module data is continuously replicated from PostgreSQL to ClickHouse via the Debezium CDC pipeline. ClickHouse delivers sub-second aggregation over millions of rows using columnar storage and the MergeTree family of table engines.

Modules never connect to ClickHouse directly. All analytics calls go through the Data Layer, which enforces tenant isolation, appends FINAL for correctness, and returns lastSyncAt for consistency transparency.


When to Use Analytics vs CRUD

| Scenario | API to use |
| --- | --- |
| Retrieve a single record by ID | data.retrieve() → PostgreSQL |
| List records with filters and pagination | data.list() → PostgreSQL |
| Count, sum, average, group by | data.analytics() → ClickHouse |
| Time-series trends, histograms, funnels | data.analytics() → ClickHouse |
| Full-table scan over > 1 M rows | data.analytics() → ClickHouse |
| Cross-model aggregation (reporting) | data.analytics() → ClickHouse |

PostgreSQL (retrieve, list) is the authoritative source. ClickHouse is the analytics replica, typically consistent to within 5 seconds.


Eventual Consistency

The CDC pipeline (PostgreSQL → Debezium → Kafka → ClickHouse) has a propagation latency of < 5 seconds under normal load.

Every analytics response includes meta.lastSyncAt — the ISO 8601 timestamp of the most recent CDC sync for the queried table. UIs can display a "Data as of HH:MM:SS" badge when lastSyncAt is more than a few seconds old.

{
  "data": [ ... ],
  "meta": {
    "lastSyncAt": "2026-04-15T10:29:55Z",
    "rowsScanned": 2400000,
    "durationMs": 42
  }
}
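A UI can derive the "Data as of" badge directly from meta.lastSyncAt. A minimal sketch, assuming a 5-second freshness threshold — the isStale and syncBadge helpers are illustrative, not part of the SDK:

```typescript
// Illustrative threshold: matches the documented < 5 s CDC latency,
// but the exact cutoff is a UI choice, not an SDK constant.
const STALE_THRESHOLD_MS = 5_000;

function isStale(lastSyncAt: string, nowMs: number = Date.now()): boolean {
  return nowMs - Date.parse(lastSyncAt) > STALE_THRESHOLD_MS;
}

// Returns the badge text, or null when the data is fresh enough to skip it.
function syncBadge(lastSyncAt: string, nowMs: number = Date.now()): string | null {
  if (!isStale(lastSyncAt, nowMs)) return null;
  // Render the sync time as HH:MM:SS (UTC).
  return `Data as of ${new Date(lastSyncAt).toISOString().slice(11, 19)}`;
}
```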

Aggregation Operations

| Operation | aggregate value | SQL equivalent |
| --- | --- | --- |
| Count records | count | COUNT(*) |
| Sum a numeric field | sum | SUM(field) |
| Average a numeric field | avg | AVG(field) |
| Minimum value | min | MIN(field) |
| Maximum value | max | MAX(field) |

SDK

Count

import { kernel } from '@platform/sdk-core';

const result = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
});

// result.data: [{ count: 48200 }]
// result.meta.lastSyncAt: '2026-04-15T10:29:55Z'

Sum with filter

const revenue = await kernel.data().analytics({
  model: 'deals',
  aggregate: 'sum',
  field: 'amount',
  filters: { status: 'eq.won' },
});
// { data: [{ sum: 1250000 }], meta: { lastSyncAt: '...' } }

Group by field

const byStatus = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
  groupBy: 'status',
});
// {
//   data: [
//     { status: 'active', count: 38000 },
//     { status: 'inactive', count: 10200 }
//   ],
//   meta: { lastSyncAt: '...' }
// }

Time-series (group by date truncation)

const dailySignups = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
  groupBy: 'created_at',
  groupByInterval: 'day',
  filters: {
    created_at: 'gte.2026-01-01T00:00:00Z',
  },
  order: 'created_at.asc',
});
// {
//   data: [
//     { created_at: '2026-01-01', count: 143 },
//     { created_at: '2026-01-02', count: 221 },
//     ...
//   ]
// }
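Days with zero signups produce no row in the grouped result, which leaves gaps in a chart. A small client-side sketch that pads missing days with count: 0 — fillDailyGaps is a hypothetical helper, not part of the SDK:

```typescript
interface DailyCount {
  created_at: string; // 'YYYY-MM-DD' bucket, as returned by groupByInterval: 'day'
  count: number;
}

// Fill every missing day between the first and last bucket with count: 0.
function fillDailyGaps(rows: DailyCount[]): DailyCount[] {
  if (rows.length === 0) return [];
  const byDay = new Map(rows.map((r) => [r.created_at, r.count]));
  const out: DailyCount[] = [];
  const start = new Date(`${rows[0].created_at}T00:00:00Z`);
  const end = new Date(`${rows[rows.length - 1].created_at}T00:00:00Z`);
  for (let d = start; d <= end; d = new Date(d.getTime() + 86_400_000)) {
    const day = d.toISOString().slice(0, 10);
    out.push({ created_at: day, count: byDay.get(day) ?? 0 });
  }
  return out;
}
```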

Multi-metric aggregation

const summary = await kernel.data().analytics({
  model: 'deals',
  metrics: [
    { aggregate: 'count' },
    { aggregate: 'sum', field: 'amount' },
    { aggregate: 'avg', field: 'amount' },
  ],
  filters: { stage: 'eq.closed_won' },
});
// {
//   data: [{ count: 412, sum: 8340000, avg: 20242 }],
//   meta: { lastSyncAt: '...' }
// }

REST Endpoint

POST https://api.septemcore.com/v1/data/crm/analytics
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "model": "contacts",
  "aggregate": "count",
  "groupBy": "status",
  "filters": {
    "deleted_at": "is.null"
  }
}

Response 200 OK:

{
  "data": [
    { "status": "active", "count": 38000 },
    { "status": "inactive", "count": 10200 }
  ],
  "meta": {
    "lastSyncAt": "2026-04-15T10:29:55Z",
    "rowsScanned": 48200,
    "durationMs": 17
  }
}
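The same call can be issued from TypeScript with fetch. A sketch, assuming an access token is already in hand — buildAnalyticsRequest and runAnalyticsQuery are illustrative wrappers, not SDK functions:

```typescript
// Build the HTTP request for the analytics endpoint (separated out so the
// request shape can be inspected or tested without a network call).
function buildAnalyticsRequest(accessToken: string, body: Record<string, unknown>) {
  return {
    url: 'https://api.septemcore.com/v1/data/crm/analytics',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

async function runAnalyticsQuery(accessToken: string, body: Record<string, unknown>) {
  const { url, init } = buildAnalyticsRequest(accessToken, body);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`analytics query failed: HTTP ${res.status}`);
  return res.json();
}
```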

Tenant Isolation (Dual-Layer)

ClickHouse has no native row-level security (RLS) system comparable to PostgreSQL's. The Data Layer therefore enforces tenant isolation through two independent mechanisms:

| Layer | Mechanism | When it fires |
| --- | --- | --- |
| Application | Go Data Layer appends WHERE tenant_id = $tenantId to every query | Always — tenant_id from JWT, never from request params |
| Database | ClickHouse CREATE ROW POLICY tenant_filter USING tenant_id = {tenantId} | DB-level fence — fires even if Go code has a bug |
| Audit | All ClickHouse queries logged with tenantId + traceId | Forensic analysis, SOC2 compliance |

Row policies are created automatically when a new tenant is provisioned.


ReplacingMergeTree and FINAL

Analytics tables use the ReplacingMergeTree engine. CDC delivers at-least-once, so duplicate rows can accumulate between background merges. The Data Layer always appends the FINAL modifier to every SELECT, forcing ClickHouse to deduplicate on read and return only the latest version of each row:

-- Generated internally — modules never write this directly
SELECT status, count(*) AS count
FROM crm_contacts FINAL
WHERE tenant_id = '01j9p3kz5f00000000000000'
  AND deleted_at IS NULL
GROUP BY status

FINAL guarantees read-time correctness. The extra merge work is acceptable for analytics workloads, where the cost is paid once per query rather than per row.


Required Permissions

| Action | Permission |
| --- | --- |
| Any analytics query | {module}.{model}.read |

Analytics endpoints follow the same permission model as CRUD reads. No separate analytics permission is needed.


Error Reference

| Scenario | HTTP | type URI suffix |
| --- | --- | --- |
| Unknown model | 404 | not-found |
| Unknown aggregate | 400 | validation-error |
| Unknown field in groupBy | 400 | validation-error (code: UNKNOWN_FIELD) |
| Insufficient permission | 403 | forbidden |
| ClickHouse query timeout (> 30 s) | 504 | gateway-timeout |
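Since the table identifies errors by the suffix of their type URI, a client can branch on that suffix. A sketch, assuming the error body carries a type field in the RFC 7807 problem-details shape — classifyAnalyticsError is a hypothetical helper:

```typescript
// Error kinds taken from the table above, plus a fallback.
type AnalyticsErrorKind =
  | 'not-found'
  | 'validation-error'
  | 'forbidden'
  | 'gateway-timeout'
  | 'unknown';

const KNOWN_SUFFIXES: AnalyticsErrorKind[] = [
  'not-found',
  'validation-error',
  'forbidden',
  'gateway-timeout',
];

// Map a problem `type` URI (assumed RFC 7807 shape) to one of the
// documented error kinds via its last path segment.
function classifyAnalyticsError(typeUri: string): AnalyticsErrorKind {
  const suffix = typeUri.split('/').pop() ?? '';
  return (KNOWN_SUFFIXES as string[]).includes(suffix)
    ? (suffix as AnalyticsErrorKind)
    : 'unknown';
}
```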