
Analytics (ClickHouse)

The Data Layer exposes ClickHouse as a first-class analytics backend via kernel.data().analytics(). All module data is continuously replicated from PostgreSQL to ClickHouse via the Debezium CDC pipeline. ClickHouse delivers sub-second aggregation over millions of rows using columnar storage and the MergeTree family of table engines.

Modules never connect to ClickHouse directly. All analytics calls go through the Data Layer, which enforces tenant isolation, appends FINAL for correctness, and returns lastSyncAt for consistency transparency.


When to Use Analytics vs CRUD

| Scenario | API to use |
| --- | --- |
| Retrieve a single record by ID | data.retrieve() → PostgreSQL |
| List records with filters and pagination | data.list() → PostgreSQL |
| Count, sum, average, group by | data.analytics() → ClickHouse |
| Time-series trends, histograms, funnels | data.analytics() → ClickHouse |
| Full-table scan over > 1 M rows | data.analytics() → ClickHouse |
| Cross-model aggregation (reporting) | data.analytics() → ClickHouse |

PostgreSQL (retrieve, list) is the authoritative source. ClickHouse is the analytics replica, typically consistent to within 5 seconds.


Eventual Consistency

The CDC pipeline (PostgreSQL → Debezium → Kafka → ClickHouse) has a propagation latency of < 5 seconds under normal load.

Every analytics response includes meta.lastSyncAt — the ISO 8601 timestamp of the most recent CDC sync for the queried table. UIs can display a "Data as of HH:MM:SS" badge when lastSyncAt is more than a few seconds old.

{
  "data": [ ... ],
  "meta": {
    "lastSyncAt": "2026-04-15T10:29:55Z",
    "rowsScanned": 2400000,
    "durationMs": 42
  }
}
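A UI can derive the "Data as of" badge directly from meta.lastSyncAt. A minimal sketch, assuming a 5-second freshness threshold — the isStale and syncBadge helpers are illustrative, not part of the SDK:

```typescript
// Illustrative threshold: matches the documented < 5 s CDC latency,
// but the exact cutoff is a UI choice, not an SDK constant.
const STALE_THRESHOLD_MS = 5_000;

function isStale(lastSyncAt: string, nowMs: number = Date.now()): boolean {
  return nowMs - Date.parse(lastSyncAt) > STALE_THRESHOLD_MS;
}

// Returns the badge text, or null when the data is fresh enough to skip it.
function syncBadge(lastSyncAt: string, nowMs: number = Date.now()): string | null {
  if (!isStale(lastSyncAt, nowMs)) return null;
  // Render the sync time as HH:MM:SS (UTC).
  return `Data as of ${new Date(lastSyncAt).toISOString().slice(11, 19)}`;
}
```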

Aggregation Operations

| Operation | aggregate value | SQL equivalent |
| --- | --- | --- |
| Count records | count | COUNT(*) |
| Sum a numeric field | sum | SUM(field) |
| Average a numeric field | avg | AVG(field) |
| Minimum value | min | MIN(field) |
| Maximum value | max | MAX(field) |

SDK

Count

import { kernel } from '@platform/sdk-core';

const result = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
});

// result.data: [{ count: 48200 }]
// result.meta.lastSyncAt: '2026-04-15T10:29:55Z'

Sum with filter

const revenue = await kernel.data().analytics({
  model: 'deals',
  aggregate: 'sum',
  field: 'amount',
  filters: { status: 'eq.won' },
});
// { data: [{ sum: 1250000 }], meta: { lastSyncAt: '...' } }

Group by field

const byStatus = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
  groupBy: 'status',
});
// {
//   data: [
//     { status: 'active', count: 38000 },
//     { status: 'inactive', count: 10200 }
//   ],
//   meta: { lastSyncAt: '...' }
// }

Time-series (group by date truncation)

const dailySignups = await kernel.data().analytics({
  model: 'contacts',
  aggregate: 'count',
  groupBy: 'created_at',
  groupByInterval: 'day',
  filters: {
    created_at: 'gte.2026-01-01T00:00:00Z',
  },
  order: 'created_at.asc',
});
// {
//   data: [
//     { created_at: '2026-01-01', count: 143 },
//     { created_at: '2026-01-02', count: 221 },
//     ...
//   ]
// }
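Days with zero signups produce no row in the grouped result, which leaves gaps in a chart. A small client-side sketch that pads missing days with count: 0 — fillDailyGaps is a hypothetical helper, not part of the SDK:

```typescript
interface DailyCount {
  created_at: string; // 'YYYY-MM-DD' bucket, as returned by groupByInterval: 'day'
  count: number;
}

// Fill every missing day between the first and last bucket with count: 0.
function fillDailyGaps(rows: DailyCount[]): DailyCount[] {
  if (rows.length === 0) return [];
  const byDay = new Map(rows.map((r) => [r.created_at, r.count]));
  const out: DailyCount[] = [];
  const start = new Date(`${rows[0].created_at}T00:00:00Z`);
  const end = new Date(`${rows[rows.length - 1].created_at}T00:00:00Z`);
  for (let d = start; d <= end; d = new Date(d.getTime() + 86_400_000)) {
    const day = d.toISOString().slice(0, 10);
    out.push({ created_at: day, count: byDay.get(day) ?? 0 });
  }
  return out;
}
```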

Multi-metric aggregation

const summary = await kernel.data().analytics({
  model: 'deals',
  metrics: [
    { aggregate: 'count' },
    { aggregate: 'sum', field: 'amount' },
    { aggregate: 'avg', field: 'amount' },
  ],
  filters: { stage: 'eq.closed_won' },
});
// {
//   data: [{ count: 412, sum: 8340000, avg: 20242 }],
//   meta: { lastSyncAt: '...' }
// }

REST Endpoint

POST https://api.septemcore.com/v1/data/crm/analytics
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "model": "contacts",
  "aggregate": "count",
  "groupBy": "status",
  "filters": {
    "deleted_at": "is.null"
  }
}

Response 200 OK:

{
  "data": [
    { "status": "active", "count": 38000 },
    { "status": "inactive", "count": 10200 }
  ],
  "meta": {
    "lastSyncAt": "2026-04-15T10:29:55Z",
    "rowsScanned": 48200,
    "durationMs": 17
  }
}
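The same call can be issued from TypeScript with fetch. A sketch, assuming an access token is already in hand — buildAnalyticsRequest and runAnalyticsQuery are illustrative wrappers, not SDK functions:

```typescript
// Build the HTTP request for the analytics endpoint (separated out so the
// request shape can be inspected or tested without a network call).
function buildAnalyticsRequest(accessToken: string, body: Record<string, unknown>) {
  return {
    url: 'https://api.septemcore.com/v1/data/crm/analytics',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

async function runAnalyticsQuery(accessToken: string, body: Record<string, unknown>) {
  const { url, init } = buildAnalyticsRequest(accessToken, body);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`analytics query failed: HTTP ${res.status}`);
  return res.json();
}
```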

Tenant Isolation (Dual-Layer)

ClickHouse has no native row-level security (RLS) system comparable to PostgreSQL's. The Data Layer therefore enforces tenant isolation through two independent mechanisms:

| Layer | Mechanism | When it fires |
| --- | --- | --- |
| Application | Go Data Layer appends WHERE tenant_id = $tenantId to every query | Always — tenant_id from JWT, never from request params |
| Database | ClickHouse CREATE ROW POLICY tenant_filter USING tenant_id = {tenantId} | DB-level fence — fires even if Go code has a bug |
| Audit | All ClickHouse queries logged with tenantId + traceId | Forensic analysis, SOC2 compliance |

Row policies are created automatically when a new tenant is provisioned.


ReplacingMergeTree and FINAL

Analytics tables use the ReplacingMergeTree engine. CDC delivers at-least-once, so duplicate rows can accumulate between background merges. The Data Layer always appends the FINAL modifier to every SELECT, forcing ClickHouse to deduplicate on read and return only the latest version of each row:

-- Generated internally — modules never write this directly
SELECT status, count(*) AS count
FROM crm_contacts FINAL
WHERE tenant_id = '01j9p3kz5f00000000000000'
  AND deleted_at IS NULL
GROUP BY status

FINAL guarantees read-time correctness. The extra merge work is acceptable for analytics workloads, where the cost is paid once per query rather than per row.


Required Permissions

| Action | Permission |
| --- | --- |
| Any analytics query | {module}.{model}.read |

Analytics endpoints follow the same permission model as CRUD reads. No separate analytics permission is needed.


Error Reference

| Scenario | HTTP | type URI suffix |
| --- | --- | --- |
| Unknown model | 404 | not-found |
| Unknown aggregate | 400 | validation-error |
| Unknown field in groupBy | 400 | validation-error (code: UNKNOWN_FIELD) |
| Insufficient permission | 403 | forbidden |
| ClickHouse query timeout (> 30 s) | 504 | gateway-timeout |
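Since the table identifies errors by the suffix of their type URI, a client can branch on that suffix. A sketch, assuming the error body carries a type field in the RFC 7807 problem-details shape — classifyAnalyticsError is a hypothetical helper:

```typescript
// Error kinds taken from the table above, plus a fallback.
type AnalyticsErrorKind =
  | 'not-found'
  | 'validation-error'
  | 'forbidden'
  | 'gateway-timeout'
  | 'unknown';

const KNOWN_SUFFIXES: AnalyticsErrorKind[] = [
  'not-found',
  'validation-error',
  'forbidden',
  'gateway-timeout',
];

// Map a problem `type` URI (assumed RFC 7807 shape) to one of the
// documented error kinds via its last path segment.
function classifyAnalyticsError(typeUri: string): AnalyticsErrorKind {
  const suffix = typeUri.split('/').pop() ?? '';
  return (KNOWN_SUFFIXES as string[]).includes(suffix)
    ? (suffix as AnalyticsErrorKind)
    : 'unknown';
}
```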