SeptemCore

Audit Log — Overview

Audit Log Service overview: immutable, append-only records stored in ClickHouse (ReplacingMergeTree); a dual-write guarantee with Kafka as the primary path and a PostgreSQL WAL table as the fallback; SOX and PCI-DSS compliance; 7 internal packages; 7 REST endpoints; retention of 90 days hot (ClickHouse) and up to 7 years cold (S3 Glacier).

The Audit Log Service is the immutable compliance record for every action on the platform. No module writes to the audit log directly — all services call kernel.audit(). The service guarantees that every audit record is persisted, even when Kafka or ClickHouse is temporarily unavailable.

Fundamental property: audit records are never deleted. The deleted_at soft-delete column exists only for the GDPR anonymization workflow. Physical deletion from ClickHouse is not supported.


Technical Stack

| Component | Technology | Role |
|---|---|---|
| Go runtime | ClickHouse/clickhouse-go | ClickHouse write and query client |
| Primary store | ClickHouse (ReplacingMergeTree) | Append-only, high-throughput writes; range scans on the sparse primary index |
| Message queue | Kafka (segmentio/kafka-go), topic platform.audit.events | Decouples producers from ClickHouse; buffers events during outages |
| Fallback store | PostgreSQL, table audit_wal | WAL fallback when Kafka is unavailable |
| SDK package | @platform/sdk-audit | TypeScript client for all modules |

Internal Packages

| Package | Responsibility |
|---|---|
| internal/config | Environment configuration |
| internal/handler | HTTP request handlers (REST API) |
| internal/kafka | Kafka consumer: reads platform.audit.events, writes to ClickHouse |
| internal/model | Audit record domain model |
| internal/repository | ClickHouse repository (read + write) |
| internal/service | Business logic: routing, validation, anonymization |
| internal/wal | PostgreSQL WAL fallback: write and background replay |

Dual-Write Guarantee

The Audit Service guarantees that no audit record is ever lost, regardless of which component fails.

Normal path:
  Service → Kafka (platform.audit.events) → ClickHouse consumer → ClickHouse

ClickHouse down:
  Kafka stores events for up to 7 days.
  Consumer waits. When ClickHouse recovers → consumer catches up automatically.

Kafka down (fallback path):
  Service → PostgreSQL WAL table (audit_wal)
  Background goroutine (ticker: 30s) → replay audit_wal → Kafka (on recovery)
  WAL retention: 7 days
  Replayed records: deleted from audit_wal after successful Kafka publish
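The fallback path above can be sketched in Go as follows. This is an illustration under stated assumptions, not the internal/wal package. Publisher, PublisherFunc, MemWAL, Write and RunReplay are hypothetical names, and an in-memory slice stands in for the audit_wal table. Replay stops at the first failed publish so that pending records keep their order. It removes only the records Kafka accepted, which matches the "deleted after successful Kafka publish" rule.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Entry is a trimmed-down audit event (the real model has more columns).
type Entry struct {
	ID        string
	TenantID  string
	Action    string
	Timestamp time.Time
}

// ErrKafkaDown is a hypothetical error a publisher returns when the broker is unreachable.
var ErrKafkaDown = errors.New("kafka unavailable")

// Publisher abstracts the producer for topic platform.audit.events (hypothetical interface).
type Publisher interface {
	Publish(e Entry) error
}

// PublisherFunc adapts a plain function to the Publisher interface.
type PublisherFunc func(Entry) error

func (f PublisherFunc) Publish(e Entry) error { return f(e) }

// MemWAL is an in-memory stand-in for the PostgreSQL audit_wal table.
type MemWAL struct {
	mu      sync.Mutex
	pending []Entry
}

// Append stores an entry that could not be published.
func (w *MemWAL) Append(e Entry) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.pending = append(w.pending, e)
}

// Len reports how many entries are waiting for replay.
func (w *MemWAL) Len() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	return len(w.pending)
}

// Write is the producer path: Kafka first, the WAL only if the publish fails.
// It reports whether the entry went to the WAL.
func Write(p Publisher, w *MemWAL, e Entry) (toWAL bool) {
	if err := p.Publish(e); err != nil {
		w.Append(e)
		return true
	}
	return false
}

// Replay publishes pending entries in order and removes only those Kafka accepted.
func (w *MemWAL) Replay(p Publisher) (replayed int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for i, e := range w.pending {
		if err := p.Publish(e); err != nil {
			// Kafka is still unreachable: keep this entry and everything after it.
			w.pending = w.pending[i:]
			return replayed
		}
		replayed++
	}
	w.pending = w.pending[:0]
	return replayed
}

// RunReplay calls Replay on every tick (30s in the documented design) until stop is closed.
func (w *MemWAL) RunReplay(p Publisher, every time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			w.Replay(p)
		case <-stop:
			return
		}
	}
}
```

In a service, the loop would start as `go wal.RunReplay(publisher, 30*time.Second, stop)`. Pruning entries past the 7-day WAL retention is left out of the sketch.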

Why Non-Blocking

kernel.audit().record() is asynchronous and non-blocking. The calling service does not wait for ClickHouse confirmation. If the audit call itself fails (network error before reaching Kafka), the WAL fallback picks it up. A business operation (Money debit, IAM login) is never blocked by an audit failure.

| Step | Blocking? | Fallback |
|---|---|---|
| Service → Kafka publish | No | WAL fallback if Kafka is unreachable |
| Kafka → ClickHouse consumer | No | Kafka 7-day buffer |
| WAL → Kafka replay | Background (30s ticker) | WAL 7-day retention |

SOX / PCI-DSS compliance requires a complete (100%) audit trail. The dual-write architecture ensures that no record is lost as long as either Kafka or the PostgreSQL WAL accepts the write, even if ClickHouse is down at the same time.


ClickHouse Schema Design

Table Engine

```sql
CREATE TABLE audit_logs (
  id          UUID,
  tenant_id   String,
  action      String,
  entity_type String,
  entity_id   String,
  user_id     String,
  ip          String,
  user_agent  String,
  before      String,   -- JSON snapshot
  after       String,   -- JSON snapshot
  timestamp   DateTime64(3, 'UTC'),
  deleted_at  Nullable(DateTime64(3, 'UTC'))
) ENGINE = ReplacingMergeTree(timestamp)
PARTITION BY (toYYYYMM(timestamp), tenant_id)
ORDER BY (tenant_id, action, timestamp);
```

Why ReplacingMergeTree

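ReplacingMergeTree lets the GDPR anonymization workflow "replace" a record by inserting a new version with the PII set to [REDACTED]. Rows are deduplicated on the full sorting key (tenant_id, action, timestamp), so the new version must carry the same three values as the original. Because the version column is timestamp, which is itself part of that key, versions always tie and the most recently inserted row is kept. The same rule also collapses two distinct events that share a tenant, an action and a millisecond timestamp. Merges run in the background, so all queries must use the FINAL modifier to guarantee that only the latest version is returned: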

```sql
SELECT * FROM audit_logs FINAL
WHERE tenant_id = 'tenant-01j...'
  AND action = 'user.login'
ORDER BY timestamp DESC
LIMIT 20;
```

Without FINAL, ClickHouse may return both the original and the anonymized version simultaneously (merge is a background process). The Audit Service query layer adds FINAL automatically to every SELECT.

Partition and Index

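| Design decision | Value | Rationale |
|---|---|---|
| Partition key | (toYYYYMM(timestamp), tenant_id) | ClickHouse can prune partitions by month and by tenant |
| Sort / index key | (tenant_id, action, timestamp) | Filtering by tenant + action reads only the matching granules of the sparse primary index |
| Schema migrations | ADD COLUMN only | Lightweight metadata change, non-blocking |
| Type changes | New table + async backfill | Avoids ALTER TABLE ... MODIFY COLUMN, which rewrites the column in every data part |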

REST API

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/audit | Record a single audit entry |
| POST | /api/v1/audit/batch | Record multiple entries in one request |
| GET | /api/v1/audit/:id | Retrieve a specific record by ID |
| GET | /api/v1/audit | Search records (filters + pagination) |
| GET | /api/v1/audit/entity/:type/:id | Entity history (all actions on one entity) |
| GET | /api/v1/audit/export | Export records to JSON or CSV |
| POST | /api/v1/audit/anonymize | GDPR anonymization for a userId |
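Two of these routes overlap: GET /api/v1/audit/export also matches the GET /api/v1/audit/:id pattern. The hand-rolled Go sketch below resolves the literal segment before the wildcard. Route and the returned handler names are hypothetical, and internal/handler may well use a router library instead. The sketch also has no DELETE route, consistent with audit records never being deleted.

```go
package main

import "strings"

// Route resolves a method and path to one of the seven documented endpoints.
// Literal segments are matched before the :id wildcard.
func Route(method, path string) (name string, params map[string]string) {
	const base = "/api/v1/audit"
	if path != base && !strings.HasPrefix(path, base+"/") {
		return "", nil
	}
	rest := strings.Trim(strings.TrimPrefix(path, base), "/")
	var seg []string
	if rest != "" {
		seg = strings.Split(rest, "/")
	}
	switch {
	case method == "POST" && len(seg) == 0:
		return "record", nil
	case method == "GET" && len(seg) == 0:
		return "search", nil
	case method == "POST" && len(seg) == 1 && seg[0] == "batch":
		return "recordBatch", nil
	case method == "POST" && len(seg) == 1 && seg[0] == "anonymize":
		return "anonymize", nil
	case method == "GET" && len(seg) == 1 && seg[0] == "export":
		return "export", nil // must precede the :id case
	case method == "GET" && len(seg) == 1:
		return "getByID", map[string]string{"id": seg[0]}
	case method == "GET" && len(seg) == 3 && seg[0] == "entity":
		return "entityHistory", map[string]string{"type": seg[1], "id": seg[2]}
	}
	return "", nil // no match, including any DELETE request
}
```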

What Must Be Audited

Every module must audit the following categories. This is a platform requirement, not a recommendation:

| Category | Examples |
|---|---|
| Financial transactions | Credit, debit, hold, confirm, cancel, reversal |
| Admin actions | Commission changes, account suspension, plan changes |
| API integrations | Postbacks, webhooks, external API calls |
| RBAC changes | Role assignment, permission grant/revoke |
| Tracking events | Registration, first deposit (FTD), login |
| Configuration changes | Feature flags, module registry, module enable/disable |

  • Recording: record(), recordBatch(), async non-blocking, mandatory audit categories
  • Querying: search filters, entity history, export, ReplacingMergeTree FINAL
  • Retention: hot 90-day ClickHouse, cold 7-year S3 Glacier, LZ4 compression
  • GDPR Anonymization: POST /audit/anonymize, append ANONYMIZE record, SELECT FINAL, cold rebuild
