
Audit Log — Overview

The Audit Log Service is the immutable compliance record for every action on the platform. No module writes to the audit log directly — all services call kernel.audit(). The service guarantees that every audit record is persisted, even when Kafka or ClickHouse is temporarily unavailable.

Fundamental property: Audit records are never deleted. The deleted_at soft-delete column exists only for the GDPR anonymization workflow. Physical deletion from ClickHouse is not supported.


Technical Stack

| Component | Technology | Role |
|---|---|---|
| Go runtime | ClickHouse/clickhouse-go | ClickHouse write and query client |
| Primary store | ClickHouse (ReplacingMergeTree) | Append-only, high-throughput writes, index-only scans |
| Message queue | Kafka (segmentio/kafka-go) — topic platform.audit.events | Decouples producers from ClickHouse, buffers during outages |
| Fallback store | PostgreSQL — table audit_wal | WAL fallback when Kafka is unavailable |
| SDK package | @platform/sdk-audit | TypeScript client for all modules |

Internal Packages

| Package | Responsibility |
|---|---|
| internal/config | Environment configuration |
| internal/handler | HTTP request handlers (REST API) |
| internal/kafka | Kafka consumer — reads platform.audit.events, writes to ClickHouse |
| internal/model | Audit record domain model |
| internal/repository | ClickHouse repository (read + write) |
| internal/service | Business logic — routing, validation, anonymization |
| internal/wal | PostgreSQL WAL fallback — write and background replay |

Dual-Write Guarantee

The Audit Service guarantees that no audit record is ever lost, regardless of which component fails.

Normal path:
Service → Kafka (platform.audit.events) → ClickHouse consumer → ClickHouse

ClickHouse down:
Kafka retains events for up to 7 days.
The consumer waits; when ClickHouse recovers, it catches up automatically.

Kafka down (fallback path):
Service → PostgreSQL WAL table (audit_wal)
Background goroutine (ticker: 30s) → replay audit_wal → Kafka (on recovery)
WAL retention: 7 days
Replayed records: deleted from audit_wal after successful Kafka publish
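The replay loop described above can be sketched as a background goroutine driven by a ticker. The `Publisher` interface and `WAL` type here are illustrative stand-ins for the real kafka-go producer and the audit_wal table, not the service's actual types:

```go
package main

import (
	"time"
)

// Publisher is a stand-in for the Kafka producer
// (segmentio/kafka-go in the real service).
type Publisher interface {
	Publish(record string) error
}

// WAL is a stand-in for the PostgreSQL audit_wal table.
type WAL struct {
	pending []string
}

// replayOnce publishes every pending record to Kafka and deletes the
// rows that were acknowledged; failed records stay for the next tick.
func (w *WAL) replayOnce(pub Publisher) {
	var remaining []string
	for _, rec := range w.pending {
		if err := pub.Publish(rec); err != nil {
			remaining = append(remaining, rec) // keep for retry on the next tick
		}
	}
	w.pending = remaining
}

// runReplay is the background goroutine: a ticker (30s in production)
// drives replayOnce until stop is closed.
func (w *WAL) runReplay(pub Publisher, interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			w.replayOnce(pub)
		case <-stop:
			return
		}
	}
}
```

Because only acknowledged records are deleted, a Kafka error mid-replay leaves the remaining rows in place for the next tick, and the 7-day retention bounds how long they can accumulate.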

Why Non-Blocking

kernel.audit().record() is asynchronous and non-blocking. The calling service does not wait for ClickHouse confirmation. If the audit call itself fails (network error before reaching Kafka), the WAL fallback picks it up. A business operation (Money debit, IAM login) is never blocked by an audit failure.

| Step | Blocking? | Fallback |
|---|---|---|
| Service → Kafka publish | No | WAL fallback if Kafka unreachable |
| Kafka → ClickHouse consumer | No | Kafka 7-day buffer |
| WAL → Kafka replay | Background (30s ticker) | WAL 7-day retention |
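The fire-and-forget write path can be sketched as below. `Auditor`, `Sink`, and `publishOnce` are illustrative names, not the real kernel.audit() API; in the real service the two Sink implementations would be the kafka-go producer and the audit_wal table:

```go
package main

// Sink abstracts a durable destination for audit records.
type Sink interface {
	Publish(record string) error
}

// Auditor sketches the dual-write decision: Kafka first, WAL on failure.
type Auditor struct {
	kafka Sink
	wal   Sink
}

// publishOnce tries Kafka; on any error the record goes to the WAL,
// so it is always persisted in at least one durable store.
func (a *Auditor) publishOnce(rec string) {
	if err := a.kafka.Publish(rec); err != nil {
		a.wal.Publish(rec)
	}
}

// Record is fire-and-forget: the caller never waits for Kafka or
// ClickHouse, so a business operation cannot block on auditing.
func (a *Auditor) Record(rec string) {
	go a.publishOnce(rec)
}
```

The caller returns as soon as the goroutine is spawned; the Kafka-vs-WAL decision happens entirely off the request path.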

SOX / PCI-DSS compliance: 100% audit trail. The dual-write architecture ensures no record is lost even in a multi-component failure scenario.


ClickHouse Schema Design

Table Engine

```sql
CREATE TABLE audit_logs (
    id          UUID,
    tenant_id   String,
    action      String,
    entity_type String,
    entity_id   String,
    user_id     String,
    ip          String,
    user_agent  String,
    before      String,  -- JSON snapshot
    after       String,  -- JSON snapshot
    timestamp   DateTime64(3, 'UTC'),
    deleted_at  Nullable(DateTime64(3, 'UTC'))
) ENGINE = ReplacingMergeTree(timestamp)
PARTITION BY (toYYYYMM(timestamp), tenant_id)
ORDER BY (tenant_id, action, timestamp);
```

Why ReplacingMergeTree

ReplacingMergeTree allows the GDPR anonymization workflow to "replace" a record by inserting a new version with [REDACTED] PII. The engine merges duplicates in the background, keeping the latest version. All queries must use the FINAL modifier to guarantee only the latest version is returned:

```sql
SELECT * FROM audit_logs FINAL
WHERE tenant_id = 'tenant-01j...'
  AND action = 'user.login'
ORDER BY timestamp DESC
LIMIT 20;
```

Without FINAL, ClickHouse may return both the original and the anonymized version simultaneously (merge is a background process). The Audit Service query layer adds FINAL automatically to every SELECT.
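The automatic injection can be illustrated with a trivial rewrite step. `withFinal` is a hypothetical helper, not the service's actual query layer, which would more plausibly operate on a parsed query than on a raw string:

```go
package main

import "strings"

// withFinal adds the FINAL modifier to reads of audit_logs so that
// only the latest ReplacingMergeTree version of each row is returned.
// Illustrative only: a real implementation would rewrite a parsed
// query rather than do string surgery.
func withFinal(query string) string {
	if strings.Contains(query, "FROM audit_logs FINAL") {
		return query // already qualified
	}
	return strings.Replace(query, "FROM audit_logs", "FROM audit_logs FINAL", 1)
}
```

Centralizing the rewrite in one place means no caller can accidentally issue a SELECT that observes a pre-anonymization version.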

Partition and Index

| Design decision | Value | Rationale |
|---|---|---|
| Partition key | (toYYYYMM(timestamp), tenant_id) | ClickHouse prunes by both fields |
| Sort / index key | (tenant_id, action, timestamp) | Queries by tenant + action are index-only scans |
| Schema migrations | ADD COLUMN only | Lightweight metadata change, non-blocking |
| Type changes | New table + async backfill | Avoids a blocking column-type ALTER TABLE |

REST API

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/audit | Record a single audit entry |
| POST | /api/v1/audit/batch | Record multiple entries in one request |
| GET | /api/v1/audit/:id | Retrieve a specific record by ID |
| GET | /api/v1/audit | Search records (filters + pagination) |
| GET | /api/v1/audit/entity/:type/:id | Entity history (all actions on one entity) |
| GET | /api/v1/audit/export | Export records to JSON or CSV |
| POST | /api/v1/audit/anonymize | GDPR anonymization for a userId |

What Must Be Audited

Every module must audit the following categories. This is a platform requirement, not a recommendation:

| Category | Examples |
|---|---|
| Financial transactions | Credit, debit, hold, confirm, cancel, reversal |
| Admin actions | Commission changes, account suspension, plan changes |
| API integrations | Postbacks, webhooks, external API calls |
| RBAC changes | Role assignment, permission grant/revoke |
| Tracking events | Registration, first deposit (FTD), login |
| Configuration changes | Feature flags, module registry, module enable/disable |

  • Recording — record(), recordBatch(), async non-blocking, mandatory audit categories
  • Querying — search filters, entity history, export, ReplacingMergeTree FINAL
  • Retention — hot 90-day ClickHouse, cold 7-year S3 Glacier, LZ4 compression
  • GDPR Anonymization — POST /audit/anonymize, append ANONYMIZE record, SELECT FINAL, cold rebuild