Audit Log — Overview
The Audit Log Service is the immutable compliance record for every
action on the platform. No module writes to the audit log directly —
all services call kernel.audit(). The service guarantees that every
audit record is persisted, even when Kafka or ClickHouse is temporarily
unavailable.
Fundamental property: Audit records are never deleted. The deleted_at soft-delete column exists only for the GDPR anonymization workflow. Physical deletion from ClickHouse is not supported.
Technical Stack
| Component | Technology | Role |
|---|---|---|
| ClickHouse client | ClickHouse/clickhouse-go | ClickHouse write and query client |
| Primary store | ClickHouse (ReplacingMergeTree) | Append-only, high-throughput write, index-only scans |
| Message queue | Kafka (segmentio/kafka-go) — topic platform.audit.events | Decouples producers from ClickHouse, buffers during outages |
| Fallback store | PostgreSQL — table audit_wal | WAL fallback when Kafka is unavailable |
| SDK package | @platform/sdk-audit | TypeScript client for all modules |
Internal Packages
| Package | Responsibility |
|---|---|
| internal/config | Environment configuration |
| internal/handler | HTTP request handlers (REST API) |
| internal/kafka | Kafka consumer — reads platform.audit.events, writes to ClickHouse |
| internal/model | Audit record domain model |
| internal/repository | ClickHouse repository (read + write) |
| internal/service | Business logic — routing, validation, anonymization |
| internal/wal | PostgreSQL WAL fallback — write and background replay |
Dual-Write Guarantee
The Audit Service guarantees that no audit record is ever lost, regardless of which component fails.
Normal path:
Service → Kafka (platform.audit.events) → ClickHouse consumer → ClickHouse
ClickHouse down:
Kafka stores events for up to 7 days.
Consumer waits. When ClickHouse recovers → consumer catches up automatically.
Kafka down (fallback path):
Service → PostgreSQL WAL table (audit_wal)
Background goroutine (ticker: 30s) → replay audit_wal → Kafka (on recovery)
WAL retention: 7 days
Replayed records: deleted from audit_wal after successful Kafka publish
Why Non-Blocking
kernel.audit().record() is asynchronous and non-blocking. The
calling service does not wait for ClickHouse confirmation. If the
audit call itself fails (network error before reaching Kafka), the
WAL fallback picks it up. A business operation (Money debit, IAM
login) is never blocked by an audit failure.
| Step | Blocking? | Fallback |
|---|---|---|
| Service → Kafka publish | No | WAL fallback if Kafka unreachable |
| Kafka → ClickHouse consumer | No | Kafka 7-day buffer |
| WAL → Kafka replay | Background (30s ticker) | WAL 7-day retention |
SOX / PCI-DSS compliance: 100% audit trail. The dual-write architecture ensures no record is lost even in a multi-component failure scenario.
ClickHouse Schema Design
Table Engine
```sql
CREATE TABLE audit_logs (
    id          UUID,
    tenant_id   String,
    action      String,
    entity_type String,
    entity_id   String,
    user_id     String,
    ip          String,
    user_agent  String,
    before      String,  -- JSON snapshot
    after       String,  -- JSON snapshot
    timestamp   DateTime64(3, 'UTC'),
    deleted_at  Nullable(DateTime64(3, 'UTC'))
) ENGINE = ReplacingMergeTree(timestamp)
PARTITION BY (toYYYYMM(timestamp), tenant_id)
ORDER BY (tenant_id, action, timestamp);
```
Why ReplacingMergeTree
ReplacingMergeTree allows the GDPR anonymization workflow to
"replace" a record by inserting a new version with [REDACTED]
PII. The engine merges duplicates in the background, keeping the
latest version. All queries must use the FINAL modifier to
guarantee only the latest version is returned:
```sql
SELECT * FROM audit_logs FINAL
WHERE tenant_id = 'tenant-01j...'
  AND action = 'user.login'
ORDER BY timestamp DESC
LIMIT 20;
```
Without FINAL, ClickHouse may return both the original and the
anonymized version simultaneously (merge is a background process).
The Audit Service query layer adds FINAL automatically to every
SELECT.
Partition and Index
| Design decision | Value | Rationale |
|---|---|---|
| Partition key | (toYYYYMM(timestamp), tenant_id) | ClickHouse prunes by both fields |
| Sort / index key | (tenant_id, action, timestamp) | Query by tenant + action = index-only scan |
| Schema migrations | ADD COLUMN only | Lightweight metadata change, non-blocking |
| Type changes | New table + async backfill | No blocking ALTER TABLE TYPE |
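Under the ADD COLUMN-only policy, a schema migration stays a metadata change. A minimal sketch (request_id and its default are hypothetical, not an existing column):

```sql
-- Additive migration: metadata-only in ClickHouse, non-blocking.
-- Existing rows read back the declared default.
ALTER TABLE audit_logs
    ADD COLUMN IF NOT EXISTS request_id String DEFAULT '';
```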
REST API
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/audit | Record a single audit entry |
| POST | /api/v1/audit/batch | Record multiple entries in one request |
| GET | /api/v1/audit/:id | Retrieve a specific record by ID |
| GET | /api/v1/audit | Search records (filters + pagination) |
| GET | /api/v1/audit/entity/:type/:id | Entity history (all actions on one entity) |
| GET | /api/v1/audit/export | Export records to JSON or CSV |
| POST | /api/v1/audit/anonymize | GDPR anonymization for a userId |
What Must Be Audited
Every module must audit the following categories. This is a platform requirement, not a recommendation:
| Category | Examples |
|---|---|
| Financial transactions | Credit, debit, hold, confirm, cancel, reversal |
| Admin actions | Commission changes, account suspension, plan changes |
| API integrations | Postbacks, webhooks, external API calls |
| RBAC changes | Role assignment, permission grant/revoke |
| Tracking events | Registration, first deposit (FTD), login |
| Configuration changes | Feature flags, module registry, module enable/disable |
Related Pages
- Recording — record(), recordBatch(), async non-blocking, mandatory audit categories
- Querying — search filters, entity history, export, ReplacingMergeTree FINAL
- Retention — hot 90-day ClickHouse, cold 7-year S3 Glacier, LZ4 compression
- GDPR Anonymization — POST /audit/anonymize, append ANONYMIZE record, SELECT FINAL, cold rebuild