GDPR Anonymization
GDPR anonymization in the Audit Log Service satisfies the Right to Be
Forgotten while preserving financial audit records required by AML/KYC
law. The implementation is append-only: original records are never
modified or deleted. Anonymization produces a new record that
supersedes the original via ClickHouse's ReplacingMergeTree engine.
Design Principles
| Principle | Implementation |
|---|---|
| Immutability preserved | Original records are never modified. Anonymization appends a new record. |
| GDPR satisfied | PII is inaccessible via the API once anonymization completes and is removed from cold storage by the monthly rebuild. |
| AML/KYC satisfied | Financial records (money.*) are never anonymized. |
| Deletion forbidden | Physical deletion from Glacier is not permitted. Anonymization only. |
| Export guarantee | GET /audit/export waits for pending anonymizations before streaming. |
How It Works
Step 1: POST /audit/anonymize { userId }
→ Synchronous call (waits for ClickHouse INSERT confirmation)
→ Valkey lock acquired: SET anonymizing:{userId} 1 EX 60 NX
Step 2: For each audit_logs record where userId matches (money.* and billing.* records are skipped):
→ Insert new ANONYMIZE record:
email → [REDACTED]
ip → 0.0.0.0
name → [REDACTED]
user_agent → [REDACTED]
Step 3: ClickHouse ReplacingMergeTree merges in background.
SELECT FINAL → returns only the latest (anonymized) version.
Step 4: Valkey lock released (after ClickHouse INSERT confirmed).
Any concurrent SELECT that was blocked → now sees ANONYMIZE record.
Step 5: userId written to PostgreSQL anonymization_log table.
Used for cold storage (Glacier) rebuild.
API
POST https://api.septemcore.com/v1/audit/anonymize
Authorization: Bearer <access_token>
Content-Type: application/json
{
"userId": "01j9pa5mz700000000000000"
}
This is a synchronous call — the response is returned only after
the ANONYMIZE records have been confirmed as inserted in ClickHouse.
Do not treat this as fire-and-forget.
Response 200 OK:
{
"userId": "01j9pa5mz700000000000000",
"recordsAnonymized": 1842,
"status": "completed",
"completedAt": "2026-04-15T10:30:05.123Z"
}
SDK:
const result = await kernel.audit().anonymize({
userId: '01j9pa5mz700000000000000',
});
// result.recordsAnonymized: 1842
// result.status: 'completed'
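For callers that do not use the SDK, the same request can be assembled by hand. The sketch below only builds the request from the URL, headers, and body documented above; `buildAnonymizeRequest` and the `AnonymizeRequest` shape are illustrative names, not part of the SDK.

```typescript
// Shape of a request that can be handed to fetch(); the names are illustrative.
interface AnonymizeRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the POST /v1/audit/anonymize request described in this section.
function buildAnonymizeRequest(userId: string, accessToken: string): AnonymizeRequest {
  return {
    url: "https://api.septemcore.com/v1/audit/anonymize",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ userId }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildAnonymizeRequest(userId, token);
//   const response = await fetch(url, init);
```

Because the endpoint responds only after the ClickHouse insert is confirmed, a client-side timeout should probably be generous rather than tuned for a quick acknowledgement.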
PII Replacement Map
| Field | Original value | After anonymization |
|---|---|---|
| email | [email protected] | [REDACTED] |
| ip | 192.0.2.10 | 0.0.0.0 |
| name | Alice Wonderland | [REDACTED] |
| userAgent | Mozilla/5.0 (Macintosh...) | [REDACTED] |
| userId | 01j9pa5mz700000000000000 | unchanged (needed for record correlation) |
| entityId | value unchanged | unchanged (audit trail integrity) |
| before / after | may contain PII | unchanged (the module is responsible for excluding PII) |
Module responsibility: The Audit Service does not scan before/after JSON for PII. If a module recorded PII in these fields, it is not automatically redacted by the anonymization workflow. Modules must avoid placing PII in before/after snapshots.
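The replacement map can be expressed as a small pure function. This is a minimal sketch, not the service's implementation: the `AuditRecord` shape and the `anonymizeRecord` name are assumptions made for illustration, and only the four fields listed in the table are rewritten.

```typescript
// Hypothetical record shape; field names follow the table in this section.
interface AuditRecord {
  userId: string;
  entityId: string;
  action: string;
  email: string;
  ip: string;
  name: string;
  userAgent: string;
  before?: unknown; // passed through untouched: not scanned for PII
  after?: unknown;  // passed through untouched: not scanned for PII
}

const REDACTED = "[REDACTED]";

// Returns a new record with PII fields replaced; the input is not mutated,
// mirroring the append-only design (the original row is never modified).
function anonymizeRecord(record: AuditRecord): AuditRecord {
  return {
    ...record,
    email: REDACTED,
    ip: "0.0.0.0",
    name: REDACTED,
    userAgent: REDACTED,
  };
}
```

Note that `userId`, `entityId`, `before`, and `after` are copied as-is, which is why PII placed in snapshots by a module would survive this step.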
Concurrency Control: Valkey Lock
The anonymization request sets a Valkey lock before inserting ANONYMIZE records. This prevents concurrent queries from reading stale (non-anonymized) data during the anonymization window:
Anonymization request arrives:
SET anonymizing:{userId} 1 EX 60 NX
→ acquired (no other anonymization in progress)
Concurrent GET /audit?userId={userId} arrives:
→ Query layer checks: GET anonymizing:{userId}
→ Lock present → wait for release (poll 100ms, max 5s)
→ Lock released → execute SELECT FINAL → returns anonymized records
Anonymization completes:
ClickHouse INSERT confirmed → Valkey lock deleted
→ All waiting queries resume
| Parameter | Value |
|---|---|
| Lock key | anonymizing:{userId} |
| Lock TTL | 60 seconds (failsafe; the lock is released explicitly once the INSERT is confirmed) |
| Polling interval | 100 ms |
| Max wait | 5 seconds — then 503 ANONYMIZATION_IN_PROGRESS |
| Concurrent anonymization for same userId | Rejected (409 Conflict) |
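The lock-and-poll behaviour can be sketched with an in-memory stand-in for Valkey. Everything here is hypothetical scaffolding: `LockStore`, `InMemoryLockStore`, and `waitForAnonymization` are illustrative names, and a real implementation would issue `SET ... EX 60 NX`, `GET`, and `DEL` through a Valkey client instead.

```typescript
// Minimal lock-store interface (assumed); a real one would wrap a Valkey client.
interface LockStore {
  setNx(key: string, ttlSeconds: number): Promise<boolean>; // SET key 1 EX ttl NX
  exists(key: string): Promise<boolean>;                     // GET key
  del(key: string): Promise<void>;                           // DEL key
}

// In-memory stand-in that honours NX and TTL semantics, for illustration only.
class InMemoryLockStore implements LockStore {
  private readonly expiries = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async setNx(key: string, ttlSeconds: number): Promise<boolean> {
    if (await this.exists(key)) return false; // NX: refuse if already held
    this.expiries.set(key, this.now() + ttlSeconds * 1000);
    return true;
  }

  async exists(key: string): Promise<boolean> {
    const expiresAt = this.expiries.get(key);
    if (expiresAt === undefined) return false;
    if (expiresAt <= this.now()) {
      this.expiries.delete(key); // TTL failsafe: an expired lock counts as released
      return false;
    }
    return true;
  }

  async del(key: string): Promise<void> {
    this.expiries.delete(key);
  }
}

const lockKey = (userId: string): string => `anonymizing:${userId}`;

// Query-layer wait: poll every pollMs until the lock is gone or maxWaitMs has elapsed.
// "timeout" corresponds to the 503 ANONYMIZATION_IN_PROGRESS response.
async function waitForAnonymization(
  store: LockStore,
  userId: string,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
  pollMs: number = 100,
  maxWaitMs: number = 5000,
): Promise<"clear" | "timeout"> {
  let waited = 0;
  while (await store.exists(lockKey(userId))) {
    if (waited >= maxWaitMs) return "timeout";
    await sleep(pollMs);
    waited += pollMs;
  }
  return "clear";
}
```

On the anonymization side, a `false` result from `setNx(lockKey(userId), 60)` would map to the 409 Conflict row in the table above. The `sleep` parameter is injectable so that the wait can be exercised without real delays.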
ReplacingMergeTree and SELECT FINAL
The ANONYMIZE record has the same primary key as the original record
(same (tenant_id, action, timestamp) components) but a later
timestamp. ClickHouse ReplacingMergeTree(timestamp) keeps only
the latest version during background merges.
All queries in the Audit Service query layer use FINAL:
SELECT * FROM audit_logs FINAL
WHERE user_id = '01j9pa5mz700000000000000'
ORDER BY timestamp DESC;
With FINAL, ClickHouse deduplicates at query time, returning only
the ANONYMIZE record. Without FINAL, both the original and the
ANONYMIZE record may appear in the result set until the background
merge occurs.
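The effect of FINAL can be illustrated with a small simulation: rows that share a sorting key are treated as versions of one record, and only the row with the highest version survives. This is a toy model of the deduplication rule, not ClickHouse code; `VersionedRow`, `sortingKey`, `version`, and `selectFinal` are hypothetical names chosen for the sketch.

```typescript
// Toy row: sortingKey stands in for the table's sorting key,
// version for the column passed to ReplacingMergeTree.
interface VersionedRow {
  sortingKey: string;
  version: number;
  email: string;
}

// Keeps one row per sorting key: the one with the highest version.
// On equal versions the later row in the input wins.
function selectFinal(rows: VersionedRow[]): VersionedRow[] {
  const latest = new Map<string, VersionedRow>();
  for (const row of rows) {
    const current = latest.get(row.sortingKey);
    if (current === undefined || row.version >= current.version) {
      latest.set(row.sortingKey, row);
    }
  }
  return Array.from(latest.values());
}

// Without FINAL, a query may see both rows below until a background merge runs.
const unmerged: VersionedRow[] = [
  { sortingKey: "k1", version: 1, email: "original@example.com" },
  { sortingKey: "k1", version: 2, email: "[REDACTED]" },
];
const merged = selectFinal(unmerged);
```

Here `merged` holds a single row for `k1`, the redacted one, which is the behaviour the query layer relies on when it appends FINAL to every SELECT.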
Cold Storage: Monthly Rebuild
When POST /audit/anonymize is called, the Audit Service writes
the userId to the PostgreSQL anonymization_log table:
-- PostgreSQL
INSERT INTO anonymization_log (user_id, requested_at)
VALUES ('01j9pa5mz700000000000000', NOW());
A monthly background job rebuilds cold segments in S3 Glacier to ensure anonymization is also applied to records that were already tiered out of ClickHouse:
Monthly cold rebuild job:
1. Read all entries from anonymization_log (PostgreSQL)
2. For each userId with records in Glacier:
a. Restore Glacier segment to S3 Standard (~3–12 hours)
b. Apply anonymization_log: replace PII with [REDACTED]
c. Write clean version back to Glacier
d. Delete temporary S3 Standard copy
3. Log rebuild completion in audit_cold_rebuilds table
This guarantees that GDPR anonymization is eventually consistent in cold storage — not just in hot ClickHouse.
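Because a Glacier restore takes hours, a rebuild job would likely want to restore each affected segment once, however many users it covers. The sketch below shows one way to plan that. It assumes a per-segment index of user IDs exists, which this document does not describe; `ColdSegment` and `planColdRebuild` are hypothetical names.

```typescript
// Hypothetical index entry: which userIds have records in a given cold segment.
interface ColdSegment {
  segmentId: string;
  userIds: string[];
}

// Groups the work by segment: segmentId -> userIds whose PII must be redacted there.
// Segments with no affected users are left out and never restored.
function planColdRebuild(
  anonymizedUserIds: string[], // user_id values read from anonymization_log
  segments: ColdSegment[],
): Map<string, string[]> {
  const requested = new Set(anonymizedUserIds);
  const plan = new Map<string, string[]>();
  for (const segment of segments) {
    const affected = segment.userIds.filter((id) => requested.has(id));
    if (affected.length > 0) {
      plan.set(segment.segmentId, affected);
    }
  }
  return plan;
}
```

Each entry in the resulting plan corresponds to one pass through steps 2a to 2d above.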
AML / KYC Exception: Financial Records
Financial audit records are never anonymized, regardless of any GDPR request. This is a hard exception enforced at the service layer:
| Action prefix | Anonymizable? | Reason |
|---|---|---|
| money.* | ❌ No | AML/KYC compliance; must retain 7 years |
| billing.* | ❌ No | Financial compliance |
| user.* | ✅ Yes | Standard GDPR |
| role.* | ✅ Yes | Standard GDPR |
| files.* | ✅ Yes | Standard GDPR |
| notify.* | ✅ Yes | Standard GDPR |
When POST /audit/anonymize is called for a userId, the Audit
Service skips all records with action prefixes on the exclusion
list. These records remain in their original form forever.
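The exclusion check amounts to a prefix test on the action name. A minimal sketch, assuming the exclusion list contains exactly the two prefixes from the table; `EXEMPT_ACTION_PREFIXES` and `isAnonymizable` are illustrative names, and the action names in the comments are made up for the example.

```typescript
// Prefixes taken from the table above; the trailing dot prevents
// a different namespace such as "moneyish." from matching "money.".
const EXEMPT_ACTION_PREFIXES: readonly string[] = ["money.", "billing."];

// True when a record with this action may be anonymized.
function isAnonymizable(action: string): boolean {
  return !EXEMPT_ACTION_PREFIXES.some((prefix) => action.startsWith(prefix));
}

// Example (hypothetical action names):
//   isAnonymizable("user.login")     -> true
//   isAnonymizable("money.transfer") -> false
```

Applying this filter before any ANONYMIZE record is produced keeps the exemption in one place at the service layer.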
Export and Pending Anonymizations
GET /audit/export checks for pending anonymizations before
streaming the export. If any anonymization is in progress for the
queried tenant, the export waits:
GET /audit/export:
1. Scan Valkey for any anonymizing:{userId} keys for this tenant
┌─ None found → stream export immediately
└─ Found → wait for lock release (poll 100ms, max 5s)
→ stream export only after all locks released
This guarantees that an export initiated immediately after a GDPR request does not contain any non-anonymized PII.
Error Reference
| Scenario | HTTP | Code |
|---|---|---|
| Anonymization already in progress for this userId | 409 | ANONYMIZATION_IN_PROGRESS |
| Concurrent SELECT blocked by lock, max wait exceeded | 503 | ANONYMIZATION_IN_PROGRESS — retry in 10s |
| Export blocked by pending anonymization, max wait exceeded | 503 | ANONYMIZATION_PENDING — retry in 10s |
| userId not found in any audit record | 200 | recordsAnonymized: 0 (idempotent success) |
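On the client side, the table can be turned into a small decision function. This sketch assumes error responses carry the code in a JSON `code` field, which this document does not specify; `AuditOutcome`, `AuditResponseBody`, and `interpretAuditResponse` are hypothetical names.

```typescript
// What the caller should do next.
type AuditOutcome =
  | { kind: "ok"; recordsAnonymized: number }
  | { kind: "alreadyRunning" }          // 409: another anonymization holds the lock
  | { kind: "retry"; afterMs: number }  // 503: wait, then repeat the request
  | { kind: "unexpected"; status: number };

// Assumed body shape: `code` on errors, `recordsAnonymized` on success.
interface AuditResponseBody {
  code?: string;
  recordsAnonymized?: number;
}

const RETRY_AFTER_MS = 10000; // "retry in 10s" from the table above

function interpretAuditResponse(status: number, body: AuditResponseBody): AuditOutcome {
  if (status === 200) {
    // Includes the idempotent case where no records matched (recordsAnonymized: 0).
    return { kind: "ok", recordsAnonymized: body.recordsAnonymized ?? 0 };
  }
  if (status === 409 && body.code === "ANONYMIZATION_IN_PROGRESS") {
    return { kind: "alreadyRunning" };
  }
  if (
    status === 503 &&
    (body.code === "ANONYMIZATION_IN_PROGRESS" || body.code === "ANONYMIZATION_PENDING")
  ) {
    return { kind: "retry", afterMs: RETRY_AFTER_MS };
  }
  return { kind: "unexpected", status };
}
```

Treating a 200 with `recordsAnonymized: 0` as success keeps repeated GDPR requests for the same user safe to replay.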