GDPR Anonymization

GDPR anonymization in the Audit Log Service satisfies the Right to Be Forgotten while preserving financial audit records required by AML/KYC law. The implementation is append-only: original records are never modified or deleted. Anonymization produces a new record that supersedes the original via ClickHouse's ReplacingMergeTree engine.


Design Principles

| Principle | Implementation |
| --- | --- |
| Immutability preserved | Original records are never modified. Anonymization appends a new record. |
| GDPR satisfied | PII is inaccessible via the API after anonymization, and in cold storage after the next monthly rebuild. |
| AML/KYC satisfied | Financial records (money.*) are never anonymized. |
| Deletion forbidden | Physical deletion from Glacier is not permitted. Anonymization only. |
| Export guarantee | GET /audit/export waits for pending anonymizations before streaming. |

How It Works

Step 1: POST /audit/anonymize { userId }
  → Synchronous call (waits for ClickHouse INSERT confirmation)
  → Valkey lock acquired: SET anonymizing:{userId} 1 EX 60 NX

Step 2: For each audit_log record where userId matches
        (records with excluded action prefixes such as money.* are skipped):
  → Insert new ANONYMIZE record:
      email      → [REDACTED]
      ip         → 0.0.0.0
      name       → [REDACTED]
      user_agent → [REDACTED]

Step 3: ClickHouse ReplacingMergeTree merges in background.
  SELECT FINAL → returns only the latest (anonymized) version.

Step 4: Valkey lock released (after ClickHouse INSERT confirmed).
  Any concurrent SELECT that was blocked → now sees the ANONYMIZE record.

Step 5: userId written to PostgreSQL anonymization_log table.
  Used for cold storage (Glacier) rebuild.

API

POST https://api.septemcore.com/v1/audit/anonymize
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "userId": "01j9pa5mz700000000000000"
}

This is a synchronous call — the response is returned only after the ANONYMIZE records have been confirmed as inserted in ClickHouse. Do not treat this as fire-and-forget.

Response 200 OK:

{
  "userId": "01j9pa5mz700000000000000",
  "recordsAnonymized": 1842,
  "status": "completed",
  "completedAt": "2026-04-15T10:30:05.123Z"
}

SDK:

const result = await kernel.audit().anonymize({
  userId: '01j9pa5mz700000000000000',
});

// result.recordsAnonymized: 1842
// result.status: 'completed'
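
Because the call is synchronous, a client can validate the response body before acting on it. The following is a minimal sketch, assuming the JSON shape shown in the 200 OK example; the `AnonymizeResponse` type and the `parseAnonymizeResponse` function are hypothetical names for illustration and are not part of the SDK.

```typescript
// Hypothetical type mirroring the 200 OK body shown in this section.
interface AnonymizeResponse {
  userId: string;
  recordsAnonymized: number;
  status: string;
  completedAt: string; // ISO 8601 timestamp
}

// Narrow an unknown JSON payload to AnonymizeResponse, or throw.
function parseAnonymizeResponse(payload: unknown): AnonymizeResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("response body is not an object");
  }
  const { userId, recordsAnonymized, status, completedAt } =
    payload as Record<string, unknown>;

  if (typeof userId !== "string") {
    throw new Error("userId must be a string");
  }
  if (
    typeof recordsAnonymized !== "number" ||
    !Number.isInteger(recordsAnonymized) ||
    recordsAnonymized < 0
  ) {
    throw new Error("recordsAnonymized must be a non-negative integer");
  }
  if (typeof status !== "string") {
    throw new Error("status must be a string");
  }
  if (typeof completedAt !== "string" || Number.isNaN(Date.parse(completedAt))) {
    throw new Error("completedAt must be a parseable timestamp");
  }
  return { userId, recordsAnonymized, status, completedAt };
}
```

A `recordsAnonymized` value of 0 passes validation, which matches the idempotent-success case for a userId with no audit records.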

PII Replacement Map

| Field | Original value | After anonymization |
| --- | --- | --- |
| email | [email protected] | [REDACTED] |
| ip | 192.0.2.10 | 0.0.0.0 |
| name | Alice Wonderland | [REDACTED] |
| userAgent | Mozilla/5.0 (Macintosh...) | [REDACTED] |
| userId | 01j9pa5mz700000000000000 | unchanged (needed for record correlation) |
| entityId | value unchanged | unchanged (audit trail integrity) |
| before / after | may contain PII | unchanged (module's responsibility to exclude PII) |

Module responsibility: The Audit Service does not scan before/after JSON for PII. If a module recorded PII in these fields, it is not automatically redacted by the anonymization workflow. Modules must avoid placing PII in before/after snapshots.
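
The replacement map can be expressed as a pure function that builds the ANONYMIZE version of a record without touching the original. This is a sketch under stated assumptions: the `AuditRecord` shape and the `redactPii` name are hypothetical, and only the four PII fields listed in the map are rewritten.

```typescript
// Hypothetical record shape; field names follow the replacement map.
interface AuditRecord {
  userId: string;
  entityId: string;
  email: string;
  ip: string;
  name: string;
  userAgent: string;
  before?: unknown; // not scanned for PII
  after?: unknown;  // not scanned for PII
}

const REDACTED = "[REDACTED]";
const REDACTED_IP = "0.0.0.0";

// Return a new record with PII replaced. The input is never mutated,
// which mirrors the append-only design described in this page.
function redactPii(record: AuditRecord): AuditRecord {
  return {
    ...record, // userId, entityId, before and after are carried over as-is
    email: REDACTED,
    ip: REDACTED_IP,
    name: REDACTED,
    userAgent: REDACTED,
  };
}
```

Note that `before` and `after` are copied verbatim, so any PII a module placed there survives redaction, as the module-responsibility note states.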


Concurrency Control: Valkey Lock

The anonymization request sets a Valkey lock before inserting ANONYMIZE records. This prevents concurrent queries from reading stale (non-anonymized) data during the anonymization window:

Anonymization request arrives:
  SET anonymizing:{userId} 1 EX 60 NX
  → acquired (no other anonymization in progress)

Concurrent GET /audit?userId={userId} arrives:
  → Query layer checks: GET anonymizing:{userId}
  → Lock present → wait for release (poll 100ms, max 5s)
  → Lock released → execute SELECT FINAL → returns anonymized records

Anonymization completes:
  ClickHouse INSERT confirmed → Valkey lock deleted
  → All waiting queries resume

| Parameter | Value |
| --- | --- |
| Lock key | anonymizing:{userId} |
| Lock TTL | 60 seconds (failsafe; released manually after INSERT) |
| Polling interval | 100 ms |
| Max wait | 5 seconds, then 503 ANONYMIZATION_IN_PROGRESS |
| Concurrent anonymization for same userId | Rejected (409 Conflict) |
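
The acquire, poll and release behaviour can be sketched as follows. This is an illustration under stated assumptions, not the service's code: `LockStore` is a hypothetical abstraction over a Valkey client (its three methods would map to `SET key 1 EX ttl NX`, `GET key` and `DEL key`), and `sleep` is injected so the polling loop can be exercised without real timers.

```typescript
// Hypothetical abstraction over a Valkey client.
interface LockStore {
  acquire(key: string, ttlSeconds: number): Promise<boolean>; // SET key 1 EX ttl NX
  isHeld(key: string): Promise<boolean>;                      // GET key
  release(key: string): Promise<void>;                        // DEL key
}

type LockedResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: "conflict" }; // caller would map this to 409

const lockKey = (userId: string): string => `anonymizing:${userId}`;

// Writer side: hold the lock for the duration of the INSERT work.
async function withAnonymizationLock<T>(
  store: LockStore,
  userId: string,
  work: () => Promise<T>,
): Promise<LockedResult<T>> {
  const acquired = await store.acquire(lockKey(userId), 60);
  if (!acquired) return { ok: false, reason: "conflict" };
  try {
    return { ok: true, value: await work() };
  } finally {
    // Explicit release; the 60 s TTL only covers a crashed writer.
    await store.release(lockKey(userId));
  }
}

// Reader side: poll until the lock disappears or the wait budget is spent.
async function waitForAnonymization(
  store: LockStore,
  userId: string,
  sleep: (ms: number) => Promise<void>,
  pollMs = 100,
  maxWaitMs = 5000,
): Promise<"released" | "timed_out"> {
  let waitedMs = 0;
  while (await store.isHeld(lockKey(userId))) {
    if (waitedMs >= maxWaitMs) return "timed_out"; // caller would map this to 503
    await sleep(pollMs);
    waitedMs += pollMs;
  }
  return "released";
}
```

Releasing in `finally` keeps the lock from lingering for the full TTL when the work throws; the TTL then acts only as the failsafe the table describes.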

ReplacingMergeTree and SELECT FINAL

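ReplacingMergeTree deduplicates rows that share the same sorting key (ORDER BY) and keeps the row with the highest value in the version column. The ANONYMIZE record reuses the key components of the original record (tenant_id, action, timestamp) and carries a later version, declared here as ReplacingMergeTree(timestamp), so ClickHouse keeps only the ANONYMIZE record during background merges. Rows whose sorting-key values differ are never collapsed, so the value that makes the ANONYMIZE record newer must be the version value, not a change to a key column.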

All queries in the Audit Service query layer use FINAL:

SELECT * FROM audit_logs FINAL
WHERE user_id = '01j9pa5mz700000000000000'
ORDER BY timestamp DESC;

With FINAL, ClickHouse deduplicates at query time, returning only the ANONYMIZE record. Without FINAL, both the original and the ANONYMIZE record may appear in the result set until the background merge occurs.
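
The effect of FINAL can be illustrated with a small in-memory model. This is a simplified sketch of the deduplication rule only (highest version per sorting key wins, and on a tie the last inserted row wins), not ClickHouse code; the `sortingKey` and `version` field names and the `selectFinal` function are hypothetical.

```typescript
// Hypothetical row shape: sortingKey stands for the serialized ORDER BY
// tuple, version for the ReplacingMergeTree version column.
interface VersionedRow {
  sortingKey: string;
  version: number;
}

// Keep one row per sorting key: the one with the highest version.
// Rows are visited in insertion order, so >= lets a later insert win ties.
function selectFinal<T extends VersionedRow>(rows: T[]): T[] {
  const latest = new Map<string, T>();
  for (const row of rows) {
    const current = latest.get(row.sortingKey);
    if (current === undefined || row.version >= current.version) {
      latest.set(row.sortingKey, row);
    }
  }
  return Array.from(latest.values());
}

// Without FINAL, a query may see every stored row until a merge happens.
function selectWithoutFinal<T extends VersionedRow>(rows: T[]): T[] {
  return rows.slice();
}
```

In this model an original row and its ANONYMIZE row share a `sortingKey`, so `selectFinal` returns only the redacted one while `selectWithoutFinal` returns both.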


Cold Storage: Monthly Rebuild

When POST /audit/anonymize is called, the Audit Service writes the userId to the PostgreSQL anonymization_log table:

-- PostgreSQL
INSERT INTO anonymization_log (user_id, requested_at)
VALUES ('01j9pa5mz700000000000000', NOW());

A monthly background job rebuilds cold segments in S3 Glacier to ensure anonymization is also applied to records that were already tiered out of ClickHouse:

Monthly cold rebuild job:
  1. Read all entries from anonymization_log (PostgreSQL)
  2. For each userId with records in Glacier:
     a. Restore Glacier segment to S3 Standard (~3–12 hours)
     b. Apply anonymization_log: replace PII with [REDACTED]
     c. Write clean version back to Glacier
     d. Delete temporary S3 Standard copy
  3. Log rebuild completion in audit_cold_rebuilds table

This guarantees that GDPR anonymization is eventually consistent in cold storage — not just in hot ClickHouse.
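
Step 2b of the rebuild job is the only step that transforms data, and it can be sketched as a pure function over a restored segment. The `ColdRecord` shape and the `applyAnonymizationLog` name are hypothetical, and the sketch assumes the financial exclusion (money.*, billing.*) applies in cold storage as it does in ClickHouse.

```typescript
// Hypothetical shape of one record inside a restored cold segment.
interface ColdRecord {
  userId: string;
  action: string;
  email: string;
  ip: string;
  name: string;
  userAgent: string;
}

// Assumption: the same financial exclusions hold for cold segments.
const FINANCIAL_PREFIXES = ["money.", "billing."];

// Rewrite PII for every user listed in anonymization_log and return a
// new segment; records for other users or excluded actions pass through.
function applyAnonymizationLog(
  segment: ColdRecord[],
  anonymizedUserIds: ReadonlySet<string>,
): ColdRecord[] {
  return segment.map((record) => {
    const financial = FINANCIAL_PREFIXES.some((p) => record.action.startsWith(p));
    if (financial || !anonymizedUserIds.has(record.userId)) {
      return record;
    }
    return {
      ...record,
      email: "[REDACTED]",
      ip: "0.0.0.0",
      name: "[REDACTED]",
      userAgent: "[REDACTED]",
    };
  });
}
```

Because the function is idempotent, re-running the job over a segment that was already rebuilt leaves it unchanged.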


AML / KYC Exception: Financial Records

Financial audit records are never anonymized, regardless of any GDPR request. This is a hard exception enforced at the service layer:

| Action prefix | Anonymizable? | Reason |
| --- | --- | --- |
| money.* | ❌ No | AML/KYC compliance, must retain 7 years |
| billing.* | ❌ No | Financial compliance |
| user.* | ✅ Yes | Standard GDPR |
| role.* | ✅ Yes | Standard GDPR |
| files.* | ✅ Yes | Standard GDPR |
| notify.* | ✅ Yes | Standard GDPR |

When POST /audit/anonymize is called for a userId, the Audit Service skips all records whose action prefix is on the exclusion list. These records are left in their original form.
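
A prefix check of this kind might look like the sketch below. The exclusion list is taken from the table above; the `isAnonymizable` and `splitByAnonymizability` names are hypothetical, and the real service may store the list elsewhere.

```typescript
// Exclusion list from the table: these prefixes are never anonymized.
const EXCLUDED_ACTION_PREFIXES = ["money.", "billing."];

// True when a record with this action may be anonymized.
// The trailing dot in each prefix keeps "moneybox.open" from matching "money.".
function isAnonymizable(action: string): boolean {
  return !EXCLUDED_ACTION_PREFIXES.some((prefix) => action.startsWith(prefix));
}

// Split a user's records into those to anonymize and those to keep as-is.
function splitByAnonymizability<T extends { action: string }>(
  records: T[],
): { anonymize: T[]; keep: T[] } {
  const anonymize: T[] = [];
  const keep: T[] = [];
  for (const record of records) {
    (isAnonymizable(record.action) ? anonymize : keep).push(record);
  }
  return { anonymize, keep };
}
```

Under this sketch, the length of `anonymize` would correspond to the `recordsAnonymized` count in the API response, although the source does not state how that count is computed.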


Export and Pending Anonymizations

GET /audit/export checks for pending anonymizations before streaming the export. If any anonymization is in progress for the queried tenant, the export waits:

GET /audit/export:
  1. Scan Valkey for any anonymizing:{userId} keys for this tenant
     ┌─ None found → stream export immediately
     └─ Found → wait for lock release (poll 100ms, max 5s)
                → stream export only after all locks released

This guarantees that an export initiated immediately after a GDPR request does not contain any non-anonymized PII.


Error Reference

| Scenario | HTTP | Code |
| --- | --- | --- |
| Anonymization already in progress for this userId | 409 | ANONYMIZATION_IN_PROGRESS |
| Concurrent SELECT blocked by lock, max wait exceeded | 503 | ANONYMIZATION_IN_PROGRESS (retry in 10s) |
| Export blocked by pending anonymization, max wait exceeded | 503 | ANONYMIZATION_PENDING (retry in 10s) |
| userId not found in any audit record | 200 | recordsAnonymized: 0 (idempotent success) |
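
A client can turn the table above into a small decision function. This is a sketch only: the `classifyResponse` name and the `Outcome` type are hypothetical, and it assumes the caller has already extracted the HTTP status and the error code string from the response, since the error body format is not specified here.

```typescript
type Outcome =
  | { kind: "success" }                 // 200, including recordsAnonymized: 0
  | { kind: "conflict" }                // 409: another anonymization is running
  | { kind: "retry"; afterMs: number }  // 503: wait budget exceeded
  | { kind: "unexpected"; status: number };

const RETRY_DELAY_MS = 10000; // "retry in 10s" from the error reference

// Map an HTTP status and error code to the action a client should take.
function classifyResponse(status: number, code?: string): Outcome {
  if (status === 200) {
    return { kind: "success" };
  }
  if (status === 409 && code === "ANONYMIZATION_IN_PROGRESS") {
    return { kind: "conflict" };
  }
  if (
    status === 503 &&
    (code === "ANONYMIZATION_IN_PROGRESS" || code === "ANONYMIZATION_PENDING")
  ) {
    return { kind: "retry", afterMs: RETRY_DELAY_MS };
  }
  return { kind: "unexpected", status };
}
```

Keeping 409 separate from 503 matters because ANONYMIZATION_IN_PROGRESS appears under both statuses with different meanings: a rejected duplicate request in one case, a read that should be retried in the other.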