Data Flow

This page documents the runtime data flows through Platform-Kernel — how a request travels from a browser to a database, how events propagate between services, and how analytical data reaches ClickHouse.

Depth: For service inventory see Service Map. For CDC internals see CDC Pipeline.


1. Authenticated Request Lifecycle

This is the critical path for every API call. The Gateway enforces authentication and authorization before any downstream service sees the request.

Key points:

| Step | Detail |
| --- | --- |
| Rate limiting | Two-tier: Envoy local token-bucket (zero-latency) → Valkey global counter (cross-instance) |
| JWT validation | ES256 asymmetric; public key cached in Gateway memory; no IAM gRPC call per request |
| RBAC | O(1) Valkey set lookup; permissions cached under `rbac:{tenantId}:{userId}` |
| Circuit breaker | Opens after 5 consecutive failures in 30s; returns 503 immediately while open |
| RLS | PostgreSQL Row-Level Security enforces `tenant_id = current_setting('app.tenant_id')` |
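The two-tier rate limit can be sketched as follows. This is an illustrative model, not the actual Gateway code: the class and function names are assumptions, and the Valkey global counter is stubbed with an in-process Map (in production it would be an `INCR` with an expiring window key).

```typescript
// Tier 1: in-process token bucket -- rejects bursts with zero added latency.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }
  tryAcquire(now = Date.now()): boolean {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}

// Tier 2: per-tenant global counter (stand-in for a Valkey INCR window).
const globalWindow = new Map<string, number>();
function globalAllow(tenantId: string, limit: number): boolean {
  const n = (globalWindow.get(tenantId) ?? 0) + 1;
  globalWindow.set(tenantId, n);
  return n <= limit;
}

function allowRequest(bucket: TokenBucket, tenantId: string, globalLimit: number): boolean {
  // Local bucket first: cheap rejection before touching the shared store.
  return bucket.tryAcquire() && globalAllow(tenantId, globalLimit);
}
```

Checking the local bucket first means an over-limit instance never pays the round trip to the shared counter.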

2. Login Flow (JWT Issuance)

Token Grace Window (multi-tab refresh race):

Problem: two browser tabs simultaneously see an expired access token
→ both call POST /auth/refresh with the same refresh token
→ first tab gets a new pair; the second tab gets 401 (token revoked)

Solution: the refresh token stays valid for 10s after first use (grace window)
Within the grace window: same refresh token → same new token pair (idempotent)
After 10s: permanently revoked
The SDK also uses BroadcastChannel to coordinate in-flight refresh across tabs
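The grace-window semantics above can be sketched server-side as follows. Names and types here are illustrative assumptions, not the real Auth service: the point is that a second redemption of the same refresh token within 10s returns the same pair rather than a 401.

```typescript
interface TokenPair { access: string; refresh: string; }

interface RefreshRecord {
  pair: TokenPair;     // pair issued on first redemption
  firstUsedAt: number; // ms timestamp of first redemption
}

const GRACE_MS = 10_000;
const redeemed = new Map<string, RefreshRecord>();
let counter = 0;

function mintPair(): TokenPair {
  counter += 1;
  return { access: `access-${counter}`, refresh: `refresh-${counter}` };
}

// Returns the new pair, or null if the token is permanently revoked.
function redeemRefreshToken(token: string, now: number): TokenPair | null {
  const prior = redeemed.get(token);
  if (prior) {
    // Second tab lands here: idempotent inside the grace window.
    if (now - prior.firstUsedAt <= GRACE_MS) return prior.pair;
    return null; // past the window: permanently revoked
  }
  const pair = mintPair();
  redeemed.set(token, { pair, firstUsedAt: now });
  return pair;
}
```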

3. Domain Event Flow

Every state mutation in any core service publishes a typed event to Kafka. Consumers process asynchronously with at-least-once delivery.

Event deduplication:

| Event domain | Dedup mechanism | TTL |
| --- | --- | --- |
| `money.*` | PostgreSQL `processed_event_ids` table | Permanent (no TTL) |
| `auth.*`, `module.*` | Valkey `SET {domain}:{event_id} 1 EX 86400` | 24 hours |
| DLQ retry | `EventService.DLQRetry` RPC requeues to original topic | N/A (explicit) |
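A minimal sketch of the Valkey-based dedup for `auth.*`/`module.*` consumers, with the Valkey client stubbed by a Map with expiry timestamps (function names are illustrative assumptions). `SET {domain}:{event_id} 1 NX EX 86400` translates to "process only if the key was newly set":

```typescript
const DEDUP_TTL_MS = 86_400_000; // 24 hours, matching EX 86400
const seen = new Map<string, number>(); // dedup key -> expiry timestamp (ms)

// Stand-in for SET key 1 NX EX 86400: true only if the key was newly set.
function setIfAbsent(key: string, now: number): boolean {
  const expiry = seen.get(key);
  if (expiry !== undefined && expiry > now) return false; // duplicate delivery
  seen.set(key, now + DEDUP_TTL_MS);
  return true;
}

// Returns true if the event was processed, false if it was a duplicate.
function handleEvent(
  domain: string,
  eventId: string,
  now: number,
  process: () => void,
): boolean {
  if (!setIfAbsent(`${domain}:${eventId}`, now)) return false;
  process();
  return true;
}
```

With at-least-once delivery the same event can arrive twice; the second delivery is dropped before the handler runs.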

4. CDC Pipeline (PostgreSQL → ClickHouse)

The Change Data Capture pipeline propagates all tenant data mutations from PostgreSQL to ClickHouse for analytical queries.

At-least-once → exactly-once deduplication in ClickHouse:

Table engine : ReplacingMergeTree(updated_at)
Ordering key : (tenant_id, record_id)
Dedup : on INSERT, the Kafka consumer produces idempotent rows;
on QUERY, SELECT ... FINAL collapses duplicates at read time
Background : scheduled OPTIMIZE TABLE ... FINAL merges duplicate rows
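The collapse rule that ReplacingMergeTree(updated_at) with ORDER BY (tenant_id, record_id) applies at merge time (and that `SELECT ... FINAL` applies at read time) can be modeled as: among rows sharing the ordering key, only the row with the greatest `updated_at` survives. A sketch of that rule, with field names mirroring the table definition above:

```typescript
interface CdcRow { tenantId: string; recordId: string; updatedAt: number; value: string; }

// Model of SELECT ... FINAL: keep the latest version per ordering key.
function collapseFinal(rows: CdcRow[]): CdcRow[] {
  const latest = new Map<string, CdcRow>();
  for (const row of rows) {
    const key = `${row.tenantId}:${row.recordId}`; // the ordering key
    const cur = latest.get(key);
    if (!cur || row.updatedAt >= cur.updatedAt) latest.set(key, row);
  }
  return [...latest.values()];
}
```

This is why at-least-once delivery from Kafka is safe: a redelivered row carries the same key and `updated_at`, so the duplicate collapses away.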

WAL retention : 24h; Kafka topic retention is 168h (KAFKA_RETENTION_HOURS=168), which covers any consumer lag
Bloat guard : Alert if WAL > 50GB (CDC_WAL_BLOAT_ALERT_GB=50)
Recovery : Auto-snapshot on consumer lag spike; lastSyncAt checkpoint

5. Money Hold/Confirm Pattern

Financial operations use a two-phase hold/confirm pattern inside PostgreSQL to prevent over-spending and ensure consistency without distributed transactions.

Expired hold cleanup:

A background job (Money Service ticker, interval: 60s) scans:
SELECT * FROM transactions WHERE status='held' AND hold_expires_at < NOW()

For each expired hold:
BEGIN TX
UPDATE wallets SET available += amount, frozen -= amount
UPDATE transactions SET status='expired'
COMMIT
Publish money.hold.expired → Kafka
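The balance math in the cleanup transaction can be expressed as a pure function (field names are assumptions mirroring the UPDATE above; in production both updates happen atomically inside the transaction):

```typescript
interface Wallet { available: number; frozen: number; }

// Mirrors: UPDATE wallets SET available += amount, frozen -= amount
// for one expired hold of `heldAmount`.
function releaseExpiredHold(wallet: Wallet, heldAmount: number): Wallet {
  return {
    available: wallet.available + heldAmount,
    frozen: wallet.frozen - heldAmount,
  };
}
```

The invariant is that `available + frozen` is unchanged: releasing a hold moves money between buckets, it never creates or destroys it.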

6. WebSocket Notification Push

Real-time notifications travel from a publishing service through Kafka / RabbitMQ to the browser via the Notify Service WebSocket hub.

Connection limits:
Max connections per tenant : 1000 (NOTIFY_WS_MAX_CONNECTIONS_PER_TENANT)
Max broadcast rate : 200 msg/s per tenant (NOTIFY_WS_RATE_PER_TENANT)
Heartbeat : Server Ping every 30s; 2 missed Pongs → close(4408)
Replay on reconnect : Valkey LPUSH/LRANGE ws:replay:{tenantId}:{channel}
Buffer: 100 messages per channel (LIFO)
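The replay buffer's LPUSH/LTRIM/LRANGE behavior can be sketched with a plain array (function names are illustrative; a real implementation would issue the Valkey commands against `ws:replay:{tenantId}:{channel}`):

```typescript
const REPLAY_BUFFER_SIZE = 100;

// LPUSH then LTRIM 0 99: prepend the message, cap the list at 100 entries.
function pushReplay(buffer: string[], message: string): string[] {
  return [message, ...buffer].slice(0, REPLAY_BUFFER_SIZE);
}

// LRANGE 0 count-1: newest-first slice handed to a reconnecting client.
function replayOnReconnect(buffer: string[], count: number): string[] {
  return buffer.slice(0, count);
}
```

Because LPUSH prepends, index 0 is always the newest message, which is the LIFO ordering noted above.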

7. Module Installation Flow