Tenant Isolation
Tenant isolation is the most critical security property of Platform-Kernel. One tenant must never see another tenant's data — regardless of bugs, misconfiguration, or malicious module code.
The platform enforces isolation at six independent layers. A vulnerability in any single layer does not cause a cross-tenant data exposure because the other five layers remain intact.
Isolation Overview
Layer 1 — PostgreSQL Row-Level Security
Every table in Platform-Kernel has a tenant_id UUID NOT NULL column.
PostgreSQL Row-Level Security (RLS) is enabled on every table and enforced
at the database engine level — not in application code.
-- Applied to every tenant-scoped table
ALTER TABLE {table_name} ENABLE ROW LEVEL SECURITY;
ALTER TABLE {table_name} FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON {table_name}
USING (tenant_id = current_setting('app.tenant_id')::uuid);
How app.tenant_id is set:
Before any query, the Go Data Layer executes:
SET LOCAL app.tenant_id = '{tenantId}';
-- Then executes the actual query within the same transaction
SET LOCAL is scoped to the current transaction: the setting is discarded
at COMMIT or ROLLBACK, before a pooler such as pgBouncer hands the
connection to another client. A tenant setting therefore cannot leak
from one request to the next.
What FORCE ROW LEVEL SECURITY does:
Table owners (the privileged role used for migrations) are normally
exempt from RLS. FORCE RLS removes this exemption: even the table-owning
migration role cannot bypass the policy at runtime. (True superusers and
roles with BYPASSRLS still bypass RLS, so the migration role must be a
plain table owner, not a superuser.)
| Protection | Without RLS | With RLS |
|---|---|---|
| Bug in Go code forgets WHERE tenant_id | Cross-tenant data returned | Policy filters every row — query returns an empty set |
| SQL injection bypasses the WHERE clause | All tenant data exposed | RLS still filters by app.tenant_id |
| Malicious module calls Data Layer directly | Could access other tenants | Impossible — policy applied at DB engine |
Layer 2 — ClickHouse Dual Filter
ClickHouse row policies are not as mature or comprehensive as PostgreSQL Row-Level Security, so the platform applies two independent filters; a bug in either one does not cause data exposure.
Application Filter (Go Data Layer)
// All ClickHouse queries go through this builder — tenantId is mandatory
func (r *AnalyticsRepo) Query(ctx context.Context, q AnalyticsQuery) (*Result, error) {
    tenantID := auth.TenantIDFromContext(ctx) // extracted from JWT
    sql := fmt.Sprintf(
        "SELECT %s FROM %s WHERE tenant_id = ? AND %s",
        q.Select, q.Table, q.Where,
    )
    return r.ch.Query(ctx, sql, tenantID, q.Args...)
}
Database-Level Row Policy
-- Created automatically when a new tenant is provisioned
CREATE ROW POLICY tenant_filter_{tenantId}
ON analytics.data_records
FOR SELECT
USING tenant_id = '{tenantId}'
TO tenant_user_{tenantId};
Each tenant has a dedicated ClickHouse user with this policy attached. The Go Data Layer connects as the tenant's user — even a completely correct query cannot return another tenant's rows.
All ClickHouse queries are logged with tenantId and traceId for
forensic analysis (SOC 2 compliance).
Layer 3 — Kafka Consumer SDK Filter
Kafka topics are shared across all tenants (one topic per domain). A naive consumer would receive every tenant's events. The SDK enforces tenant isolation at the subscription level.
Publish-side isolation:
When a module publishes an event, the Gateway injects tenantId
from the JWT — the module cannot set or override it:
Module: kernel.events().publish("order.created", { ... })
↓
SDK: EventEnvelope { tenantId: JWT.tenantId (injected), payload: { ... } }
↓
Gateway: Validates tenantId matches JWT — rejects if mismatch
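The Gateway-side mismatch check can be sketched as follows (the EventEnvelope fields and the validatePublish helper are illustrative shapes, not the platform's real types):

```go
package main

import (
	"errors"
	"fmt"
)

// EventEnvelope mirrors the envelope shape sketched above
// (field names are illustrative).
type EventEnvelope struct {
	TenantID string
	Topic    string
	Payload  string
}

// validatePublish is the Gateway check: the envelope's tenantId must
// equal the tenantId carried in the caller's JWT. The SDK injects the
// value, so any mismatch means a bug or an attempted spoof, and the
// publish is rejected.
func validatePublish(env EventEnvelope, jwtTenantID string) error {
	if env.TenantID == "" {
		return errors.New("publish rejected: envelope missing tenantId")
	}
	if env.TenantID != jwtTenantID {
		return fmt.Errorf("publish rejected: envelope tenantId %q != JWT tenantId %q",
			env.TenantID, jwtTenantID)
	}
	return nil
}

func main() {
	ok := EventEnvelope{TenantID: "abc123", Topic: "order.created"}
	spoof := EventEnvelope{TenantID: "def456", Topic: "order.created"}
	fmt.Println(validatePublish(ok, "abc123"))    // <nil>
	fmt.Println(validatePublish(spoof, "abc123")) // publish rejected: ...
}
```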
RBAC on events:
- A module can only subscribe to events declared in manifest.events.subscribes[].
- A module can only publish events declared in manifest.events.publishes[].
- Kernel-owned events (auth.*, money.*, billing.*) are on a hardcoded whitelist — only core services can publish them.
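The manifest check can be sketched like this (the mayPublish helper and its signature are assumptions for illustration; the whitelist prefixes come from the list above):

```go
package main

import (
	"fmt"
	"strings"
)

// Kernel-owned event prefixes, per the hardcoded whitelist above.
var kernelPrefixes = []string{"auth.", "money.", "billing."}

// mayPublish implements the manifest check: a module may publish a topic
// only if it declared it in manifest.events.publishes[], and kernel-owned
// topics are reserved for core services regardless of what a manifest
// claims.
func mayPublish(declaredPublishes []string, topic string, isCoreService bool) bool {
	for _, p := range kernelPrefixes {
		if strings.HasPrefix(topic, p) {
			return isCoreService // whitelist overrides any manifest claim
		}
	}
	for _, d := range declaredPublishes {
		if d == topic {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(mayPublish([]string{"order.created"}, "order.created", false)) // true
	fmt.Println(mayPublish([]string{"auth.login"}, "auth.login", false))       // false: kernel-owned
	fmt.Println(mayPublish(nil, "auth.login", true))                           // true: core service
}
```

Checking the whitelist first is the important ordering: a manifest that declares auth.login still cannot publish it.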
Layer 4 — WebSocket Channel Namespace
WebSocket connections are managed by the Notify Service. The hub uses
a compound key that includes tenantId as a mandatory prefix.
Internal hub key: {tenantId}:{channel}
Example:
Module calls: kernel.notify().broadcast("dashboard.metrics", data)
SDK sends: subscribe { channel: "dashboard.metrics" }
Hub stores: connections["abc123:dashboard.metrics"]
Cross-tenant broadcast is architecturally impossible:
Tenant A "abc123" → hub["abc123:dashboard.metrics"]
Tenant B "def456" → hub["def456:dashboard.metrics"]
These are different in-memory buckets — no API to cross.
Auth on connection:
Step 1: Client opens WSS connection
Step 2: Client sends { "type": "auth", "token": "<JWT>" }
Step 3: Notify Service validates JWT (ES256 + expiry + revocation check)
Step 4: Connection is registered under hub["{tenantId}:{channel}"]
Step 5: Any message arriving on this connection is tenant-scoped
Invalid JWT → close connection with code 4401 (Unauthorized)
Connection limits per tenant:
NOTIFY_WS_MAX_CONNECTIONS_PER_TENANT = 1000
NOTIFY_WS_RATE_PER_TENANT = 200 msg/s
One tenant cannot exhaust WebSocket capacity for another tenant.
Limits are enforced per tenantId bucket in the hub.
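A minimal sketch of the compound-key hub described above (the Hub type, buffered channels standing in for WebSocket connections, and the drop-if-slow policy are all illustrative simplifications of the Notify Service):

```go
package main

import (
	"fmt"
	"sync"
)

// Hub stores subscriptions under the compound key "{tenantId}:{channel}",
// so each tenant gets a disjoint in-memory bucket.
type Hub struct {
	mu    sync.Mutex
	conns map[string][]chan string
}

func NewHub() *Hub { return &Hub{conns: map[string][]chan string{}} }

func hubKey(tenantID, channel string) string { return tenantID + ":" + channel }

// Subscribe registers a connection under the tenant-scoped key. There is
// deliberately no API that accepts a bare channel name.
func (h *Hub) Subscribe(tenantID, channel string) chan string {
	ch := make(chan string, 8)
	h.mu.Lock()
	defer h.mu.Unlock()
	k := hubKey(tenantID, channel)
	h.conns[k] = append(h.conns[k], ch)
	return ch
}

// Broadcast fans out only inside one tenant's bucket.
func (h *Hub) Broadcast(tenantID, channel, msg string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.conns[hubKey(tenantID, channel)] {
		select {
		case ch <- msg:
		default: // drop for slow subscribers (sketch only)
		}
	}
}

func main() {
	h := NewHub()
	a := h.Subscribe("abc123", "dashboard.metrics")
	b := h.Subscribe("def456", "dashboard.metrics")
	h.Broadcast("abc123", "dashboard.metrics", "cpu=42%")
	fmt.Println(len(a), len(b)) // 1 0: tenant B's bucket never sees the message
}
```

Because the tenantId prefix is applied by the hub, not the caller, a module has no code path that reaches another tenant's bucket.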
Layer 5 — Feature Flag Key Prefix
Feature flags are stored in a shared PostgreSQL table and evaluated by
GoFeatureFlag. The SDK adds tenantId as a mandatory key prefix,
making flag names globally unique per tenant.
Developer writes: kernel.flags().isEnabled("new-checkout")
SDK transforms to: isEnabled("abc123:new-checkout")
PostgreSQL storage:
key | tenant_id | enabled
-------------|-----------|--------
abc123:new-checkout | abc123 | true
def456:new-checkout | def456 | false ← different tenant, different state
Admin UI filters: WHERE tenant_id = currentTenantId
→ Tenant A admin cannot see Tenant B flags
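The prefixing behaviour can be sketched with an in-memory stand-in for the flag table (FlagStore and IsEnabled are illustrative, not GoFeatureFlag's API):

```go
package main

import "fmt"

// FlagStore models the shared flag table, keyed by "{tenantId}:{flagName}".
type FlagStore map[string]bool

// IsEnabled mirrors kernel.flags().isEnabled: the developer passes only
// the flag name, and the SDK supplies the tenantId prefix, so the same
// flag name resolves to a different row for each tenant.
func (s FlagStore) IsEnabled(tenantID, name string) bool {
	return s[tenantID+":"+name]
}

func main() {
	flags := FlagStore{
		"abc123:new-checkout": true,
		"def456:new-checkout": false,
	}
	fmt.Println(flags.IsEnabled("abc123", "new-checkout")) // true
	fmt.Println(flags.IsEnabled("def456", "new-checkout")) // false
}
```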
Optimistic locking for concurrent admin edits:
Every flag has a version: int. PATCH /flags/:key requires
{ version: N } in the request body. Concurrent edits by two admins
result in 409 Conflict for the second writer — no silent overwrite.
Layer 6 — S3 Path Prefix
All files uploaded through the Files Service are stored under a path
that includes tenantId as the first path component:
S3 key structure:
{tenantId}/{bucket}/{fileId}
Examples:
abc123/avatars/usr_01j8m...webp
abc123/documents/doc_01j8n...pdf
def456/avatars/usr_01j9a...webp ← different tenant, different prefix
Gateway enforcement:
1. Module requests: GET /api/v1/files/{fileId}
2. Gateway fetches file metadata from Files Service
3. Files Service: WHERE file_id = ? AND tenant_id = JWT.tenantId
4. If tenant_id mismatch → 404 Not Found (not 403 — a 403 would confirm the file exists, acting as an existence oracle)
5. Presigned URLs: generated with tenantId scoped IAM policy
Presigned URL security:
Presigned GET URL → includes X-Amz-Credential scoped to {tenantId} path prefix
Presigned PUT URL → upload allowed ONLY to {tenantId}/staging/{fileId}
Cross-tenant presigned URL: impossible — credential is path-scoped
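The key construction and the path-scoped credential check can be sketched as follows (objectKey and allowedByCredential are illustrative helpers; the real enforcement happens in S3's signature validation against the scoped credential):

```go
package main

import (
	"fmt"
	"strings"
)

// objectKey builds the canonical S3 key; tenantId is always the first
// path component, so a path-scoped credential covers exactly one tenant.
func objectKey(tenantID, bucket, fileID string) string {
	return tenantID + "/" + bucket + "/" + fileID
}

// allowedByCredential models the path-scoped presigned credential: a
// credential minted for one tenant can only touch keys under that
// tenant's prefix, whatever URL a client constructs.
func allowedByCredential(credTenantID, key string) bool {
	return strings.HasPrefix(key, credTenantID+"/")
}

func main() {
	key := objectKey("abc123", "avatars", "usr_01j8m.webp")
	fmt.Println(key)                                // abc123/avatars/usr_01j8m.webp
	fmt.Println(allowedByCredential("abc123", key)) // true
	fmt.Println(allowedByCredential("def456", key)) // false: cross-tenant access denied
}
```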
Isolation Summary Table
| Layer | Technology | Enforced by | Failure mode if layer is bypassed |
|---|---|---|---|
| PostgreSQL RLS | CREATE POLICY + FORCE RLS | PostgreSQL engine | ClickHouse row policy + 4 more layers remain |
| ClickHouse dual | row_policy + Go WHERE | DB engine + application | One sub-layer remains active |
| Kafka SDK filter | EventEnvelope.tenantId check | SDK consumer | Gateway publish validation remains |
| WebSocket namespace | Hub key {tenantId}:{channel} | Notify Service hub | JWT auth on connection remains |
| Feature Flag prefix | Key {tenantId}:{flagName} | SDK + PostgreSQL RLS | Admin UI filter remains |
| S3 path prefix | Path {tenantId}/{bucket}/ | Gateway + Files Service | Files Service DB query remains |
Defence in depth: each layer is sufficient by itself to prevent cross-tenant access, and all six together provide layered redundancy — the depth of control expected in SOC 2 Type II and ISO 27001 audits of multi-tenant SaaS.
Tenant Provisioning Isolation
When a new tenant is created, the following isolation primitives are provisioned as an all-or-nothing sequence: the PostgreSQL steps run in a single transaction, while the external ClickHouse and IAM steps are undone (compensated) if any later step fails.
1. INSERT INTO tenants (id, slug, ...)
2. INSERT INTO wallets (id, tenant_id, type='system', ...) ← system wallet
3. INSERT INTO subscriptions (id, tenant_id, status='trialing')
4. CREATE ROW POLICY tenant_filter_{tenantId} ON analytics.* ← ClickHouse
5. CREATE CLICKHOUSE USER tenant_user_{tenantId} ← ClickHouse user
6. GRANT SELECT WITH row_policy TO tenant_user_{tenantId}
7. IAM: create Owner role with wildcard (*) for tenantId
If any step fails → the PostgreSQL transaction is rolled back and any external resources already created (ClickHouse policy/user, IAM role) are deleted. No partial tenant state possible.
Related Pages
- Security Deep Dive — encryption layers and key hierarchy
- CDC Pipeline — ClickHouse ReplacingMergeTree and FINAL for deduplication
- Service Map — service-to-service mTLS topology
- Data Flow — JWT propagation through the request lifecycle