Tenant Isolation

Tenant isolation is the most critical security property of Platform-Kernel. One tenant must never see another tenant's data — regardless of bugs, misconfiguration, or malicious module code.

The platform enforces isolation at six independent layers. A vulnerability in any single layer does not cause a cross-tenant data exposure because the other five layers remain intact.


Isolation Overview

The six layers, from the storage engine outward: PostgreSQL Row-Level Security, the ClickHouse dual filter, the Kafka consumer SDK filter, WebSocket channel namespacing, feature-flag key prefixing, and the S3 path prefix. Each is detailed below.

Layer 1 — PostgreSQL Row-Level Security

Every tenant-scoped table in Platform-Kernel has a tenant_id UUID NOT NULL column. PostgreSQL Row-Level Security (RLS) is enabled and forced on each of these tables and enforced at the database engine level — not in application code.

-- Applied to every tenant-scoped table
ALTER TABLE {table_name} ENABLE ROW LEVEL SECURITY;
ALTER TABLE {table_name} FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON {table_name}
USING (tenant_id = current_setting('app.tenant_id')::uuid);

How app.tenant_id is set:

Before any query, the Go Data Layer executes:

SET LOCAL app.tenant_id = '{tenantId}';
-- Then executes the actual query within the same transaction

SET LOCAL is scoped to the current transaction and is automatically reset when the transaction ends. With pooled connections (pgBouncer in transaction-pooling mode), the setting therefore never survives onto a connection handed to another tenant — cross-tenant leakage of app.tenant_id is architecturally impossible.
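The transaction wrapper can be sketched in Go as follows. This is illustrative, not the actual Data Layer code: the withTenant helper, the UUID guard, and the use of set_config('app.tenant_id', $1, true) — the bind-parameter-friendly equivalent of SET LOCAL, since SET LOCAL itself does not accept placeholders — are assumptions.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"regexp"
)

// Guard against injecting anything but a UUID into the session setting.
var uuidRe = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

func validTenantID(id string) bool { return uuidRe.MatchString(id) }

// withTenant runs fn inside a transaction in which app.tenant_id is set,
// so every RLS policy evaluated in that transaction filters on the
// caller's tenant.
func withTenant(ctx context.Context, db *sql.DB, tenantID string, fn func(*sql.Tx) error) error {
	if !validTenantID(tenantID) {
		return fmt.Errorf("invalid tenant id %q", tenantID)
	}
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// set_config(name, value, is_local=true) is transaction-scoped,
	// exactly like SET LOCAL, but accepts a bind parameter.
	if _, err := tx.ExecContext(ctx,
		`SELECT set_config('app.tenant_id', $1, true)`, tenantID); err != nil {
		tx.Rollback()
		return err
	}
	if err := fn(tx); err != nil {
		tx.Rollback()
		return err
	}
	return tx.Commit()
}
```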

What FORCE ROW LEVEL SECURITY does:

Table owners are normally exempt from RLS on tables they own. FORCE ROW LEVEL SECURITY removes that exemption, so even the privileged role that runs migrations (and owns the tables) is subject to the policy at runtime. Note that genuine superusers and roles with the BYPASSRLS attribute still bypass RLS — which is why the migration role should be a table owner, not a superuser.

| Protection scenario | Without RLS | With RLS |
| --- | --- | --- |
| Bug in Go code forgets WHERE tenant_id | Cross-tenant data returned | Policy filters rows — query returns an empty set |
| SQL injection bypasses the WHERE clause | All tenant data exposed | RLS still filters by app.tenant_id |
| Malicious module calls Data Layer directly | Could access other tenants | Impossible — policy applied in the DB engine |

Layer 2 — ClickHouse Dual Filter

ClickHouse's native row policies are not comparable to PostgreSQL's Row-Level Security, so the platform applies two independent filters; a bug in either one does not cause data exposure.

Application Filter (Go Data Layer)

// All ClickHouse queries go through this builder — tenantId is mandatory
func (r *AnalyticsRepo) Query(ctx context.Context, q AnalyticsQuery) (*Result, error) {
	tenantID := auth.TenantIDFromContext(ctx) // extracted from the JWT by the Gateway
	sql := fmt.Sprintf(
		"SELECT %s FROM %s WHERE tenant_id = ? AND %s",
		q.Select, q.Table, q.Where,
	)
	// tenant_id binds to the first placeholder; q.Args fill the rest —
	// the two argument lists must be combined before the variadic call
	args := append([]any{tenantID}, q.Args...)
	return r.ch.Query(ctx, sql, args...)
}

Database-Level Row Policy

-- Created automatically when a new tenant is provisioned
CREATE ROW POLICY tenant_filter_{tenantId}
ON analytics.data_records
FOR SELECT
USING tenant_id = '{tenantId}'
TO tenant_user_{tenantId};

Each tenant has a dedicated ClickHouse user with this policy attached. The Go Data Layer connects as the tenant's user — even a completely correct query cannot return another tenant's rows.
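The per-tenant user and policy names follow the provisioning pattern shown above. A minimal sketch of the naming helpers the Data Layer might use — clickhouseDSN, its host and password handling, and the helper names themselves are assumptions, not the real configuration code:

```go
package main

import "fmt"

// One ClickHouse user and one row policy per tenant, matching the
// tenant_user_{tenantId} / tenant_filter_{tenantId} naming scheme.
func tenantUserName(tenantID string) string {
	return fmt.Sprintf("tenant_user_%s", tenantID)
}

func rowPolicyName(tenantID string) string {
	return fmt.Sprintf("tenant_filter_%s", tenantID)
}

// clickhouseDSN builds a connection string for the tenant's dedicated
// user, so even a correct query cannot escape the attached row policy.
func clickhouseDSN(host, tenantID, password string) string {
	return fmt.Sprintf("clickhouse://%s:%s@%s/analytics",
		tenantUserName(tenantID), password, host)
}
```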

All ClickHouse queries are logged with tenantId and traceId for forensic analysis (SOC 2 compliance).


Layer 3 — Kafka Consumer SDK Filter

Kafka topics are shared across all tenants (one topic per domain). A naive consumer would receive every tenant's events. The SDK enforces tenant isolation at the subscription level.

Publish-side isolation:

When a module publishes an event, the Gateway injects tenantId from the JWT — the module cannot set or override it:

Module: kernel.events().publish("order.created", { ... })

SDK: EventEnvelope { tenantId: JWT.tenantId (injected), payload: { ... } }

Gateway: Validates tenantId matches JWT — rejects if mismatch
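The injection and validation steps above can be sketched as two small functions. The EventEnvelope field names beyond tenantId and the function names are assumptions; the point is that the SDK always overwrites TenantID from the JWT, and the Gateway independently rejects any mismatch:

```go
package main

import (
	"errors"
	"fmt"
)

type EventEnvelope struct {
	Topic    string
	TenantID string
	Payload  []byte
}

var ErrTenantMismatch = errors.New("envelope tenantId does not match JWT")

// injectTenant is the SDK side: the module never supplies TenantID —
// it is always taken from the authenticated JWT claims.
func injectTenant(env EventEnvelope, jwtTenantID string) EventEnvelope {
	env.TenantID = jwtTenantID
	return env
}

// validateEnvelope is the Gateway side: reject any envelope whose
// tenantId differs from the JWT, even if the SDK was bypassed.
func validateEnvelope(env EventEnvelope, jwtTenantID string) error {
	if env.TenantID != jwtTenantID {
		return fmt.Errorf("%w: %q != %q", ErrTenantMismatch, env.TenantID, jwtTenantID)
	}
	return nil
}
```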

RBAC on events:

  • A module can only subscribe to events declared in manifest.events.subscribes[].
  • A module can only publish events declared in manifest.events.publishes[].
  • Kernel-owned events (auth.*, money.*, billing.*) are on a hardcoded whitelist — only core services can publish them.
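The RBAC rules above can be sketched as manifest checks. The Manifest struct, helper names, and the isCoreService flag are assumptions; the kernel-owned prefixes are the ones listed in the bullet list:

```go
package main

import "strings"

// Kernel-owned namespaces: only core services may publish under these.
var kernelPrefixes = []string{"auth.", "money.", "billing."}

type Manifest struct {
	Publishes  []string // manifest.events.publishes[]
	Subscribes []string // manifest.events.subscribes[]
}

func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// canPublish: the topic must be declared in the manifest AND must not
// fall under a kernel-owned namespace unless the caller is a core service.
func canPublish(m Manifest, topic string, isCoreService bool) bool {
	for _, p := range kernelPrefixes {
		if strings.HasPrefix(topic, p) && !isCoreService {
			return false
		}
	}
	return contains(m.Publishes, topic)
}

func canSubscribe(m Manifest, topic string) bool {
	return contains(m.Subscribes, topic)
}
```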

Layer 4 — WebSocket Channel Namespace

WebSocket connections are managed by the Notify Service. The hub uses a compound key that includes tenantId as a mandatory prefix.

Internal hub key: {tenantId}:{channel}

Example:
Module calls: kernel.notify().broadcast("dashboard.metrics", data)
SDK sends: subscribe { channel: "dashboard.metrics" }
Hub stores: connections["abc123:dashboard.metrics"]

Cross-tenant broadcast is architecturally impossible:
Tenant A "abc123" → hub["abc123:dashboard.metrics"]
Tenant B "def456" → hub["def456:dashboard.metrics"]
These are different in-memory buckets — no API to cross.
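The bucket separation above can be sketched as a minimal hub. The real Notify Service hub also tracks connection state, limits, and auth; this sketch (Hub, Subscribe, Broadcast are assumed names) only shows why a broadcast can never cross tenants — the tenant prefix is part of the map key:

```go
package main

import "fmt"

// hubKey builds the compound key — tenantId is a mandatory prefix, so
// two tenants subscribing to the same channel name land in different
// in-memory buckets.
func hubKey(tenantID, channel string) string {
	return fmt.Sprintf("%s:%s", tenantID, channel)
}

type Hub struct {
	subscribers map[string][]chan []byte
}

func NewHub() *Hub {
	return &Hub{subscribers: make(map[string][]chan []byte)}
}

func (h *Hub) Subscribe(tenantID, channel string) chan []byte {
	ch := make(chan []byte, 1)
	key := hubKey(tenantID, channel)
	h.subscribers[key] = append(h.subscribers[key], ch)
	return ch
}

// Broadcast addresses exactly one tenant's bucket — there is no API
// that fans out across tenants. Returns the number of receivers.
func (h *Hub) Broadcast(tenantID, channel string, msg []byte) int {
	subs := h.subscribers[hubKey(tenantID, channel)]
	for _, ch := range subs {
		ch <- msg
	}
	return len(subs)
}
```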

Auth on connection:

Step 1: Client opens WSS connection
Step 2: Client sends { "type": "auth", "token": "<JWT>" }
Step 3: Notify Service validates JWT (ES256 + expiry + revocation check)
Step 4: Connection is registered under hub["{tenantId}:{channel}"]
Step 5: Any message arriving on this connection is tenant-scoped

Invalid JWT → close connection with code 4401 (Unauthorized)

Connection limits per tenant:

NOTIFY_WS_MAX_CONNECTIONS_PER_TENANT = 1000
NOTIFY_WS_RATE_PER_TENANT = 200 msg/s

One tenant cannot exhaust WebSocket capacity for another tenant.
Limits are enforced per tenantId bucket in the hub.
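Per-tenant buckets can be sketched as a fixed-window limiter. The real Notify Service may use a smoother algorithm; the TenantLimiter type and its window handling are assumptions kept deterministic for illustration:

```go
package main

// TenantLimiter enforces a per-tenant message budget per window.
// One tenant exhausting its bucket never affects another tenant's.
type TenantLimiter struct {
	limit  int
	counts map[string]int
}

func NewTenantLimiter(limit int) *TenantLimiter {
	return &TenantLimiter{limit: limit, counts: make(map[string]int)}
}

// Allow charges one message against tenantID's bucket for the
// current window; it returns false once the budget is spent.
func (l *TenantLimiter) Allow(tenantID string) bool {
	if l.counts[tenantID] >= l.limit {
		return false
	}
	l.counts[tenantID]++
	return true
}

// Reset starts a new window (once per second for a msg/s limit).
func (l *TenantLimiter) Reset() {
	l.counts = make(map[string]int)
}
```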

Layer 5 — Feature Flag Key Prefix

Feature flags are stored in a shared PostgreSQL table and evaluated by GoFeatureFlag. The SDK prepends tenantId as a mandatory key prefix, so the same flag name resolves to a different key — and can hold a different state — for each tenant.

Developer writes: kernel.flags().isEnabled("new-checkout")
SDK transforms to: isEnabled("abc123:new-checkout")

PostgreSQL storage:
key                  | tenant_id | enabled
---------------------|-----------|--------
abc123:new-checkout  | abc123    | true
def456:new-checkout  | def456    | false   ← different tenant, different state

Admin UI filters: WHERE tenant_id = currentTenantId
→ Tenant A admin cannot see Tenant B flags
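The SDK transform shown above is a one-line prefix; a sketch of it in Go, with one added guard that is an assumption: rejecting ":" in the developer-supplied name keeps the tenant prefix unambiguous when keys are parsed back.

```go
package main

import (
	"fmt"
	"strings"
)

// flagKey applies the SDK transform: "new-checkout" → "abc123:new-checkout".
// Rejecting ":" in the flag name (an assumption, not documented behavior)
// guarantees the tenant prefix is always the text before the first colon.
func flagKey(tenantID, name string) (string, error) {
	if strings.Contains(name, ":") {
		return "", fmt.Errorf("flag name %q must not contain ':'", name)
	}
	return tenantID + ":" + name, nil
}
```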

Optimistic locking for concurrent admin edits:

Every flag has a version: int. PATCH /flags/:key requires { version: N } in the request body. Concurrent edits by two admins result in 409 Conflict for the second writer — no silent overwrite.
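The compare-and-set semantics can be sketched in memory. The real handler performs the same check in PostgreSQL (an UPDATE guarded by the expected version); the Flag struct and patchFlag are illustrative names:

```go
package main

import "errors"

// ErrConflict maps to HTTP 409 in the PATCH /flags/:key handler.
var ErrConflict = errors.New("version conflict")

type Flag struct {
	Enabled bool
	Version int
}

// patchFlag only applies the edit if the caller saw the current
// version; the second of two concurrent writers gets ErrConflict
// instead of silently overwriting the first.
func patchFlag(f *Flag, expectedVersion int, enabled bool) error {
	if f.Version != expectedVersion {
		return ErrConflict
	}
	f.Enabled = enabled
	f.Version++
	return nil
}
```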


Layer 6 — S3 Path Prefix

All files uploaded through the Files Service are stored under a path that includes tenantId as the first path component:

S3 key structure:
{tenantId}/{bucket}/{fileId}

Examples:
abc123/avatars/usr_01j8m...webp
abc123/documents/doc_01j8n...pdf
def456/avatars/usr_01j9a...webp ← different tenant, different prefix

Gateway enforcement:
1. Module requests: GET /api/v1/files/{fileId}
2. Gateway fetches file metadata from Files Service
3. Files Service: WHERE file_id = ? AND tenant_id = JWT.tenantId
4. If tenant_id mismatch → 404 Not Found (not 403 — prevent oracle)
5. Presigned URLs: generated with tenantId scoped IAM policy
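The metadata lookup in steps 2-4 can be sketched against an in-memory map. The FileMeta struct and lookupFile are assumed names; the essential property is that "belongs to another tenant" and "does not exist" are indistinguishable to the caller, preventing an existence oracle:

```go
package main

import "errors"

// ErrNotFound is returned both for a missing file and for a file owned
// by another tenant — a uniform 404 leaks nothing about existence.
var ErrNotFound = errors.New("not found")

type FileMeta struct {
	FileID   string
	TenantID string
	Key      string // {tenantId}/{bucket}/{fileId}
}

// lookupFile sketches the Files Service query
// `WHERE file_id = ? AND tenant_id = ?`.
func lookupFile(files map[string]FileMeta, fileID, jwtTenantID string) (FileMeta, error) {
	f, ok := files[fileID]
	if !ok || f.TenantID != jwtTenantID {
		return FileMeta{}, ErrNotFound // identical outcome in both cases
	}
	return f, nil
}
```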

Presigned URL security:

Presigned GET URL → includes X-Amz-Credential scoped to {tenantId} path prefix
Presigned PUT URL → upload allowed ONLY to {tenantId}/staging/{fileId}

Cross-tenant presigned URL: impossible — credential is path-scoped

Isolation Summary Table

| Layer | Technology | Enforced by | Failure mode if layer is bypassed |
| --- | --- | --- | --- |
| PostgreSQL RLS | CREATE POLICY + FORCE RLS | PostgreSQL engine | ClickHouse row policy + 4 more layers remain |
| ClickHouse dual filter | row_policy + Go WHERE | DB engine + application | One sub-layer remains active |
| Kafka SDK filter | EventEnvelope.tenantId check | SDK consumer | Gateway publish validation remains |
| WebSocket namespace | Hub key {tenantId}:{channel} | Notify Service hub | JWT auth on connection remains |
| Feature flag prefix | Key {tenantId}:{flagName} | SDK + PostgreSQL RLS | Admin UI filter remains |
| S3 path prefix | Path {tenantId}/{bucket}/ | Gateway + Files Service | Files Service DB query remains |

Defence in depth: each layer is sufficient on its own to prevent cross-tenant access, and the six together provide layered redundancy — the defence-in-depth posture that SOC 2 Type II and ISO 27001 audits expect of multi-tenant SaaS.


Tenant Provisioning Isolation

When a new tenant is created, the following isolation primitives are provisioned. Steps 1-3 run in a single PostgreSQL transaction; the ClickHouse and IAM steps (4-7) cannot join that transaction and run as subsequent steps of the same provisioning flow:

1. INSERT INTO tenants (id, slug, ...)
2. INSERT INTO wallets (id, tenant_id, type='system', ...) ← system wallet
3. INSERT INTO subscriptions (id, tenant_id, status='trialing')
4. CREATE ROW POLICY tenant_filter_{tenantId} ON analytics.* ← ClickHouse
5. CREATE CLICKHOUSE USER tenant_user_{tenantId} ← ClickHouse user
6. GRANT SELECT WITH row_policy TO tenant_user_{tenantId}
7. IAM: create Owner role with wildcard (*) for tenantId

If any PostgreSQL step fails → ROLLBACK. If a later ClickHouse or IAM step fails, the provisioning flow removes everything created so far. Either way, no partial tenant state survives.