Authentication & Authorization
The Gateway enforces authentication and authorization on every request.
No request reaches an upstream service without a verified JWT and
a successful permission check. These two checks are handled by the
internal/middleware and internal/permission packages inside the
Go Gateway Service.
JWT Verification
The Gateway verifies the JWT using ES256 (ECDSA P-256) — the IAM Service signs all tokens with the platform's private key. The Gateway holds only the corresponding public key (read-only).
Incoming request:
Authorization: Bearer eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9...
Gateway:
1. Extract token from Authorization header
2. Verify ES256 signature with platform public key
3. Check exp claim (tokens are valid for 15 minutes)
4. Extract claims → userId, tenantId, roles[]
5. Pass claims to permission resolution
JWT Claims
| Claim | Type | Description |
|---|---|---|
| sub | string (ULID) | User ID; forwarded as X-User-ID header to upstream |
| email | string | User email address |
| tenantId | string (ULID) | Tenant ID; forwarded as X-Tenant-ID header to upstream |
| roles | string[] | Role names assigned to the user |
| iat | number | Issued at (Unix timestamp) |
| exp | number | Expiry (Unix timestamp; iat + 15 minutes) |
| iss | string | Issuer (OIDC issuer URL of the platform) |
| aud | string | Audience (platform-kernel for internal requests; client_id for OIDC app tokens) |
| custom claims | object | Namespace-prefixed custom data from modules |
Permissions are NOT stored in the JWT. Storing permissions in the token creates tokens of 5–10 KB at scale (hundreds of atomic permissions from dozens of modules), breaks cookie storage (4 KB limit), and increases bandwidth on every request. Only roles[] are stored. Permissions are resolved server-side on each request.

Maximum JWT size: 8 KB (NGINX header limit). Custom claims must respect this. PII (full phone numbers, IPs) in custom claims is forbidden because the JWT payload is only base64url-encoded and can be read without a key.
Permission Resolution
Roles stored in the JWT are resolved to permissions server-side on every request. The resolution uses a two-level cache to minimize latency:
Request arrives with roles: ['admin', 'billing-viewer']
Level 1: Check Valkey permission_version:{tenantId}
  → Compare with cached version
  → Version match → Cache HIT
      → Use cached permissions[] (Valkey key: permissions:{tenantId}:{sorted_roles_hash}, TTL 5 min)
  → Version mismatch or MISS
      → gRPC call to IAM: resolve roles → permissions[]
      → Write to Valkey with current version + TTL 5 min
Level 2 (fallback when IAM and Valkey are both down):
  → In-process LRU cache (GATEWAY_LOCAL_PERMISSION_CACHE_TTL_SEC=60)
  → Serve stale permissions + inject X-Permissions-Stale: true header
  → Cold start + IAM down → 503 Service Unavailable (fail-closed)
| Cache parameter | Value |
|---|---|
| Valkey key (permissions) | permissions:{tenantId}:{sorted_roles_hash} |
| Valkey TTL | 5 minutes |
| Local in-process cache TTL | 60 seconds (GATEWAY_LOCAL_PERMISSION_CACHE_TTL_SEC) |
| Platform Owner wildcard | Role * → permission check bypassed (no Valkey lookup) |
Permission Check Result
✅ Permission found → forward request to upstream service
❌ Permission missing → 403 Forbidden (RFC 9457)
{
  "type": "https://api.septemcore.com/problems/forbidden",
  "title": "Forbidden",
  "status": 403,
  "detail": "Required permission 'billing.plan.change' not found for roles ['billing-viewer'].",
  "traceId": "01j9ptr0000000000000002"
}
Cache Invalidation: Dual Channel
When a role or permission changes in the IAM Service, the permission cache must be invalidated immediately. The Gateway uses two independent channels to guarantee invalidation even during outages:
| Channel | Mechanism | Dependency |
|---|---|---|
| Primary (async) | Event Bus: auth.role.changed → Gateway invalidates Valkey cache | Kafka must be available |
| Fallback (sync) | Version counter: IAM runs INCR permission_version:{tenantId} in Valkey on every role change. The Gateway compares the version on every request (~0.1 ms); a mismatch forces a reload. | Kafka-independent |
The version counter adds one Valkey GET (~0.1 ms) per request. This cost is justified: it ensures that even a complete Kafka outage does not delay permission revocation.
For bulk role assignments, IAM publishes a single auth.roles.bulk_changed
event. The Gateway debounces these events in a 50 ms window to
prevent a thundering herd (1 000 invalidations → 1 batch reload).
Forwarded Headers
After JWT verification, the Gateway injects claims into upstream request headers. Upstream services must read these headers — they must not re-verify the JWT:
| Header | Source | Example |
|---|---|---|
| X-User-ID | sub claim | 01j9pa5mz700000000000000 |
| X-Tenant-ID | tenantId claim | 01j9ten0000000000000000 |
| X-Request-ID | Generated by RequestID middleware (ULID) | 01j9ptr0000000000000003 |
| X-Permissions-Stale | Injected when serving from stale local cache | true |
Upstream services trust these headers unconditionally — the Gateway is the single verification point. A service that re-verifies the JWT is a design anti-pattern in this architecture.
JWT Refresh Token Race Protection
When a user has multiple browser tabs open and the access token
expires, all tabs simultaneously call POST /auth/refresh with the
same refresh token. Without protection, only the first succeeds and
the rest receive 401.
The platform protects against this with a grace window on the refresh token:
| Parameter | Value |
|---|---|
| Refresh grace window | 10 seconds after first use (AUTH_REFRESH_GRACE_WINDOW_SEC=10) |
| Behavior in window | Same refresh token → same new token pair (idempotent) |
| After window | Refresh token invalidated — next use returns 401 |
| SDK coordination | BroadcastChannel API: one in-flight refresh per origin, result broadcast to all tabs |
B2B2B Delegation Middleware
The platform supports a three-level tenant hierarchy: Platform Owner → Partner → Client.
A Partner-level user acting on behalf of a Client tenant must be authenticated in their own tenant but authorized to access the Client tenant's resources. The delegation middleware validates this:
Partner user (tenantId: partner-01j...) makes request to Client tenant:
  Header: X-Delegate-To-Tenant: client-01j...
  JWT: tenantId=partner-01j..., roles=['partner-admin']
Gateway delegation middleware:
  1. Detect X-Delegate-To-Tenant header
  2. gRPC: TenantHierarchyService.IsDescendant(
       ancestor: partner-01j...,
       descendant: client-01j...
     )
     ┌─ Not a descendant → 403 Forbidden (partner cannot access this client)
     └─ Is a descendant:
          Forward X-Tenant-ID: client-01j... ← effective tenant
          Forward X-User-ID: partner-user-id ← original user
          Forward X-Delegated-By: partner-01j... ← audit trail
The IsDescendant check uses a PostgreSQL closure table, so each ancestry
lookup is a single indexed query whose cost does not grow with hierarchy
depth, and no recursive queries are needed.
Three-Level Security Enforcement
Level 1: UI Shell → Hides UI elements (UX, not security)
Level 2: API Gateway → hasPermission() on every request (BLOCKING)
Level 3: PostgreSQL → Row-Level Security by tenant_id (DATA ISOLATION)
Level 2 is mandatory even when Level 1 hides the UI element. A user who manually constructs a URL hits Level 2. A compromised service that bypasses Level 2 still hits Level 3.
Anonymous Requests
Requests without an Authorization header are processed as
anonymous. Anonymous requests:
- Are subject to anonymous rate limits (100 req/min)
- Can only access publicly declared endpoints (e.g. POST /auth/login, POST /auth/register)
- Receive 401 Unauthorized on any protected endpoint
{
  "type": "https://api.septemcore.com/problems/unauthorized",
  "title": "Unauthorized",
  "status": 401,
  "detail": "Authorization header is missing or malformed.",
  "traceId": "01j9ptr0000000000000004"
}