
Authentication & Authorization

The Gateway enforces authentication and authorization on every request. No request reaches an upstream service without a verified JWT and a successful permission check. These two checks are handled by the internal/middleware and internal/permission packages inside the Go Gateway Service.


JWT Verification

The Gateway verifies each JWT using ES256 (ECDSA with P-256 and SHA-256). The IAM Service signs all tokens with the platform's private key; the Gateway holds only the corresponding public key (read-only).

Incoming request:
Authorization: Bearer eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9...

Gateway:
1. Extract token from Authorization header
2. Verify ES256 signature with platform public key
3. Check exp claim (tokens are valid 15 minutes)
4. Extract claims → userId, tenantId, roles[]
5. Pass claims to permission resolution
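
Steps 1 to 4 can be sketched with the Go standard library alone. This is a minimal illustration, not the Gateway's actual middleware: the names VerifyES256, Claims, newDemoKey and signES256 are hypothetical, the signing helper only stands in for the IAM Service, and the iss and aud checks are omitted for brevity.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"math/big"
	"strings"
	"time"
)

// Claims holds the subset of JWT claims this sketch acts on.
type Claims struct {
	Sub      string   `json:"sub"`
	TenantID string   `json:"tenantId"`
	Roles    []string `json:"roles"`
	Exp      int64    `json:"exp"`
}

var b64 = base64.RawURLEncoding // JWT segments are unpadded base64url

// VerifyES256 (hypothetical name) extracts the token, verifies the
// signature, checks exp, and returns the decoded claims.
func VerifyES256(authHeader string, pub *ecdsa.PublicKey, nowUnix int64) (*Claims, error) {
	token, ok := strings.CutPrefix(authHeader, "Bearer ")
	if !ok {
		return nil, errors.New("missing bearer token")
	}
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("malformed token")
	}
	// Pin the algorithm so the token cannot select a different one.
	var hdr struct {
		Alg string `json:"alg"`
	}
	hb, err := b64.DecodeString(parts[0])
	if err != nil || json.Unmarshal(hb, &hdr) != nil || hdr.Alg != "ES256" {
		return nil, errors.New("unsupported or unreadable header")
	}
	// An ES256 signature is the 32-byte r followed by the 32-byte s.
	sig, err := b64.DecodeString(parts[2])
	if err != nil || len(sig) != 64 {
		return nil, errors.New("bad signature encoding")
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	r, s := new(big.Int).SetBytes(sig[:32]), new(big.Int).SetBytes(sig[32:])
	if !ecdsa.Verify(pub, digest[:], r, s) {
		return nil, errors.New("invalid signature")
	}
	var c Claims
	pb, err := b64.DecodeString(parts[1])
	if err != nil || json.Unmarshal(pb, &c) != nil {
		return nil, errors.New("unreadable claims")
	}
	if nowUnix >= c.Exp {
		return nil, errors.New("token expired")
	}
	return &c, nil
}

// newDemoKey and signES256 stand in for the IAM Service in this sketch.
func newDemoKey() *ecdsa.PrivateKey {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return key
}

func signES256(priv *ecdsa.PrivateKey, c Claims) string {
	payload, _ := json.Marshal(c)
	input := b64.EncodeToString([]byte(`{"alg":"ES256","typ":"JWT"}`)) + "." + b64.EncodeToString(payload)
	digest := sha256.Sum256([]byte(input))
	r, s, err := ecdsa.Sign(rand.Reader, priv, digest[:])
	if err != nil {
		panic(err)
	}
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return input + "." + b64.EncodeToString(sig)
}

func main() {
	priv := newDemoKey()
	now := time.Now().Unix()
	token := signES256(priv, Claims{Sub: "user-1", TenantID: "tenant-1", Roles: []string{"admin"}, Exp: now + 15*60})
	claims, err := VerifyES256("Bearer "+token, &priv.PublicKey, now)
	fmt.Println(claims, err)
}
```

Rejecting any header whose alg is not ES256 before touching the signature keeps the verifier from being steered toward another algorithm by the token itself.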

JWT Claims

| Claim | Type | Description |
| --- | --- | --- |
| sub | string (ULID) | User ID, forwarded as X-User-ID header to upstream |
| email | string | User email address |
| tenantId | string (ULID) | Tenant ID, forwarded as X-Tenant-ID header to upstream |
| roles | string[] | Role names assigned to the user |
| iat | number | Issued at (Unix timestamp) |
| exp | number | Expiry (Unix timestamp: issued at + 15 minutes) |
| iss | string | Issuer (OIDC issuer URL of the platform) |
| aud | string | Audience (platform-kernel for internal requests; client_id for OIDC app tokens) |
| custom claims | object | Namespace-prefixed custom data from modules |

Permissions are NOT stored in the JWT. Storing permissions in the token creates tokens of 5–10 KB at scale (hundreds of atomic permissions from dozens of modules), breaks cookie storage (4 KB limit), and increases bandwidth on every request. Only roles[] are stored. Permissions are resolved server-side on each request.

Maximum JWT size: 8 KB (NGINX header limit). Custom claims must respect this. PII (full phone numbers, IPs) in custom claims is forbidden: a JWT payload is only base64url-encoded, not encrypted, so anyone holding the token can read it without a key.


Permission Resolution

Roles stored in the JWT are resolved to permissions server-side on every request. The resolution uses a two-level cache to minimize latency:

Request arrives with roles: ['admin', 'billing-viewer']

Level 1: Check Valkey permission_version:{tenantId}
  → Compare with cached version
    → Version match → Cache HIT
      → Use cached permissions[] (Valkey key: permissions:{tenantId}:{sorted_roles_hash}, TTL 5 min)
    → Version mismatch or MISS
      → gRPC call to IAM: resolve roles → permissions[]
      → Write to Valkey with current version + TTL 5 min

Level 2 (fallback, IAM + Valkey both down):
  → In-process LRU cache (GATEWAY_LOCAL_PERMISSION_CACHE_TTL_SEC=60)
  → Serve stale permissions + inject X-Permissions-Stale: true header
  → Cold start + IAM down → 503 Service Unavailable (fail-closed)

| Cache parameter | Value |
| --- | --- |
| Valkey key (permissions) | permissions:{tenantId}:{sorted_roles_hash} |
| Valkey TTL | 5 minutes |
| Local in-process cache TTL | 60 seconds (GATEWAY_LOCAL_PERMISSION_CACHE_TTL_SEC) |
| Platform Owner wildcard | Role * → permission check bypassed (no Valkey lookup) |
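
The Level 1 lookup can be sketched as follows. Everything here is an assumption made for illustration: plain Go maps stand in for Valkey, a function value stands in for the gRPC call to IAM, the hash is a truncated SHA-256 over the sorted role names, and the Resolver type is hypothetical. The 5-minute TTL and the Level 2 fallback are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// rolesHash sorts a copy of the roles before hashing, so that
// ["a","b"] and ["b","a"] produce the same cache key.
func rolesHash(roles []string) string {
	sorted := append([]string(nil), roles...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\x00")))
	return hex.EncodeToString(sum[:8]) // truncation length is an arbitrary choice
}

type entry struct {
	version int64
	perms   []string
}

// Resolver is a hypothetical stand-in: the maps replace Valkey and
// resolve replaces the gRPC call to IAM.
type Resolver struct {
	versions map[string]int64 // permission_version:{tenantId}
	cache    map[string]entry // permissions:{tenantId}:{sorted_roles_hash}
	resolve  func(tenantID string, roles []string) []string
}

// Permissions returns the permissions for the roles and reports
// whether they were served from the cache.
func (r *Resolver) Permissions(tenantID string, roles []string) (perms []string, hit bool) {
	key := "permissions:" + tenantID + ":" + rolesHash(roles)
	current := r.versions["permission_version:"+tenantID]
	if e, ok := r.cache[key]; ok && e.version == current {
		return e.perms, true // version match: cache HIT
	}
	// Version mismatch or miss: reload and store with the current version.
	perms = r.resolve(tenantID, roles)
	r.cache[key] = entry{version: current, perms: perms}
	return perms, false
}

func main() {
	r := &Resolver{
		versions: map[string]int64{},
		cache:    map[string]entry{},
		resolve: func(string, []string) []string {
			return []string{"billing.plan.view"}
		},
	}
	roles := []string{"billing-viewer", "admin"}
	_, hit := r.Permissions("tenant-1", roles)
	fmt.Println("first call hit:", hit)
	_, hit = r.Permissions("tenant-1", roles)
	fmt.Println("second call hit:", hit)
	r.versions["permission_version:tenant-1"]++ // what IAM's INCR would do
	_, hit = r.Permissions("tenant-1", roles)
	fmt.Println("after version bump hit:", hit)
}
```

Storing the version next to the cached permissions is what lets a single counter read decide whether the entry is still usable.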

Permission Check Result

✅ Permission found → forward request to upstream service
❌ Permission missing → 403 Forbidden (RFC 9457)

{
  "type": "https://api.septemcore.com/problems/forbidden",
  "title": "Forbidden",
  "status": 403,
  "detail": "Required permission 'billing.plan.change' not found for roles ['billing-viewer'].",
  "traceId": "01j9ptr0000000000000002"
}
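
A permission gate that emits this response could look roughly like the sketch below. The Problem struct, WriteProblem, CheckPermission and the probe helper are hypothetical names; the application/problem+json media type comes from RFC 9457, and traceId is carried as an extension member.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Problem is an RFC 9457 problem details object; traceId is an extension member.
type Problem struct {
	Type    string `json:"type"`
	Title   string `json:"title"`
	Status  int    `json:"status"`
	Detail  string `json:"detail,omitempty"`
	TraceID string `json:"traceId,omitempty"`
}

// WriteProblem sends p with the problem+json media type and matching status.
func WriteProblem(w http.ResponseWriter, p Problem) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(p.Status)
	_ = json.NewEncoder(w).Encode(p)
}

// CheckPermission forwards to next only when required is among granted.
func CheckPermission(granted []string, required, traceID string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for _, p := range granted {
			if p == required {
				next.ServeHTTP(w, r)
				return
			}
		}
		WriteProblem(w, Problem{
			Type:    "https://api.septemcore.com/problems/forbidden",
			Title:   "Forbidden",
			Status:  http.StatusForbidden,
			Detail:  fmt.Sprintf("Required permission '%s' not found.", required),
			TraceID: traceID,
		})
	})
}

// probe is a demo helper: it runs one request through the gate and
// returns the status code and Content-Type seen by the client.
func probe(granted []string, required string) (int, string) {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/billing/plan", nil)
	CheckPermission(granted, required, "trace-demo", upstream).ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Content-Type")
}

func main() {
	fmt.Println(probe([]string{"billing.plan.view"}, "billing.plan.change"))
	fmt.Println(probe([]string{"billing.plan.change"}, "billing.plan.change"))
}
```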

Cache Invalidation: Dual Channel

When a role or permission changes in the IAM Service, the permission cache must be invalidated immediately. The Gateway uses two independent channels to guarantee invalidation even during outages:

| Channel | Mechanism | Dependency |
| --- | --- | --- |
| Primary (async) | Event Bus: auth.role.changed → Gateway invalidates Valkey cache | Kafka must be available |
| Fallback (sync) | Version counter: IAM runs INCR permission_version:{tenantId} in Valkey on every role change. Gateway compares the version on every request (~0.1 ms). Mismatch → force-reload. | Kafka-independent |

The version counter adds one Valkey GET (~0.1 ms) per request. This cost is justified: it ensures that even a complete Kafka outage does not delay permission revocation.

For bulk role assignments, IAM publishes a single auth.roles.bulk_changed event. The Gateway debounces these events in a 50 ms window to prevent thundering herd (1 000 invalidations → 1 batch reload).
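
One way to get this batching behavior is a fixed window that opens on the first event and flushes once when it closes. The sketch below assumes that interpretation; the document does not say whether the window resets on each event. Debouncer, Notify and BulkDebounceWindow are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// BulkDebounceWindow mirrors the 50 ms window described above.
const BulkDebounceWindow = 50 * time.Millisecond

// Debouncer collects tenant IDs during one window and flushes them once.
type Debouncer struct {
	mu      sync.Mutex
	window  time.Duration
	pending map[string]struct{}
	flush   func(tenantIDs []string)
}

func NewDebouncer(window time.Duration, flush func([]string)) *Debouncer {
	return &Debouncer{window: window, pending: map[string]struct{}{}, flush: flush}
}

// Notify records an invalidation event. Only the first event of a
// window schedules a flush; later events just join the pending set.
func (d *Debouncer) Notify(tenantID string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if len(d.pending) == 0 {
		time.AfterFunc(d.window, d.fire)
	}
	d.pending[tenantID] = struct{}{}
}

// fire swaps out the pending set under the lock, then flushes outside
// it so a slow reload does not block new events.
func (d *Debouncer) fire() {
	d.mu.Lock()
	batch := make([]string, 0, len(d.pending))
	for id := range d.pending {
		batch = append(batch, id)
	}
	d.pending = map[string]struct{}{}
	d.mu.Unlock()
	sort.Strings(batch)
	d.flush(batch)
}

func main() {
	done := make(chan []string, 1)
	d := NewDebouncer(BulkDebounceWindow, func(ids []string) { done <- ids })
	for i := 0; i < 1000; i++ {
		d.Notify("tenant-a")
	}
	fmt.Println("tenants reloaded in one batch:", <-done)
}
```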


Forwarded Headers

After JWT verification, the Gateway injects claims into upstream request headers. Upstream services must read these headers — they must not re-verify the JWT:

| Header | Source | Example |
| --- | --- | --- |
| X-User-ID | sub claim | 01j9pa5mz700000000000000 |
| X-Tenant-ID | tenantId claim | 01j9ten0000000000000000 |
| X-Request-ID | Generated by RequestID middleware (ULID) | 01j9ptr0000000000000003 |
| X-Permissions-Stale | Injected when serving from stale local cache | true |

Upstream services trust these headers unconditionally — the Gateway is the single verification point. A service that re-verifies the JWT is a design anti-pattern in this architecture.
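
Because upstream services trust these headers, a gateway would normally make sure a client cannot supply them itself. The sketch below shows one way to do that: delete any incoming copies, then set the values from verified claims. The stripping step is an assumption, not something this page states, and Identity and InjectIdentity are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
)

// Identity carries the verified values the Gateway forwards upstream.
type Identity struct {
	UserID    string // from the sub claim
	TenantID  string // from the tenantId claim
	RequestID string // generated per request
	Stale     bool   // true when permissions came from the stale local cache
}

var identityHeaders = []string{"X-User-ID", "X-Tenant-ID", "X-Request-ID", "X-Permissions-Stale"}

// InjectIdentity removes any client-supplied identity headers
// (an assumed hardening step) and then sets the trusted values.
func InjectIdentity(h http.Header, id Identity) {
	for _, name := range identityHeaders {
		h.Del(name)
	}
	h.Set("X-User-ID", id.UserID)
	h.Set("X-Tenant-ID", id.TenantID)
	h.Set("X-Request-ID", id.RequestID)
	if id.Stale {
		h.Set("X-Permissions-Stale", "true")
	}
}

func main() {
	// The client tried to smuggle in its own X-User-ID.
	h := http.Header{}
	h.Set("X-User-ID", "spoofed")
	InjectIdentity(h, Identity{
		UserID:    "01j9pa5mz700000000000000",
		TenantID:  "01j9ten0000000000000000",
		RequestID: "01j9ptr0000000000000003",
	})
	fmt.Println(h.Get("X-User-ID"), h.Get("X-Tenant-ID"), h.Get("X-Permissions-Stale") == "")
}
```

Go canonicalizes header names, so X-User-ID is stored as X-User-Id; lookups through Header.Get are case-insensitive and unaffected.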


JWT Refresh Token Race Protection

When a user has multiple browser tabs open and the access token expires, all tabs simultaneously call POST /auth/refresh with the same refresh token. Without protection, only the first succeeds and the rest receive 401.

The platform protects against this with a grace window on the refresh token:

| Parameter | Value |
| --- | --- |
| Refresh grace window | 10 seconds after first use (AUTH_REFRESH_GRACE_WINDOW_SEC=10) |
| Behavior in window | Same refresh token → same new token pair (idempotent) |
| After window | Refresh token invalidated; next use returns 401 |
| SDK coordination | BroadcastChannel API: one in-flight refresh per origin, result broadcast to all tabs |
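
The server-side half of this behavior can be sketched as an in-memory store that remembers which pair it issued for a refresh token and replays it while the grace window is open. RefreshStore, TokenPair and the epoch reference time are hypothetical; a real deployment would presumably keep this state in a shared store rather than in process memory.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// GraceWindow mirrors AUTH_REFRESH_GRACE_WINDOW_SEC=10.
const GraceWindow = 10 * time.Second

var ErrInvalidRefresh = errors.New("refresh token invalid") // maps to 401

var epoch = time.Unix(0, 0) // fixed reference time used by the examples

type TokenPair struct{ Access, Refresh string }

type usedToken struct {
	firstUse time.Time
	pair     TokenPair
}

// RefreshStore is a hypothetical in-memory model of refresh token state.
type RefreshStore struct {
	mu     sync.Mutex
	active map[string]bool      // refresh tokens not yet used
	used   map[string]usedToken // tokens already exchanged, with their result
	issue  func() TokenPair
}

func NewRefreshStore(initial string, issue func() TokenPair) *RefreshStore {
	return &RefreshStore{
		active: map[string]bool{initial: true},
		used:   map[string]usedToken{},
		issue:  issue,
	}
}

// Refresh exchanges a refresh token. A repeat within the grace window
// returns the same pair; a repeat after it is rejected.
func (s *RefreshStore) Refresh(token string, now time.Time) (TokenPair, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if u, ok := s.used[token]; ok {
		if now.Sub(u.firstUse) <= GraceWindow {
			return u.pair, nil // idempotent replay for racing tabs
		}
		delete(s.used, token)
		return TokenPair{}, ErrInvalidRefresh
	}
	if !s.active[token] {
		return TokenPair{}, ErrInvalidRefresh
	}
	delete(s.active, token)
	pair := s.issue()
	s.active[pair.Refresh] = true
	s.used[token] = usedToken{firstUse: now, pair: pair}
	return pair, nil
}

func main() {
	n := 0
	store := NewRefreshStore("refresh-0", func() TokenPair {
		n++
		return TokenPair{Access: fmt.Sprintf("access-%d", n), Refresh: fmt.Sprintf("refresh-%d", n)}
	})
	tabA, _ := store.Refresh("refresh-0", epoch)
	tabB, _ := store.Refresh("refresh-0", epoch.Add(2*time.Second))
	_, late := store.Refresh("refresh-0", epoch.Add(11*time.Second))
	fmt.Println(tabA == tabB, late)
}
```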

B2B2B Delegation Middleware

The platform supports a three-level tenant hierarchy: Platform Owner → Partner → Client.

A Partner-level user acting on behalf of a Client tenant must be authenticated in their own tenant but authorized to access the Client tenant's resources. The delegation middleware validates this:

Partner user (tenantId: partner-01j...) makes request to Client tenant:
  Header: X-Delegate-To-Tenant: client-01j...
  JWT: tenantId=partner-01j..., roles=['partner-admin']

Gateway delegation middleware:
  1. Detect X-Delegate-To-Tenant header
  2. gRPC: TenantHierarchyService.IsDescendant(
       ancestor: partner-01j...,
       descendant: client-01j...
     )
     ┌─ Not a descendant → 403 Forbidden (partner cannot access this client)
     └─ Is a descendant:
          Forward X-Tenant-ID: client-01j...       ← effective tenant
          Forward X-User-ID: partner-user-id       ← original user
          Forward X-Delegated-By: partner-01j...   ← audit trail

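The IsDescendant check uses a PostgreSQL closure table, which stores one row per (ancestor, descendant) pair. Each ancestry check is therefore a single indexed row lookup whose cost does not grow with hierarchy depth, and no recursive queries are needed.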
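
To show why a closure table avoids recursion, the sketch below models it as an in-memory set of (ancestor, descendant) pairs and applies the delegation decision on top. It is an illustration only: the real check is a gRPC call backed by PostgreSQL, and ClosureTable, AddChild and Delegate are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type pair struct{ ancestor, descendant string }

// ClosureTable holds one entry per (ancestor, descendant) pair at any depth.
type ClosureTable map[pair]struct{}

// AddChild links child under parent and under every ancestor of parent,
// so later lookups never need to walk the tree.
func (c ClosureTable) AddChild(parent, child string) {
	rows := []pair{{parent, child}}
	for p := range c {
		if p.descendant == parent {
			rows = append(rows, pair{p.ancestor, child})
		}
	}
	for _, r := range rows {
		c[r] = struct{}{}
	}
}

// IsDescendant is a single key lookup, independent of hierarchy depth.
func (c ClosureTable) IsDescendant(ancestor, descendant string) bool {
	_, ok := c[pair{ancestor, descendant}]
	return ok
}

var ErrForbidden = errors.New("403: target tenant is not a descendant")

// Delegate returns the headers to forward upstream, or ErrForbidden
// when the caller's tenant is not an ancestor of the target tenant.
func Delegate(c ClosureTable, jwtTenant, userID, target string) (map[string]string, error) {
	if !c.IsDescendant(jwtTenant, target) {
		return nil, ErrForbidden
	}
	return map[string]string{
		"X-Tenant-ID":    target,    // effective tenant
		"X-User-ID":      userID,    // original user
		"X-Delegated-By": jwtTenant, // audit trail
	}, nil
}

func main() {
	c := ClosureTable{}
	c.AddChild("owner", "partner-a")
	c.AddChild("owner", "partner-b")
	c.AddChild("partner-a", "client-1")

	fmt.Println(Delegate(c, "partner-a", "user-7", "client-1"))
	fmt.Println(Delegate(c, "partner-b", "user-9", "client-1"))
}
```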

Three-Level Security Enforcement

Level 1: UI Shell → Hides UI elements (UX, not security)
Level 2: API Gateway → hasPermission() on every request (BLOCKING)
Level 3: PostgreSQL → Row-Level Security by tenant_id (DATA ISOLATION)

Level 2 is mandatory even when Level 1 hides the UI element. A user who manually constructs a URL hits Level 2. A compromised service that bypasses Level 2 still hits Level 3.


Anonymous Requests

Requests without an Authorization header are processed as anonymous. Anonymous requests:

  • Are subject to anonymous rate limits (100 req/min)
  • Can only access publicly declared endpoints (e.g. POST /auth/login, POST /auth/register)
  • Receive 401 Unauthorized on any protected endpoint

{
  "type": "https://api.septemcore.com/problems/unauthorized",
  "title": "Unauthorized",
  "status": 401,
  "detail": "Authorization header is missing or malformed.",
  "traceId": "01j9ptr0000000000000004"
}
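
The routing decision for anonymous requests can be sketched as a small middleware. This is an assumption-laden illustration: the public endpoint list contains only the two examples named above, rate limiting is not shown, a request with a Bearer token is simply passed on (JWT verification would follow), and AnonymousGate and statusFor are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// publicEndpoints lists "METHOD path" pairs reachable without a token.
var publicEndpoints = map[string]bool{
	"POST /auth/login":    true,
	"POST /auth/register": true,
}

const unauthorizedBody = `{"type":"https://api.septemcore.com/problems/unauthorized",` +
	`"title":"Unauthorized","status":401,` +
	`"detail":"Authorization header is missing or malformed."}`

// AnonymousGate lets through requests that carry a Bearer token or
// target a public endpoint; everything else receives a 401 problem.
func AnonymousGate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, isBearer := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if (isBearer && token != "") || publicEndpoints[r.Method+" "+r.URL.Path] {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/problem+json")
		w.WriteHeader(http.StatusUnauthorized)
		fmt.Fprint(w, unauthorizedBody)
	})
}

// statusFor is a demo helper that sends one request through the gate.
func statusFor(method, path, authorization string) int {
	next := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	req := httptest.NewRequest(method, path, nil)
	if authorization != "" {
		req.Header.Set("Authorization", authorization)
	}
	rec := httptest.NewRecorder()
	AnonymousGate(next).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("POST", "/auth/login", ""))
	fmt.Println(statusFor("GET", "/billing/plans", ""))
	fmt.Println(statusFor("GET", "/billing/plans", "Bearer abc"))
}
```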