
Gateway Service — Overview

The API Gateway is the single entry point for all platform traffic. No external client communicates directly with a backend service — every request passes through the Gateway, which enforces authentication, authorization, rate limiting, and OpenAPI schema validation before forwarding to the appropriate upstream service.


Architecture: Two-Layer Gateway

The Gateway is implemented as two cooperating layers:

| Layer | Component | Responsibility |
|---|---|---|
| L7 proxy | Envoy Gateway | Routing, TLS termination, local rate limiting (token bucket per instance), OIDC |
| Business logic | Go Gateway Service (:8080) | JWT verification, permission resolution, OpenAPI validation, protocol translation (REST → gRPC), circuit breaker for upstream calls |

The two layers are complementary. Envoy handles high-throughput routing and enforces local rate limits in-process, without a network round trip. The Go service adds business-level logic that Envoy cannot express: per-tenant permission resolution, OpenAPI schema validation, and circuit breaking for gRPC calls to core services (sony/gobreaker).


Internal Packages

| Package | Responsibility |
|---|---|
| internal/circuitbreaker | Circuit breaker for upstream gRPC services (sony/gobreaker) |
| internal/config | Environment-based configuration |
| internal/errors | RFC 9457 Problem Details error responses |
| internal/health | /live, /ready, /health probe handlers |
| internal/middleware | JWT auth, CORS, request logging, request ID generation, real IP extraction |
| internal/openapi | OpenAPI 3.x request and response validation |
| internal/permission | RBAC permission resolution (Valkey cache + IAM gRPC fallback) |
| internal/proxy | Reverse proxy to upstream services |
| internal/ratelimit | Dual-layer rate limiting (local token bucket + Valkey sliding window) |
| internal/server | HTTP server lifecycle (graceful shutdown) |
| internal/worker | Background workers (cache invalidation) |

Middleware Chain

Every incoming request traverses the middleware chain in strict order:

Incoming request
      ↓
Recoverer           ← Catch panics, return 500 (never crash the server)
      ↓
RequestID           ← Generate X-Request-ID (ULID) if not present
      ↓
RealIP              ← Extract client IP from X-Forwarded-For / X-Real-IP
      ↓
Logging             ← Structured request log entry (method, path, IP, requestId)
      ↓
CORS                ← Validate Origin, inject Access-Control-* headers
      ↓
JWT Auth            ← Verify ES256 signature, extract claims (userId, tenantId, roles)
      ↓
Rate Limiting       ← Local token bucket (Envoy) + Valkey sliding window (global)
      ↓
OpenAPI Validation  ← Validate request body against OpenAPI 3.x schema
      ↓
Proxy               ← Forward to upstream (REST → gRPC translation if required)
      ↓
Upstream service

Middleware short-circuits on failure: if CORS rejects an origin, the request never reaches JWT Auth, and if JWT Auth fails, the request never reaches Rate Limiting. Whichever layer rejects the request returns an RFC 9457 Problem Details response.
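The ordering and short-circuit behavior can be sketched with the standard net/http decorator shape, func(http.Handler) http.Handler. This is a minimal illustration, not the Gateway's code: only three stages are modeled, the helper names (chain, recoverer, cors, requireAuth, serve) and the allowed origin are hypothetical, the 403 status for a rejected origin is an assumption, and requireAuth only checks for a Bearer header instead of verifying an ES256 signature.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// middleware is the conventional net/http decorator shape.
type middleware func(http.Handler) http.Handler

// chain wraps h so that mws[0] runs first (outermost) and the last entry runs last.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// problem writes a minimal RFC 9457 body (only title and status here).
func problem(w http.ResponseWriter, status int, title string) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(status)
	fmt.Fprintf(w, `{"title":%q,"status":%d}`, title, status)
}

// recoverer turns a panic in any later stage into a 500 instead of crashing.
func recoverer(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				problem(w, http.StatusInternalServerError, "Internal Server Error")
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// cors rejects a non-allow-listed Origin (hypothetical single allowed origin).
func cors(allowed string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if o := r.Header.Get("Origin"); o != "" && o != allowed {
				problem(w, http.StatusForbidden, "Forbidden")
				return // short-circuit: later stages never run
			}
			next.ServeHTTP(w, r)
		})
	}
}

// requireAuth is a placeholder: it checks for a Bearer token, not its signature.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			problem(w, http.StatusUnauthorized, "Unauthorized")
			return
		}
		next.ServeHTTP(w, r)
	})
}

// serve runs one request through the chain and returns the status code.
func serve(origin, auth string) int {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := chain(upstream, recoverer, cors("https://app.example.com"), requireAuth)
	req := httptest.NewRequest(http.MethodGet, "/api/v1/wallets", nil)
	if origin != "" {
		req.Header.Set("Origin", origin)
	}
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(serve("https://evil.example", ""))                   // 403: auth never ran
	fmt.Println(serve("https://app.example.com", ""))                // 401
	fmt.Println(serve("https://app.example.com", "Bearer abc.def"))  // 200
}
```

Because chain wraps from the last middleware inward, the first entry in the list runs first, matching the top-to-bottom order of the diagram.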


Gateway Responsibilities

| Function | Detail |
|---|---|
| Routing | Routes /api/v1/* to the appropriate upstream service |
| Auth validation | Verifies the JWT ES256 signature on every request |
| Permission check | Resolves roles[] → permissions[] via Valkey + IAM gRPC |
| Rate limiting | Dual-layer: Envoy local + Valkey global |
| OpenAPI validation | Validates the request body against the upstream service's OpenAPI schema |
| Response validation | Validates the response body in dev/staging environments |
| CORS | Configurable per-tenant CORS rules |
| Protocol translation | REST ↔ gRPC where required |
| Request logging | Logs request metadata to the Audit Service (async) |
| Error handling | Standardized RFC 9457 errors across all upstream responses |

gRPC Call Policy

All calls from the Gateway to upstream Go services follow a strict call policy:

| Parameter | Value |
|---|---|
| gRPC deadline | 5 seconds (GRPC_CALL_TIMEOUT_MS) → 504 Gateway Timeout |
| Circuit breaker | 5 consecutive failures within 30s → circuit open for 30s → 503 Service Unavailable (no upstream call) |
| Retry | Idempotent GET requests: 1 retry with 100ms backoff. POST/PATCH/DELETE: no retry (duplicate protection relies on idempotency keys) |
| Partial failure | Batch/stream calls with partial success → 207 Multi-Status (RFC 9457 per element) |

Circuit Breaker Design

The Go Gateway Service uses two-tier circuit breaking via sony/gobreaker:

| Breaker type | Scope | Purpose |
|---|---|---|
| Per-tenant soft breaker | One circuit per tenant | Isolates a tenant with an anomalous error rate (> 50% in 60s), so one bad tenant does not open the circuit for all tenants. |
| Per-service global breaker | One circuit per upstream service | Protects against cascading failure when an entire service is down. |
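In sony/gobreaker, the decision to trip is made by a ReadyToTrip(counts Counts) bool callback in Settings, while Interval controls how often counts are cleared in the closed state and Timeout controls how long the circuit stays open before going half-open. The sketch below models the two trip conditions from this page as stdlib-only predicates over a struct that mirrors the fields of gobreaker.Counts, so it runs without the dependency. The minTenantRequests floor is a hypothetical addition to avoid tripping a tenant on one or two errors; the page does not define one.

```go
package main

import "fmt"

// counts mirrors the fields of gobreaker.Counts so the sketch needs no dependency.
type counts struct {
	Requests             uint32
	TotalSuccesses       uint32
	TotalFailures        uint32
	ConsecutiveSuccesses uint32
	ConsecutiveFailures  uint32
}

// serviceReadyToTrip: per-service global breaker, trips on 5 consecutive failures.
// With gobreaker, Interval: 30s would clear counts while closed, and
// Timeout: 30s would hold the circuit open before a half-open trial.
func serviceReadyToTrip(c counts) bool {
	return c.ConsecutiveFailures >= 5
}

// minTenantRequests is hypothetical: a floor so a tiny sample cannot trip a tenant.
const minTenantRequests = 10

// tenantReadyToTrip: per-tenant soft breaker, trips when more than 50% of
// the requests in the counting interval (60s) failed.
func tenantReadyToTrip(c counts) bool {
	if c.Requests < minTenantRequests {
		return false
	}
	return float64(c.TotalFailures)/float64(c.Requests) > 0.5
}

func main() {
	// The same counts go to both predicates only for comparison; in practice
	// each breaker keeps its own counts.
	noisyTenant := counts{Requests: 12, TotalFailures: 8, ConsecutiveFailures: 2}
	fmt.Println(tenantReadyToTrip(noisyTenant), serviceReadyToTrip(noisyTenant)) // true false

	serviceDown := counts{Requests: 5, TotalFailures: 5, ConsecutiveFailures: 5}
	fmt.Println(tenantReadyToTrip(serviceDown), serviceReadyToTrip(serviceDown)) // false true
}
```

A scattered error pattern trips only the tenant breaker, while an unbroken run of failures trips the service breaker, which is the isolation the two tiers are meant to provide.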

Tenant-Aware Module Routing

Before forwarding a request to a downstream service, the Gateway checks that the module is installed and active for the requesting tenant. The Module Registry state is cached in Valkey:

Valkey key: modules:active:{tenantId} (SET of active module IDs)

GET /api/v1/data/crm/contacts

Gateway: SISMEMBER modules:active:{tenantId} crm
→ 0 ('crm' not in SET)
→ 404 Not Found (does not reveal module existence to unauthorized tenants)
→ No downstream call made
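A sketch of this module gate as net/http middleware. A Go map stands in for the Valkey SET (a real check would issue SISMEMBER modules:active:{tenantId} against Valkey). The /api/v1/data/{module}/... path convention, the moduleOf and moduleGate names, and the X-Tenant-ID header used in the demo are all assumptions for illustration; the real tenantId comes from verified JWT claims.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// moduleSet stands in for Valkey: tenantId -> members of modules:active:{tenantId}.
type moduleSet map[string]map[string]bool

// isActive mirrors SISMEMBER; an unknown tenant (nil inner map) yields false.
func (s moduleSet) isActive(tenantID, module string) bool { return s[tenantID][module] }

// moduleOf assumes module routes look like /api/v1/data/{module}/... (hypothetical).
func moduleOf(path string) (string, bool) {
	rest, ok := strings.CutPrefix(path, "/api/v1/data/")
	if !ok {
		return "", false
	}
	module, _, _ := strings.Cut(rest, "/")
	return module, module != ""
}

// moduleGate answers 404 for inactive modules and never calls next for them.
func moduleGate(s moduleSet, tenantOf func(*http.Request) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if module, ok := moduleOf(r.URL.Path); ok && !s.isActive(tenantOf(r), module) {
			w.Header().Set("Content-Type", "application/problem+json")
			w.WriteHeader(http.StatusNotFound) // same answer as a module that does not exist
			fmt.Fprint(w, `{"title":"Not Found","status":404}`)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mods := moduleSet{"t1": {"crm": true}}
	calls := 0
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { calls++ })
	tenantOf := func(r *http.Request) string { return r.Header.Get("X-Tenant-ID") } // hypothetical
	h := moduleGate(mods, tenantOf, upstream)
	for _, tenant := range []string{"t1", "t2"} {
		req := httptest.NewRequest(http.MethodGet, "/api/v1/data/crm/contacts", nil)
		req.Header.Set("X-Tenant-ID", tenant)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		fmt.Println(tenant, rec.Code) // t1 200, t2 404
	}
	fmt.Println("downstream calls:", calls) // 1
}
```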

Health Probes

| Endpoint | Kubernetes probe | Description |
|---|---|---|
| GET /live | Liveness | Returns 200 OK if the process is running. Does not check dependencies. |
| GET /ready | Readiness | Returns 200 OK only when all required upstream dependencies (IAM, Valkey) are reachable. Returns 503 during startup or after a dependency failure. |
| GET /health | Manual check / Admin UI | Full health report: status of each upstream service, Valkey, and the current circuit breaker state for each upstream. |
GET /health → 200 OK
{
  "status": "healthy",
  "upstreams": {
    "iam": { "status": "healthy", "latency_ms": 2 },
    "data": { "status": "healthy", "latency_ms": 1 },
    "money": { "status": "healthy", "latency_ms": 3 },
    "notify": { "status": "healthy", "latency_ms": 1 },
    "billing": { "status": "degraded", "latency_ms": 450 }
  },
  "valkey": { "status": "healthy" },
  "circuitBreakers": {
    "billing": { "state": "half-open", "failureCount": 3 }
  }
}

Envoy xDS Config Reload (Zero Downtime)

Envoy uses xDS (LDS/RDS/CDS/EDS) to reload configuration without restarting. Installing a new module updates the RDS (Route Discovery Service) configuration — new routes become active while in-flight requests complete on the old configuration:

New module installed:
Module Registry → Gateway worker → Envoy xDS RDS update
In-flight requests: complete on old routes
New requests: use new routes immediately
Drain timeout: 10s (ENVOY_DRAIN_TIMEOUT_SEC)

OpenAPI Version Snapshots

When an API version is deprecated, the Gateway stores an OpenAPI snapshot in S3: api-specs/v{N}/openapi.yaml. Clients still migrating from old versions can retrieve the deprecated spec:

GET /api-docs/v1 → S3 snapshot (served indefinitely)
GET /api-docs/v2 → current active spec
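How /api-docs/v{N} could choose between the active spec and an S3 snapshot, as a minimal sketch. It assumes every version older than the current one has been deprecated and therefore has a snapshot at api-specs/v{N}/openapi.yaml, and that requests for versions newer than the current one are rejected. specSource is a hypothetical name, and no S3 call is made; it only derives the object key.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// specSource maps /api-docs/v{N} to a spec location. Assumes every version
// below currentVersion is deprecated and has an S3 snapshot.
func specSource(path string, currentVersion int) (source, key string, ok bool) {
	v, found := strings.CutPrefix(path, "/api-docs/v")
	if !found {
		return "", "", false
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 1 || n > currentVersion {
		return "", "", false // unknown or future version
	}
	if n == currentVersion {
		return "active", "", true // served from the current spec
	}
	return "s3", fmt.Sprintf("api-specs/v%d/openapi.yaml", n), true
}

func main() {
	for _, p := range []string{"/api-docs/v1", "/api-docs/v2", "/api-docs/v3"} {
		src, key, ok := specSource(p, 2)
		fmt.Println(p, src, key, ok)
	}
}
```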

Error Format

All Gateway errors follow RFC 9457 Problem Details:

{
  "type": "https://api.septemcore.com/problems/unauthorized",
  "title": "Unauthorized",
  "status": 401,
  "detail": "JWT signature verification failed.",
  "instance": "/api/v1/wallets",
  "traceId": "01j9ptr0000000000000000"
}

Content-Type: application/problem+json


  • Rate Limiting — dual-layer token-bucket + Valkey sliding window, graceful degradation, 429 + Retry-After
  • Authentication — JWT ES256 verification, claims, permission resolution, forwarded headers, B2B2B delegation