Gateway Service — Overview
The API Gateway is the single entry point for all platform traffic. No external client communicates directly with a backend service — every request passes through the Gateway, which enforces authentication, authorization, rate limiting, and OpenAPI schema validation before forwarding to the appropriate upstream service.
Architecture: Two-Layer Gateway
The Gateway is implemented as two cooperating layers:
| Layer | Component | Responsibility |
|---|---|---|
| L7 proxy | Envoy Gateway | Routing, TLS termination, local rate limiting (token-bucket per instance), OIDC |
| Business logic | Go Gateway Service (:8080) | JWT verification, permission resolution, OpenAPI validation, protocol translation (REST → gRPC), circuit breaker for upstream calls |
The two layers are complementary. Envoy handles high-throughput routing and enforces local rate limits in-process, with no network round trip. The Go service adds business-level logic that Envoy cannot express: per-tenant permission resolution, OpenAPI schema validation, and circuit breaking (via sony/gobreaker) for gRPC calls to core services.
Internal Packages
| Package | Responsibility |
|---|---|
| internal/circuitbreaker | Circuit breaker for upstream gRPC services (sony/gobreaker) |
| internal/config | Environment-based configuration |
| internal/errors | RFC 9457 Problem Details error responses |
| internal/health | /live, /ready, /health probe handlers |
| internal/middleware | JWT auth, CORS, request logging, request ID generation, real IP extraction |
| internal/openapi | OpenAPI 3.x request and response validation |
| internal/permission | RBAC permission resolution (Valkey cache + IAM gRPC fallback) |
| internal/proxy | Reverse proxy to upstream services |
| internal/ratelimit | Dual-layer rate limiting (local token-bucket + Valkey sliding window) |
| internal/server | HTTP server lifecycle (graceful shutdown) |
| internal/worker | Background workers (cache invalidation) |
Middleware Chain
Every incoming request traverses the middleware chain in strict order:
Incoming request
│
▼
Recoverer ← Catch panics, return 500 (never crash the server)
│
▼
RequestID ← Generate X-Request-ID (ULID) if not present
│
▼
RealIP ← Extract client IP from X-Forwarded-For / X-Real-IP
│
▼
Logging ← Structured request log entry (method, path, IP, requestId)
│
▼
CORS ← Validate Origin, inject Access-Control-* headers
│
▼
JWT Auth ← Verify ES256 signature, extract claims (userId, tenantId, roles)
│
▼
Rate Limiting ← Local token-bucket (Envoy) + Valkey sliding window (global)
│
▼
OpenAPI Validation ← Validate request body against OpenAPI 3.x schema
│
▼
Proxy ← Forward to upstream (REST → gRPC translation if required)
│
▼
Upstream service
The middleware chain short-circuits on failure: if CORS rejects an origin, the request never reaches JWT Auth, and if JWT Auth fails, it never reaches Rate Limiting. The layer that rejects the request returns an RFC 9457 Problem Details response.
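The short-circuit behavior can be sketched with plain net/http middleware. This is a minimal illustration, not the Gateway's actual code: the names Chain, requireHeader, and run are hypothetical, and the two header checks are simplified stand-ins for CORS and JWT Auth (a missing header is treated as a failure purely for illustration).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware wraps a handler with one step of the chain.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that the first middleware listed runs first.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// problem writes a minimal Problem Details body (title and status only).
func problem(w http.ResponseWriter, status int, title string) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(map[string]interface{}{"title": title, "status": status})
}

// requireHeader is a hypothetical stand-in for a real check such as CORS or
// JWT Auth: it rejects the request when the header is missing and records
// its own name in trace so the execution order is observable.
func requireHeader(name string, status int, title string, trace *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			if r.Header.Get(name) == "" {
				problem(w, status, title) // short-circuit: next is never called
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// run sends one request through a two-step chain and reports the status
// code plus the steps that actually executed.
func run(headers map[string]string) (int, []string) {
	var trace []string
	upstream := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		trace = append(trace, "upstream")
		w.WriteHeader(http.StatusOK)
	})
	h := Chain(upstream,
		requireHeader("Origin", http.StatusForbidden, "Forbidden", &trace),
		requireHeader("Authorization", http.StatusUnauthorized, "Unauthorized", &trace),
	)
	req := httptest.NewRequest(http.MethodGet, "/api/v1/wallets", nil)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, trace
}

func main() {
	// The first check passes, the second rejects, the upstream is never reached.
	code, trace := run(map[string]string{"Origin": "https://app.example"})
	fmt.Println(code, trace)
}
```

Building the chain from the last middleware inward keeps the declaration order identical to the execution order, which makes the strict ordering easy to audit.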
Gateway Responsibilities
| Function | Detail |
|---|---|
| Routing | Routes /api/v1/* to the appropriate upstream service |
| Auth validation | Verifies JWT ES256 signature on every request |
| Permission check | Resolves roles[] → permissions[] via Valkey + IAM gRPC |
| Rate limiting | Dual-layer: Envoy local + Valkey global |
| OpenAPI validation | Validates request body against the upstream service's OpenAPI schema |
| Response validation | Validates response body in dev/staging environments |
| CORS | Configurable per-tenant CORS rules |
| Protocol translation | REST ↔ gRPC where required |
| Request logging | Logs request metadata to the Audit Service (async) |
| Error handling | Standardized RFC 9457 errors across all upstream responses |
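The permission check row describes a cache-first lookup with a fallback to IAM. The following is a minimal sketch of that pattern under stated assumptions: an in-memory map stands in for Valkey, IAMLookup stands in for the IAM gRPC call, and the cache key layout and all type names are hypothetical. Cache expiry and invalidation are left out.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// IAMLookup is a hypothetical stand-in for the IAM gRPC call that returns
// the permissions granted by one role within a tenant.
type IAMLookup func(ctx context.Context, tenantID, role string) ([]string, error)

// Resolver resolves roles to permissions, consulting the cache first.
// The map stands in for Valkey in this sketch.
type Resolver struct {
	mu    sync.Mutex
	cache map[string][]string
	iam   IAMLookup
}

func NewResolver(iam IAMLookup) *Resolver {
	return &Resolver{cache: make(map[string][]string), iam: iam}
}

// permsForRole returns cached permissions or falls back to IAM and caches
// the result.
func (r *Resolver) permsForRole(ctx context.Context, tenantID, role string) ([]string, error) {
	key := "perms:" + tenantID + ":" + role // hypothetical cache key layout
	r.mu.Lock()
	perms, hit := r.cache[key]
	r.mu.Unlock()
	if hit {
		return perms, nil
	}
	perms, err := r.iam(ctx, tenantID, role)
	if err != nil {
		return nil, fmt.Errorf("resolve role %q: %w", role, err)
	}
	r.mu.Lock()
	r.cache[key] = perms
	r.mu.Unlock()
	return perms, nil
}

// Resolve returns the sorted, de-duplicated union of permissions for roles.
func (r *Resolver) Resolve(ctx context.Context, tenantID string, roles []string) ([]string, error) {
	seen := make(map[string]struct{})
	for _, role := range roles {
		perms, err := r.permsForRole(ctx, tenantID, role)
		if err != nil {
			return nil, err
		}
		for _, p := range perms {
			seen[p] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out, nil
}

// resolveTwice resolves the same roles twice against a fake IAM and reports
// the permissions plus the number of IAM calls that were made.
func resolveTwice(roles []string) ([]string, int) {
	calls := 0
	fakeIAM := func(_ context.Context, _ string, role string) ([]string, error) {
		calls++
		return []string{role + ":read", "shared:list"}, nil
	}
	r := NewResolver(fakeIAM)
	_, _ = r.Resolve(context.Background(), "t1", roles)
	perms, _ := r.Resolve(context.Background(), "t1", roles)
	return perms, calls
}

func main() {
	perms, calls := resolveTwice([]string{"admin", "viewer"})
	fmt.Println(perms, "IAM calls:", calls)
}
```

The second Resolve call is served entirely from the cache, so IAM is contacted once per role rather than once per request.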
gRPC Call Policy
All calls from the Gateway to upstream Go services follow a strict call policy:
| Parameter | Value |
|---|---|
| gRPC deadline | 5 seconds (GRPC_CALL_TIMEOUT_MS). On expiry → 504 Gateway Timeout |
| Circuit breaker | 5 consecutive failures in 30s → circuit open for 30s → 503 Service Unavailable (no upstream call while open) |
| Retry | 1 retry with 100ms backoff for idempotent GET requests. POST/PATCH/DELETE: no retry (idempotency is handled via idempotency keys) |
| Partial failure | Batch/stream calls with partial success → 207 Multi-Status (RFC 9457 per element) |
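The deadline and retry rows can be sketched with the standard library alone. This is an illustration under several assumptions that the table does not settle: the deadline is applied per attempt, any error on a GET triggers the single retry, and non-timeout errors map to 502 Bad Gateway. The Policy type and the simulate helpers are hypothetical, and the demo uses short durations instead of 5s and 100ms.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// Policy holds the two tunables from the table; field names are illustrative.
type Policy struct {
	Timeout time.Duration // per-attempt deadline (5s in the table)
	Backoff time.Duration // wait before the single retry (100ms in the table)
}

// Call runs fn under the policy and returns the HTTP status the Gateway
// would send together with the number of attempts made.
func (p Policy) Call(ctx context.Context, method string, fn func(context.Context) error) (status, attempts int) {
	maxAttempts := 1
	if method == http.MethodGet {
		maxAttempts = 2 // one retry, idempotent GET only
	}
	var err error
	for {
		attempts++
		attemptCtx, cancel := context.WithTimeout(ctx, p.Timeout)
		err = fn(attemptCtx)
		cancel()
		if err == nil {
			return http.StatusOK, attempts
		}
		if attempts >= maxAttempts {
			break
		}
		time.Sleep(p.Backoff)
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return http.StatusGatewayTimeout, attempts
	}
	return http.StatusBadGateway, attempts // assumption: not specified in the table
}

// simulate calls an upstream that fails its first `failures` attempts.
func simulate(method string, failures int) (status, attempts int) {
	p := Policy{Timeout: 50 * time.Millisecond, Backoff: time.Millisecond}
	n := 0
	return p.Call(context.Background(), method, func(context.Context) error {
		n++
		if n <= failures {
			return errors.New("upstream unavailable")
		}
		return nil
	})
}

// simulateSlow calls an upstream that never answers before the deadline.
func simulateSlow(method string) (status, attempts int) {
	p := Policy{Timeout: 10 * time.Millisecond, Backoff: time.Millisecond}
	return p.Call(context.Background(), method, func(ctx context.Context) error {
		<-ctx.Done()
		return ctx.Err()
	})
}

func main() {
	fmt.Println(simulate(http.MethodGet, 1))  // GET recovers on the retry
	fmt.Println(simulate(http.MethodPost, 1)) // POST is not retried
	fmt.Println(simulateSlow(http.MethodGet)) // deadline exceeded on both attempts
}
```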
Circuit Breaker Design
The Go Gateway Service uses two-tier circuit breaking via sony/gobreaker:
| Breaker type | Scope | Purpose |
|---|---|---|
| Per-tenant soft breaker | One circuit per tenant | Isolates a tenant with anomalous error rate (> 50% in 60s). One bad tenant does not open the circuit for all tenants. |
| Per-service global breaker | One circuit per upstream service | Protects against cascading failure when an entire service is down. |
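The interaction between the two tiers can be sketched as follows. This is a hand-rolled illustration, not sony/gobreaker and not the Gateway's implementation. To stay short it trips both tiers on consecutive failures (3 for a tenant, 5 for a service, both assumed values), whereas the table specifies an error-rate trigger for the tenant tier, and it omits the half-open state. All names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrCircuitOpen is returned when a call is rejected without reaching upstream.
var ErrCircuitOpen = errors.New("circuit open")

// breaker opens after `threshold` consecutive failures and stays open for openFor.
type breaker struct {
	threshold int
	openFor   time.Duration
	failures  int
	openUntil time.Time
}

func (b *breaker) allow(now time.Time) bool { return !now.Before(b.openUntil) }

func (b *breaker) record(now time.Time, ok bool) {
	if ok {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = now.Add(b.openFor)
		b.failures = 0
	}
}

// Registry holds one breaker per service and one per (service, tenant) pair.
type Registry struct {
	mu      sync.Mutex
	service map[string]*breaker
	tenant  map[string]*breaker
}

func NewRegistry() *Registry {
	return &Registry{service: map[string]*breaker{}, tenant: map[string]*breaker{}}
}

// get returns the breaker stored under key, creating it on first use.
func get(m map[string]*breaker, key string, threshold int) *breaker {
	b, ok := m[key]
	if !ok {
		b = &breaker{threshold: threshold, openFor: 30 * time.Second}
		m[key] = b
	}
	return b
}

// Do runs call unless the service breaker or the tenant breaker is open.
func (r *Registry) Do(service, tenantID string, call func() error) error {
	r.mu.Lock()
	svc := get(r.service, service, 5)                // assumed threshold
	ten := get(r.tenant, service+"/"+tenantID, 3)    // assumed threshold
	now := time.Now()
	blocked := !svc.allow(now) || !ten.allow(now)
	r.mu.Unlock()
	if blocked {
		return ErrCircuitOpen // no upstream call is made
	}

	err := call()

	r.mu.Lock()
	defer r.mu.Unlock()
	now = time.Now()
	ten.record(now, err == nil)
	svc.record(now, err == nil)
	return err
}

// demo fails three calls for tenant-a, then checks that tenant-a is blocked
// while tenant-b is still served by the same upstream.
func demo() (tenantABlocked, tenantBServed bool) {
	r := NewRegistry()
	fail := func() error { return errors.New("upstream error") }
	for i := 0; i < 3; i++ {
		_ = r.Do("billing", "tenant-a", fail)
	}
	called := false
	err := r.Do("billing", "tenant-a", func() error { called = true; return nil })
	tenantABlocked = errors.Is(err, ErrCircuitOpen) && !called
	tenantBServed = r.Do("billing", "tenant-b", func() error { return nil }) == nil
	return
}

func main() {
	fmt.Println(demo())
}
```

Because the tenant tier trips before the service tier in this sketch, a single misbehaving tenant is cut off before its failures can open the service-wide circuit.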
Tenant-Aware Module Routing
Before forwarding a request to an upstream service, the Gateway checks that the target module is installed and active for the requesting tenant. The Module Registry state is cached in Valkey:
Valkey key: modules:active:{tenantId} (SET of active module IDs)
GET /api/v1/data/crm/contacts
↓
Gateway: SISMEMBER modules:active:{tenantId} crm
→ 'crm' not in SET
→ 404 Not Found (does not reveal module existence to unauthorized tenants)
→ No upstream call made
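The check above could be expressed as a middleware along these lines. It is a sketch under assumptions: ModuleStore abstracts the Valkey set lookup, memStore is an in-memory substitute, the module ID is taken from paths shaped like /api/v1/data/{module}/..., and the tenant ID is read from a hypothetical X-Tenant-ID header rather than from JWT claims. Returning 503 on a store error is also an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// ModuleStore is a hypothetical abstraction over the Valkey set
// modules:active:{tenantId}.
type ModuleStore interface {
	IsActive(tenantID, moduleID string) (bool, error)
}

// memStore is an in-memory substitute for Valkey used by the demo.
type memStore map[string]map[string]bool

func (m memStore) IsActive(tenantID, moduleID string) (bool, error) {
	return m[tenantID][moduleID], nil
}

// moduleFromPath extracts the module ID, assuming /api/v1/data/{module}/...
func moduleFromPath(path string) (string, bool) {
	const prefix = "/api/v1/data/"
	if !strings.HasPrefix(path, prefix) {
		return "", false
	}
	module := strings.SplitN(strings.TrimPrefix(path, prefix), "/", 2)[0]
	return module, module != ""
}

// RequireModule answers 404 when the module is not active for the tenant,
// so next (the upstream call) is never invoked.
func RequireModule(store ModuleStore, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		module, ok := moduleFromPath(r.URL.Path)
		if !ok {
			next.ServeHTTP(w, r) // not a module route
			return
		}
		tenantID := r.Header.Get("X-Tenant-ID") // hypothetical; see lead-in
		active, err := store.IsActive(tenantID, module)
		if err != nil {
			http.Error(w, "module registry unavailable", http.StatusServiceUnavailable)
			return
		}
		if !active {
			http.NotFound(w, r) // plain 404 here; the Gateway uses Problem Details
			return
		}
		next.ServeHTTP(w, r)
	})
}

// check returns the status code a tenant receives for a path.
func check(tenantID, path string) int {
	store := memStore{"tenant-a": {"crm": true}}
	upstream := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.Header.Set("X-Tenant-ID", tenantID)
	rec := httptest.NewRecorder()
	RequireModule(store, upstream).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(check("tenant-a", "/api/v1/data/crm/contacts"))
	fmt.Println(check("tenant-b", "/api/v1/data/crm/contacts"))
}
```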
Health Probes
| Endpoint | Kubernetes probe | Description |
|---|---|---|
| GET /live | Liveness | Returns 200 OK if the process is running. Does not check dependencies. |
| GET /ready | Readiness | Returns 200 OK only when all required upstream dependencies (IAM, Valkey) are reachable. Returns 503 during startup or after a dependency failure. |
| GET /health | Manual check / Admin UI | Full health report: status of each upstream service, Valkey, and the current circuit breaker state for each upstream. |
GET /health → 200 OK
{
  "status": "healthy",
  "upstreams": {
    "iam": { "status": "healthy", "latency_ms": 2 },
    "data": { "status": "healthy", "latency_ms": 1 },
    "money": { "status": "healthy", "latency_ms": 3 },
    "notify": { "status": "healthy", "latency_ms": 1 },
    "billing": { "status": "degraded", "latency_ms": 450 }
  },
  "valkey": { "status": "healthy" },
  "circuitBreakers": {
    "billing": { "state": "half-open", "failureCount": 3 }
  }
}
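The difference between the liveness and readiness probes can be sketched with two handlers. This is an illustration only: the Check type, the handler names, and the probe helper are hypothetical, and the dependency checks are fakes rather than real IAM or Valkey clients.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Check is a hypothetical dependency probe; nil means the dependency is reachable.
type Check func(ctx context.Context) error

// live reports 200 as long as the process can serve requests at all.
func live(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// ready reports 200 only when every required dependency check passes,
// and 503 as soon as one of them fails.
func ready(checks map[string]Check) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for name, check := range checks {
			if err := check(r.Context()); err != nil {
				http.Error(w, name+": "+err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

// probe returns the status code of a probe path for a given Valkey state.
func probe(path string, valkeyUp bool) int {
	checks := map[string]Check{
		"iam": func(context.Context) error { return nil },
		"valkey": func(context.Context) error {
			if !valkeyUp {
				return errors.New("unreachable")
			}
			return nil
		},
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/live", live)
	mux.Handle("/ready", ready(checks))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe("/live", false), probe("/ready", false), probe("/ready", true))
}
```

Keeping dependency checks out of the liveness handler means a Valkey outage removes the pod from load balancing without causing Kubernetes to restart it.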
Envoy xDS Config Reload (Zero Downtime)
Envoy uses xDS (LDS/RDS/CDS/EDS) to reload configuration without restarting. Installing a new module updates the RDS (Route Discovery Service) configuration — new routes become active while in-flight requests complete on the old configuration:
New module installed:
Module Registry → Gateway worker → Envoy xDS RDS update
In-flight requests: complete on old routes
New requests: use new routes immediately
Drain timeout: 10s (ENVOY_DRAIN_TIMEOUT_SEC)
OpenAPI Version Snapshots
When an API version is deprecated, the Gateway stores an OpenAPI snapshot in S3: api-specs/v{N}/openapi.yaml. Clients still migrating from old versions can retrieve the deprecated spec:
GET /api-docs/v1 → S3 snapshot (served indefinitely)
GET /api-docs/v2 → current active spec
Error Format
All Gateway errors follow RFC 9457 Problem Details:
{
  "type": "https://api.septemcore.com/problems/unauthorized",
  "title": "Unauthorized",
  "status": 401,
  "detail": "JWT signature verification failed.",
  "instance": "/api/v1/wallets",
  "traceId": "01j9ptr0000000000000000"
}
Content-Type: application/problem+json
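A response in this format could be produced with a small helper like the one below. The Problem struct and WriteProblem function are hypothetical names for illustration; the fields mirror the example above, with traceId as an extension member alongside the standard RFC 9457 members.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Problem mirrors the fields of the example error body.
type Problem struct {
	Type     string `json:"type"`
	Title    string `json:"title"`
	Status   int    `json:"status"`
	Detail   string `json:"detail,omitempty"`
	Instance string `json:"instance,omitempty"`
	TraceID  string `json:"traceId,omitempty"` // extension member
}

// WriteProblem sends p with the problem+json media type and uses p.Status
// as the HTTP status code so the two can never disagree.
func WriteProblem(w http.ResponseWriter, p Problem) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(p.Status)
	_ = json.NewEncoder(w).Encode(p)
}

// render writes the example error and returns what a client would observe.
func render() (int, string, Problem) {
	rec := httptest.NewRecorder()
	WriteProblem(rec, Problem{
		Type:     "https://api.septemcore.com/problems/unauthorized",
		Title:    "Unauthorized",
		Status:   http.StatusUnauthorized,
		Detail:   "JWT signature verification failed.",
		Instance: "/api/v1/wallets",
		TraceID:  "01j9ptr0000000000000000",
	})
	var got Problem
	_ = json.Unmarshal(rec.Body.Bytes(), &got)
	return rec.Code, rec.Header().Get("Content-Type"), got
}

func main() {
	code, contentType, p := render()
	fmt.Println(code, contentType, p.Title)
}
```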
Related Pages
- Rate Limiting — dual-layer token-bucket + Valkey sliding window, graceful degradation, 429 + Retry-After
- Authentication — JWT ES256 verification, claims, permission resolution, forwarded headers, B2B2B delegation