Gateway REST API Reference
The API Gateway is the single entry point for all platform traffic. It is a two-tier system:
- Envoy Gateway: L7 proxy (routing, TLS termination, OIDC, Envoy-native rate limiting). Uses xDS (LDS/RDS/CDS/EDS) for zero-downtime config reloads, so no restarts are required when modules are installed.
- Go Gateway Service: business logic layer (JWT validation, permission resolution, OpenAPI request/response validation, REST ↔ gRPC protocol translation, two-tier circuit breaker via sony/gobreaker).
The Gateway does not expose domain endpoints (those belong to each service). It exposes three operational endpoints used by Kubernetes probes and the Admin UI health dashboard.
See the REST API Overview for authentication, error format, pagination, and rate limiting.
Endpoints
| Endpoint | Auth | Description |
|---|---|---|
| `GET /live` | ❌ | Kubernetes liveness probe: is the process alive? |
| `GET /ready` | ❌ | Kubernetes readiness probe: can the pod accept traffic? |
| `GET /health` | ✅ (admin) | Full component health report |
`/live` and `/ready` require no authentication because the Kubernetes kubelet calls them directly. `/health` requires a valid JWT with the `admin` role because it includes internal dependency states.
GET /live — Liveness Probe
GET https://api.septemcore.com/live
Response 200 OK (process is alive):
{ "status": "ok" }
Response 503 Service Unavailable (process is in an unrecoverable state):
{ "status": "error", "reason": "goroutine panic, restarting" }
Kubernetes restarts the container when `/live` keeps failing (the kubelet treats any status outside 200-399 as a failure). A failing liveness probe indicates a bug or deadlock, not a dependency outage (that is the concern of `/ready`).
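A liveness handler along these lines would satisfy the contract above. This is a minimal sketch, not the Gateway's actual code: the `fatal` flag, `liveHandler`, `probe`, and the reason string are hypothetical names chosen for illustration, and how the real service detects an unrecoverable state is not documented here. Note that the handler deliberately checks no dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// fatal is a hypothetical flag the process would set once it detects an
// unrecoverable state (for example from a watchdog goroutine).
var fatal atomic.Bool

// liveHandler answers the liveness probe: 200 while the process is usable,
// 503 once it has flagged itself as unrecoverable. No dependency is checked.
func liveHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if fatal.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		json.NewEncoder(w).Encode(map[string]string{
			"status": "error",
			"reason": "unrecoverable state", // illustrative reason text
		})
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

// probe calls the handler in-process and returns the HTTP status code.
func probe() int {
	rec := httptest.NewRecorder()
	liveHandler(rec, httptest.NewRequest(http.MethodGet, "/live", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe()) // healthy process
	fatal.Store(true)
	fmt.Println(probe()) // after the process flags itself as broken
}
```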
GET /ready — Readiness Probe
GET https://api.septemcore.com/ready
Response 200 OK (pod can accept traffic):
{ "status": "ready" }
Response 503 Service Unavailable (pod should not receive traffic):
{
  "status": "not_ready",
  "reason": "flag cache not warmed"
}
Readiness Conditions
A pod is not ready (returns 503) when any of the following are true:
| Condition | Description |
|---|---|
| Flag cache empty | kernel.flags() cache not yet loaded from Valkey/GoFeatureFlag |
| gRPC connection pool not established | Go Gateway Service has not connected to core services |
| Envoy xDS sync incomplete | Envoy has not received the latest route config |
Kubernetes stops sending traffic to a pod that fails `/ready`. During rolling deploys, existing pods with warm caches continue handling requests, which is what keeps deployments zero-downtime.
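The three readiness conditions can be folded into a single check. The sketch below assumes a hypothetical `readiness` struct whose fields are updated elsewhere in the service; only the reason string "flag cache not warmed" comes from the example response above, and the other two reason strings are illustrative.

```go
package main

import "fmt"

// readiness holds one flag per condition in the Readiness Conditions table.
// The field names are hypothetical.
type readiness struct {
	FlagCacheLoaded bool // kernel.flags() cache loaded
	GRPCPoolReady   bool // gRPC connections to core services established
	XDSSynced       bool // Envoy has received the latest route config
}

// check returns the HTTP status for /ready and, when not ready, the reason.
// The first unmet condition wins.
func (r readiness) check() (int, string) {
	switch {
	case !r.FlagCacheLoaded:
		return 503, "flag cache not warmed"
	case !r.GRPCPoolReady:
		return 503, "gRPC connection pool not established"
	case !r.XDSSynced:
		return 503, "Envoy xDS sync incomplete"
	}
	return 200, ""
}

func main() {
	starting := readiness{FlagCacheLoaded: true}
	code, reason := starting.check()
	fmt.Println(code, reason)

	warm := readiness{FlagCacheLoaded: true, GRPCPoolReady: true, XDSSynced: true}
	code, reason = warm.check()
	fmt.Println(code, reason)
}
```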
GET /health — Full Component Health Report
Returns the health status of every platform component. Requires admin role.
GET https://api.septemcore.com/health
Authorization: Bearer <access_token>
Response 200 OK (all components healthy):
{
  "status": "healthy",
  "version": "1.14.3",
  "uptime": "72h14m32s",
  "components": {
    "postgresql": { "status": "healthy", "latencyMs": 2 },
    "clickhouse": { "status": "healthy", "latencyMs": 5 },
    "kafka": { "status": "healthy", "latencyMs": 8 },
    "rabbitmq": { "status": "healthy", "latencyMs": 3 },
    "valkey": { "status": "healthy", "latencyMs": 1 },
    "s3": { "status": "healthy", "latencyMs": 12 },
    "gofeatureflag": { "status": "healthy", "latencyMs": 4 },
    "envoy": { "status": "healthy", "latencyMs": 0 }
  }
}
Response 503 Service Unavailable (at least one component unhealthy):
{
  "status": "unhealthy",
  "version": "1.14.3",
  "uptime": "72h14m32s",
  "components": {
    "postgresql": { "status": "healthy", "latencyMs": 2 },
    "clickhouse": { "status": "degraded", "latencyMs": 4200, "error": "response time > 2000ms" },
    "kafka": { "status": "healthy", "latencyMs": 8 },
    "rabbitmq": { "status": "unhealthy", "latencyMs": null, "error": "connection refused" },
    "valkey": { "status": "healthy", "latencyMs": 1 },
    "s3": { "status": "healthy", "latencyMs": 12 },
    "gofeatureflag": { "status": "healthy", "latencyMs": 4 },
    "envoy": { "status": "healthy", "latencyMs": 0 }
  }
}
ComponentStatus Schema
| Field | Type | Values | Description |
|---|---|---|---|
| `status` | string | `healthy`, `degraded`, `unhealthy` | Component health state |
| `latencyMs` | integer or null | 0 or greater, or null | Round-trip latency of the health ping in milliseconds. null when unreachable |
| `error` | string or null | Any | Human-readable error message when not healthy |
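On the client side, the nullable fields in this schema map naturally onto pointer types. The following is a sketch of one way to decode a component entry in Go; the `ComponentStatus` struct layout and the `parseComponent` helper are assumptions made for illustration, not a published SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ComponentStatus mirrors the schema table. Pointers model the nullable
// fields: LatencyMs is nil when the component is unreachable, and Error is
// nil when the field is null or absent.
type ComponentStatus struct {
	Status    string  `json:"status"`
	LatencyMs *int    `json:"latencyMs"`
	Error     *string `json:"error,omitempty"`
}

// parseComponent decodes one entry and rejects unknown status values.
func parseComponent(raw string) (ComponentStatus, error) {
	var c ComponentStatus
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return c, err
	}
	switch c.Status {
	case "healthy", "degraded", "unhealthy":
		return c, nil
	}
	return c, fmt.Errorf("unknown component status %q", c.Status)
}

func main() {
	c, err := parseComponent(`{ "status": "unhealthy", "latencyMs": null, "error": "connection refused" }`)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Status, c.LatencyMs == nil, *c.Error)
}
```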
Top-Level Status Rules
| Top-level status | Condition | HTTP status |
|---|---|---|
| `healthy` | All components healthy | 200 OK |
| `degraded` | At least one degraded, none unhealthy | 207 Multi-Status |
| `unhealthy` | At least one unhealthy | 503 Service Unavailable |
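The rules above amount to a "worst status wins" reduction over the components. A minimal Go sketch, where `topLevel` is a hypothetical helper name and the input is a plain map of component name to status:

```go
package main

import (
	"fmt"
	"net/http"
)

// topLevel applies the Top-Level Status Rules: any unhealthy component makes
// the report unhealthy (503); otherwise any degraded component makes it
// degraded (207); otherwise it is healthy (200).
func topLevel(components map[string]string) (string, int) {
	status, code := "healthy", http.StatusOK
	for _, s := range components {
		switch s {
		case "unhealthy":
			return "unhealthy", http.StatusServiceUnavailable
		case "degraded":
			status, code = "degraded", http.StatusMultiStatus
		}
	}
	return status, code
}

func main() {
	status, code := topLevel(map[string]string{
		"postgresql": "healthy",
		"clickhouse": "degraded",
	})
	fmt.Println(status, code)
}
```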
Gateway Responsibilities
| Function | Detail |
|---|---|
| Routing | Routes requests to the appropriate core service |
| JWT validation | Validates and decodes Bearer token on every request |
| Rate limiting | Token-bucket (hybrid: Envoy local + Valkey global) |
| OpenAPI validation | Validates request body and response body against OpenAPI 3.x schemas |
| CORS | Configurable per-tenant CORS rules |
| Protocol translation | REST → gRPC when the downstream service is Go gRPC |
| Request logging | Writes request metadata to Audit Service (audit.api.called) |
| Tenant-aware routing | Checks module availability for tenant before forwarding (modules:active:{tenantId} Valkey cache). Unknown module → 404 Not Found |
gRPC Call Policy
| Parameter | Value |
|---|---|
| gRPC deadline | 5 seconds (GRPC_CALL_TIMEOUT_MS). Timeout → 504 Gateway Timeout |
| Circuit breaker | 5 consecutive failures in 30 s → circuit open for 30 s → 503 Service Unavailable without attempting downstream call (sony/gobreaker) |
| Per-tenant soft breaker | Error rate > 50% over 60 s → isolates that tenant. One tenant's failures cannot open the circuit for all |
| Retry | 1 retry with 100 ms backoff for idempotent GET requests. No retry for POST/PATCH/DELETE |
| Partial failure (batch/stream) | 207 Multi-Status with per-item RFC 9457 error detail |
Rate Limiting
| Limit | Default | Configuration |
|---|---|---|
| Default per authenticated user | 1000 req / min | Envoy RateLimit filter |
| Admin role override | 5000 req / min | Role-based override |
| Per-tenant global ceiling | Configurable per Billing plan | Billing → Gateway sync |
| Hot-key protection (> 10K RPS) | Envoy local token-bucket intercepts burst before Valkey lookup | — |
429 Response Format
HTTP/1.1 429 Too Many Requests
Retry-After: 42
Content-Type: application/problem+json
{
  "type": "https://septemcore.com/problems/rate-limit-exceeded",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit of 1000 requests/min exceeded. Retry after 42 seconds.",
  "instance": "/api/v1/data/crm/contacts",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
}
The `Retry-After` header is always present on `429` responses. The Gateway does not queue the rejected request (unlike the Notify Service, which queues `429`s), so the caller must implement retry logic.
OpenAPI Version Snapshots
When an API version is deprecated, the Gateway saves an immutable snapshot of the OpenAPI specification to S3:
s3://api-specs/v{N}/openapi.yaml
This snapshot is served at `GET /api-docs/v{N}` indefinitely, so clients migrating from a deprecated version can always access the old specification.
Error Reference
| Error type | Status | Trigger |
|---|---|---|
| `problems/rate-limit-exceeded` | 429 | Per-user or per-tenant rate limit hit |
| `problems/unauthorized` | 401 | Missing or expired JWT |
| `problems/forbidden` | 403 | Valid JWT but insufficient permission |
| `problems/module-not-found` | 404 | Module not installed for this tenant |
| `problems/gateway-timeout` | 504 | gRPC deadline (5 s) exceeded |
| `problems/service-unavailable` | 503 | Circuit breaker open for this service |
| `problems/validation-failed` | 400 | Request body failed OpenAPI schema validation |
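A client can branch on the problem type rather than on the status code alone. The sketch below decodes a problem document with the fields shown in the 429 example; the `Problem` struct, `Kind`, and `Retryable` are hypothetical, and treating 429, 503, and 504 as retryable is an assumption about sensible client behavior, not something this reference mandates.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Problem holds the problem+json fields used in the 429 example.
type Problem struct {
	Type     string `json:"type"`
	Title    string `json:"title"`
	Status   int    `json:"status"`
	Detail   string `json:"detail"`
	Instance string `json:"instance"`
	TraceID  string `json:"traceId"`
}

// parseProblem decodes a problem+json body.
func parseProblem(body []byte) (Problem, error) {
	var p Problem
	err := json.Unmarshal(body, &p)
	return p, err
}

// Kind returns the short name after "problems/", for example
// "rate-limit-exceeded", or "" if the type has no such segment.
func (p Problem) Kind() string {
	const marker = "problems/"
	i := strings.LastIndex(p.Type, marker)
	if i < 0 {
		return ""
	}
	return p.Type[i+len(marker):]
}

// Retryable reports whether the error looks transient. This classification
// is an assumption for illustration.
func (p Problem) Retryable() bool {
	switch p.Kind() {
	case "rate-limit-exceeded", "gateway-timeout", "service-unavailable":
		return true
	}
	return false
}

func main() {
	body := []byte(`{
		"type": "https://septemcore.com/problems/rate-limit-exceeded",
		"title": "Too Many Requests",
		"status": 429
	}`)
	p, err := parseProblem(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Kind(), p.Status, p.Retryable())
}
```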