
Gateway REST API Reference

The API Gateway is the single entry point for all platform traffic. It is a two-tier system:

  • Envoy Gateway — L7 proxy (routing, TLS termination, OIDC, Envoy-native rate limiting). Uses xDS (LDS/RDS/CDS/EDS) for zero-downtime config reload — no restarts required when modules are installed.
  • Go Gateway Service — business logic layer (JWT validation, permission resolution, OpenAPI request/response validation, protocol translation REST ↔ gRPC, two-tier circuit breaker via sony/gobreaker).

The Gateway does not expose domain endpoints (those belong to each service). It exposes three operational endpoints used by Kubernetes probes and the Admin UI health dashboard.

See the REST API Overview for authentication, error format, pagination, and rate limiting.


Endpoints

| Endpoint | Auth | Description |
|---|---|---|
| GET /live | None | Kubernetes liveness probe: is the process alive? |
| GET /ready | None | Kubernetes readiness probe: can the pod accept traffic? |
| GET /health | ✅ (admin) | Full component health report |

/live and /ready require no authentication — Kubernetes kubelet calls them directly. /health requires a valid JWT with the admin role because it includes internal dependency states.


GET /live — Liveness Probe

GET https://api.septemcore.com/live

Response 200 OK (process is alive):

{ "status": "ok" }

Response 503 Service Unavailable (process is in an unrecoverable state):

{ "status": "error", "reason": "goroutine panic, restarting" }

The kubelet restarts the container once /live has failed for the probe's configured failureThreshold consecutive checks; a 503 counts as a failure. A failing liveness probe indicates a bug or deadlock, not a dependency outage (that is /ready's concern).


GET /ready — Readiness Probe

GET https://api.septemcore.com/ready

Response 200 OK (pod can accept traffic):

{ "status": "ready" }

Response 503 Service Unavailable (pod should not receive traffic):

{
  "status": "not_ready",
  "reason": "flag cache not warmed"
}

Readiness Conditions

A pod is not ready (returns 503) when any of the following are true:

| Condition | Description |
|---|---|
| Flag cache empty | kernel.flags() cache not yet loaded from Valkey/GoFeatureFlag |
| gRPC connection pool not established | Go Gateway Service has not connected to core services |
| Envoy xDS sync incomplete | Envoy has not received the latest route config |

Kubernetes stops sending traffic to a pod that fails /ready. During rolling deploys, existing pods with warm caches keep handling requests until the new pods report ready, which is what keeps deployments zero-downtime.
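
The readiness conditions above can be sketched as a small Go handler. This is only an illustration under stated assumptions: the Readiness type, its field names, the JSON content type, and the readyStatus helper are hypothetical and not taken from the Gateway source; only the response bodies and status codes follow the examples in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Readiness holds one flag per condition from the table above.
// The field names are hypothetical; the real service may track these differently.
type Readiness struct {
	FlagCacheWarm atomic.Bool
	GRPCPoolUp    atomic.Bool
	XDSSynced     atomic.Bool
}

// notReadyReason returns "" when every condition is met,
// otherwise the reason for the first failing condition.
func (r *Readiness) notReadyReason() string {
	switch {
	case !r.FlagCacheWarm.Load():
		return "flag cache not warmed"
	case !r.GRPCPoolUp.Load():
		return "gRPC connection pool not established"
	case !r.XDSSynced.Load():
		return "Envoy xDS sync incomplete"
	}
	return ""
}

// ServeHTTP answers 503 with a reason while any condition fails, 200 otherwise.
func (r *Readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json") // assumed content type
	if reason := r.notReadyReason(); reason != "" {
		w.WriteHeader(http.StatusServiceUnavailable)
		json.NewEncoder(w).Encode(map[string]string{"status": "not_ready", "reason": reason})
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
}

// readyStatus calls the handler in-process and returns the HTTP status code.
func readyStatus(r *Readiness) int {
	rec := httptest.NewRecorder()
	r.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	r := &Readiness{}
	fmt.Println("before warm-up:", readyStatus(r))
	r.FlagCacheWarm.Store(true)
	r.GRPCPoolUp.Store(true)
	r.XDSSynced.Store(true)
	fmt.Println("after warm-up:", readyStatus(r))
}
```

Reporting only the first failing condition keeps the response in the same shape as the 503 example above, which carries a single reason string.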


GET /health — Full Component Health Report

Returns the health status of every platform component. Requires admin role.

GET https://api.septemcore.com/health
Authorization: Bearer <access_token>

Response 200 OK (all components healthy):

{
  "status": "healthy",
  "version": "1.14.3",
  "uptime": "72h14m32s",
  "components": {
    "postgresql": { "status": "healthy", "latencyMs": 2 },
    "clickhouse": { "status": "healthy", "latencyMs": 5 },
    "kafka": { "status": "healthy", "latencyMs": 8 },
    "rabbitmq": { "status": "healthy", "latencyMs": 3 },
    "valkey": { "status": "healthy", "latencyMs": 1 },
    "s3": { "status": "healthy", "latencyMs": 12 },
    "gofeatureflag": { "status": "healthy", "latencyMs": 4 },
    "envoy": { "status": "healthy", "latencyMs": 0 }
  }
}

Response 207 Multi-Status (one or more components degraded):

{
  "status": "degraded",
  "version": "1.14.3",
  "uptime": "72h14m32s",
  "components": {
    "postgresql": { "status": "healthy", "latencyMs": 2 },
    "clickhouse": { "status": "degraded", "latencyMs": 4200, "error": "response time > 2000ms" },
    "kafka": { "status": "healthy", "latencyMs": 8 },
    "rabbitmq": { "status": "healthy", "latencyMs": 3 },
    "valkey": { "status": "healthy", "latencyMs": 1 },
    "s3": { "status": "healthy", "latencyMs": 12 },
    "gofeatureflag": { "status": "healthy", "latencyMs": 4 },
    "envoy": { "status": "healthy", "latencyMs": 0 }
  }
}

ComponentStatus Schema

| Field | Type | Values | Description |
|---|---|---|---|
| status | string | healthy, degraded, unhealthy | Component health state |
| latencyMs | integer or null | 0 – ∞, or null | Round-trip latency of the health ping; null when the component is unreachable |
| error | string or null | Any | Human-readable error message when the component is not healthy |

Top-Level Status Rules

| Top-level status | Condition | HTTP status |
|---|---|---|
| healthy | All components healthy | 200 OK |
| degraded | At least one degraded, none unhealthy | 207 Multi-Status |
| unhealthy | At least one unhealthy | 503 Service Unavailable |
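
The top-level status rules can be expressed as one small aggregation function. The following is a minimal sketch, assuming a ComponentStatus struct shaped like the schema above; the type and function names are illustrative and not taken from the Gateway source.

```go
package main

import (
	"fmt"
	"net/http"
)

// ComponentStatus mirrors the schema above. LatencyMs is a pointer so that
// JSON null (component unreachable) can be represented.
type ComponentStatus struct {
	Status    string  `json:"status"`
	LatencyMs *int    `json:"latencyMs"`
	Error     *string `json:"error,omitempty"`
}

// Aggregate applies the top-level rules: any unhealthy component wins,
// then any degraded component, otherwise the report is healthy.
func Aggregate(components map[string]ComponentStatus) (string, int) {
	degraded := false
	for _, c := range components {
		switch c.Status {
		case "unhealthy":
			return "unhealthy", http.StatusServiceUnavailable
		case "degraded":
			degraded = true
		}
	}
	if degraded {
		return "degraded", http.StatusMultiStatus
	}
	return "healthy", http.StatusOK
}

func main() {
	slow := 4200
	status, code := Aggregate(map[string]ComponentStatus{
		"postgresql": {Status: "healthy"},
		"clickhouse": {Status: "degraded", LatencyMs: &slow},
	})
	fmt.Println(status, code)
}
```

Returning early on the first unhealthy component is safe because no later component can lower the severity of the result.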

Gateway Responsibilities

| Function | Detail |
|---|---|
| Routing | Routes requests to the appropriate core service |
| JWT validation | Validates and decodes the Bearer token on every request (except the unauthenticated /live and /ready probes) |
| Rate limiting | Token-bucket (hybrid: Envoy local + Valkey global) |
| OpenAPI validation | Validates request body and response body against OpenAPI 3.x schemas |
| CORS | Configurable per-tenant CORS rules |
| Protocol translation | REST → gRPC when the downstream service is Go gRPC |
| Request logging | Writes request metadata to Audit Service (audit.api.called) |
| Tenant-aware routing | Checks module availability for tenant before forwarding (modules:active:{tenantId} Valkey cache). Unknown module → 404 Not Found |
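
As an illustration of the tenant-aware routing row, the check before forwarding might look like the sketch below. Only the key pattern modules:active:{tenantId} and the 404 outcome come from the table. The ModuleCache interface, the in-memory memCache stand-in for Valkey, and the assumption that the key holds a set of module names are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// ModuleCache abstracts the Valkey lookup. The interface is hypothetical,
// and it assumes the key holds a set of module names.
type ModuleCache interface {
	IsMember(key, module string) bool
}

// memCache is an in-memory stand-in for Valkey, used only for this example.
type memCache map[string]map[string]bool

func (m memCache) IsMember(key, module string) bool { return m[key][module] }

// activeModulesKey builds the cache key pattern named in the table above.
func activeModulesKey(tenantID string) string {
	return "modules:active:" + tenantID
}

// RouteStatus returns 404 when the module is not active for the tenant.
// 200 is a placeholder for "forward the request to the downstream service".
func RouteStatus(c ModuleCache, tenantID, module string) int {
	if !c.IsMember(activeModulesKey(tenantID), module) {
		return http.StatusNotFound
	}
	return http.StatusOK
}

func main() {
	cache := memCache{"modules:active:tenant-1": {"crm": true}}
	fmt.Println(RouteStatus(cache, "tenant-1", "crm"))
	fmt.Println(RouteStatus(cache, "tenant-1", "billing"))
}
```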

gRPC Call Policy

| Parameter | Value |
|---|---|
| gRPC deadline | 5 seconds (GRPC_CALL_TIMEOUT_MS). Timeout → 504 Gateway Timeout |
| Circuit breaker | 5 consecutive failures in 30 s → circuit open for 30 s → 503 Service Unavailable without attempting downstream call (sony/gobreaker) |
| Per-tenant soft breaker | Error rate > 50% over 60 s → isolates that tenant. One tenant's failures cannot open the circuit for all tenants |
| Retry | 1 retry with 100 ms backoff for idempotent GET requests. No retry for POST/PATCH/DELETE |
| Partial failure (batch/stream) | 207 Multi-Status with per-item RFC 9457 error detail |
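
To make the circuit-breaker and retry rows concrete, here is a deliberately simplified, standard-library-only sketch. It is not the sony/gobreaker API: the Breaker type, fakeClock, and ShouldRetry are hypothetical names. It also omits the 30 s counting window, the half-open state, the per-tenant soft breaker, and locking, so it only illustrates "trip after N consecutive failures, then fail fast for a fixed period".

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	ErrOpen       = errors.New("circuit open")       // the Gateway would answer 503 here
	errDownstream = errors.New("downstream failure") // placeholder failure for the demo
)

// fakeClock lets the example advance time without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// Breaker is a simplified stand-in for sony/gobreaker. Not safe for concurrent use.
type Breaker struct {
	MaxFailures int              // consecutive failures that trip the breaker
	OpenFor     time.Duration    // how long calls are rejected once tripped
	Now         func() time.Time // injected clock

	failures  int
	openUntil time.Time
}

// Call runs fn unless the circuit is open, in which case it returns ErrOpen
// without attempting the downstream call.
func (b *Breaker) Call(fn func() error) error {
	if b.Now().Before(b.openUntil) {
		return ErrOpen
	}
	if err := fn(); err != nil {
		b.failures++
		if b.failures >= b.MaxFailures {
			b.openUntil = b.Now().Add(b.OpenFor)
			b.failures = 0
		}
		return err
	}
	b.failures = 0 // any success resets the consecutive-failure count
	return nil
}

// ShouldRetry encodes the retry row: a single retry, and only for GET.
func ShouldRetry(method string, attemptsSoFar int) bool {
	return method == "GET" && attemptsSoFar == 1
}

func main() {
	clk := &fakeClock{}
	b := &Breaker{MaxFailures: 5, OpenFor: 30 * time.Second, Now: clk.Now}
	fail := func() error { return errDownstream }
	for i := 0; i < 5; i++ {
		_ = b.Call(fail)
	}
	fmt.Println("open:", errors.Is(b.Call(fail), ErrOpen))
	clk.Advance(31 * time.Second)
	fmt.Println("after open period:", b.Call(func() error { return nil }))
	fmt.Println("retry GET:", ShouldRetry("GET", 1), "retry POST:", ShouldRetry("POST", 1))
}
```

Injecting the clock keeps the example deterministic; a production breaker would normally read the wall clock and guard its state with a mutex.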

Rate Limiting

| Limit | Default | Configuration |
|---|---|---|
| Default per authenticated user | 1000 req / min | Envoy RateLimit filter |
| Admin role override | 5000 req / min | Role-based override |
| Per-tenant global ceiling | Configurable per Billing plan | Billing → Gateway sync |
| Hot-key protection (> 10K RPS) | | Envoy local token-bucket intercepts burst before Valkey lookup |
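
For readers unfamiliar with token buckets, the sketch below shows the basic mechanism for a single process. It is not Envoy's filter and does not model the Valkey-backed global tier; the TokenBucket type and its methods are illustrative names, and time is passed in as seconds so the example stays deterministic.

```go
package main

import "fmt"

// TokenBucket is a single-process token bucket. It stands in for a local
// tier only; a shared global tier is not modelled here.
type TokenBucket struct {
	capacity float64 // maximum burst size
	tokens   float64 // tokens currently available
	perSec   float64 // refill rate in tokens per second
	last     float64 // time of the previous call, in seconds
}

// NewTokenBucket starts full, sized for perMinute requests per minute.
func NewTokenBucket(perMinute int) *TokenBucket {
	c := float64(perMinute)
	return &TokenBucket{capacity: c, tokens: c, perSec: c / 60}
}

// Allow refills for the time elapsed since the previous call, caps the
// bucket at its capacity, then spends one token if one is available.
func (b *TokenBucket) Allow(nowSec float64) bool {
	b.tokens += (nowSec - b.last) * b.perSec
	b.last = nowSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := NewTokenBucket(1000) // default per-user limit from the table above
	allowed := 0
	for i := 0; i < 1200; i++ {
		if b.Allow(0) {
			allowed++
		}
	}
	fmt.Println("allowed in initial burst:", allowed)
	fmt.Println("allowed 1 s later:", b.Allow(1))
}
```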

429 Response Format

HTTP/1.1 429 Too Many Requests
Retry-After: 42
Content-Type: application/problem+json

{
  "type": "https://septemcore.com/problems/rate-limit-exceeded",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit of 1000 requests/min exceeded. Retry after 42 seconds.",
  "instance": "/api/v1/data/crm/contacts",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
}

The Retry-After header is always present on 429 responses. The Gateway does not queue rate-limited requests (unlike the Notify Service, which queues 429s), so the caller must implement retry logic.
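
On the client side, handling a 429 comes down to reading Retry-After and decoding the problem body. The sketch below is one possible approach, not an official client: the Problem struct, ParseProblem, and RetryDelay are hypothetical names. It assumes Retry-After carries whole seconds, as in the example above, and does not handle the HTTP-date form of the header.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// Problem holds the fields shown in the 429 example, including the traceId extension.
type Problem struct {
	Type     string `json:"type"`
	Title    string `json:"title"`
	Status   int    `json:"status"`
	Detail   string `json:"detail"`
	Instance string `json:"instance"`
	TraceID  string `json:"traceId"`
}

// ParseProblem decodes an application/problem+json body.
func ParseProblem(body []byte) (Problem, error) {
	var p Problem
	err := json.Unmarshal(body, &p)
	return p, err
}

// RetryDelay interprets a Retry-After header value given in whole seconds.
// It returns false if the value is missing or not a non-negative integer.
func RetryDelay(headerValue string) (time.Duration, bool) {
	secs, err := strconv.Atoi(headerValue)
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

func main() {
	body := []byte(`{"type":"https://septemcore.com/problems/rate-limit-exceeded",` +
		`"title":"Too Many Requests","status":429,` +
		`"traceId":"4bf92f3577b34da6a3ce929d0e0e4736"}`)

	p, err := ParseProblem(body)
	if err != nil {
		panic(err)
	}
	if delay, ok := RetryDelay("42"); ok && p.Status == 429 {
		fmt.Printf("rate limited (trace %s), retrying in %s\n", p.TraceID, delay)
	}
}
```

A real client would sleep for the returned delay (ideally with some jitter) before re-sending, and would log the traceId so the request can be correlated with Gateway logs.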


OpenAPI Version Snapshots

When an API version is deprecated, the Gateway saves an immutable snapshot of the OpenAPI specification to S3:

s3://api-specs/v{N}/openapi.yaml

This snapshot is served at GET /api-docs/v{N} indefinitely — clients migrating from a deprecated version can always access the old specification.


Error Reference

| Error type | Status | Trigger |
|---|---|---|
| problems/rate-limit-exceeded | 429 | Per-user or per-tenant rate limit hit |
| problems/unauthorized | 401 | Missing or expired JWT |
| problems/forbidden | 403 | Valid JWT but insufficient permission |
| problems/module-not-found | 404 | Module not installed for this tenant |
| problems/gateway-timeout | 504 | gRPC deadline (5 s) exceeded |
| problems/service-unavailable | 503 | Circuit breaker open for this service |
| problems/validation-failed | 400 | Request body failed OpenAPI schema validation |