Gateway REST API Reference

The API Gateway is the single entry point for all platform traffic. It is a two-tier system:

Envoy Gateway — L7 proxy (routing, TLS termination, OIDC, Envoy-native rate limiting). Uses xDS (LDS/RDS/CDS/EDS) for zero-downtime config reload — no restarts required when modules are installed.
Go Gateway Service — business logic layer (JWT validation, permission resolution, OpenAPI request/response validation, protocol translation REST ↔ gRPC, two-tier circuit breaker via sony/gobreaker).

The Gateway does not expose domain endpoints (those belong to each service). It exposes three operational endpoints used by Kubernetes probes and the Admin UI health dashboard.

See the REST API Overview for authentication, error format, pagination, and rate limiting.

Endpoints

Endpoint	Auth	Description
`GET /live`	❌	Kubernetes liveness probe — is the process alive?
`GET /ready`	❌	Kubernetes readiness probe — can the pod accept traffic?
`GET /health`	✅ (`admin`)	Full component health report

/live and /ready require no authentication because the Kubernetes kubelet calls them directly. /health requires a valid JWT with the `admin` role because it exposes internal dependency states.

GET /live — Liveness Probe

GET https://api.septemcore.com/live

Response 200 OK (process is alive):

{ "status": "ok" }

Response 503 Service Unavailable (process is in an unrecoverable state):

{ "status": "error", "reason": "goroutine panic, restarting" }

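Kubernetes restarts the container when /live keeps failing: the kubelet treats any status outside 200 to 399 as a probe failure and restarts after `failureThreshold` consecutive failures. A failing liveness probe indicates a bug or deadlock, not a dependency outage (that is /ready's concern).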

GET /ready — Readiness Probe

GET https://api.septemcore.com/ready

Response 200 OK (pod can accept traffic):

{ "status": "ready" }

Response 503 Service Unavailable (pod should not receive traffic):

{
  "status": "not_ready",
  "reason": "flag cache not warmed"
}

Readiness Conditions

A pod is not ready (returns 503) when any of the following are true:

Condition	Description
Flag cache empty	`kernel.flags()` cache not yet loaded from Valkey/GoFeatureFlag
gRPC connection pool not established	Go Gateway Service has not connected to core services
Envoy xDS sync incomplete	Envoy has not received the latest route config

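Kubernetes removes a pod that fails /ready from its Service endpoints, so the pod stops receiving traffic. During a rolling deploy, existing pods with warm caches keep handling requests until the new pods pass /ready; this is what provides the zero-downtime deployment guarantee.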

GET /health — Full Component Health Report

Returns the health status of every platform component. Requires admin role.

GET https://api.septemcore.com/health
Authorization: Bearer <access_token>

Response 200 OK (all components healthy):

{
  "status": "healthy",
  "version": "1.14.3",
  "uptime":  "72h14m32s",
  "components": {
    "postgresql":    { "status": "healthy",  "latencyMs": 2 },
    "clickhouse":    { "status": "healthy",  "latencyMs": 5 },
    "kafka":         { "status": "healthy",  "latencyMs": 8 },
    "rabbitmq":      { "status": "healthy",  "latencyMs": 3 },
    "valkey":        { "status": "healthy",  "latencyMs": 1 },
    "s3":            { "status": "healthy",  "latencyMs": 12 },
    "gofeatureflag": { "status": "healthy",  "latencyMs": 4 },
    "envoy":         { "status": "healthy",  "latencyMs": 0 }
  }
}

Response 207 Multi-Status (one or more components degraded):

{
  "status": "degraded",
  "version": "1.14.3",
  "uptime":  "72h14m32s",
  "components": {
    "postgresql":    { "status": "healthy",  "latencyMs": 2 },
    "clickhouse":    { "status": "degraded", "latencyMs": 4200, "error": "response time > 2000ms" },
    "kafka":         { "status": "healthy",  "latencyMs": 8 },
    "rabbitmq":      { "status": "healthy",  "latencyMs": 3 },
    "valkey":        { "status": "healthy",  "latencyMs": 1 },
    "s3":            { "status": "healthy",  "latencyMs": 12 },
    "gofeatureflag": { "status": "healthy",  "latencyMs": 4 },
    "envoy":         { "status": "healthy",  "latencyMs": 0 }
  }
}

ComponentStatus Schema

Field	Type	Values	Description
`status`	`string`	`healthy`, `degraded`, `unhealthy`	Component health state
`latencyMs`	`integer` or `null`	`0` or greater, or `null`	Round-trip latency of the health ping, in milliseconds. `null` when the component is unreachable
`error`	`string` or `null`	Any	Human-readable error message when the component is not `healthy`

Top-Level Status Rules

Top-level `status`	Condition	HTTP status
`healthy`	All components `healthy`	`200 OK`
`degraded`	At least one `degraded`, none `unhealthy`	`207 Multi-Status`
`unhealthy`	At least one `unhealthy`	`503 Service Unavailable`
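The top-level rules amount to a precedence order: any `unhealthy` component wins, then any `degraded`, otherwise `healthy`. A minimal sketch; `overall` is a hypothetical helper that takes a map of component name to status string.

```go
package main

import (
	"fmt"
	"net/http"
)

// overall applies the Top-Level Status Rules and returns the top-level
// status together with the HTTP status code for the /health response.
func overall(components map[string]string) (status string, httpCode int) {
	degraded := false
	for _, s := range components {
		switch s {
		case "unhealthy":
			// One unhealthy component decides the result regardless of the rest.
			return "unhealthy", http.StatusServiceUnavailable
		case "degraded":
			degraded = true
		}
	}
	if degraded {
		return "degraded", http.StatusMultiStatus
	}
	return "healthy", http.StatusOK
}

func main() {
	status, code := overall(map[string]string{
		"postgresql": "healthy",
		"clickhouse": "degraded",
		"kafka":      "healthy",
	})
	fmt.Println(status, code) // degraded 207
}
```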

Gateway Responsibilities

Function	Detail
Routing	Routes requests to the appropriate core service
JWT validation	Validates and decodes Bearer token on every request
Rate limiting	Token-bucket (hybrid: Envoy local + Valkey global)
OpenAPI validation	Validates request body and response body against OpenAPI 3.x schemas
CORS	Configurable per-tenant CORS rules
Protocol translation	REST → gRPC when the downstream service is a Go gRPC service
Request logging	Writes request metadata to Audit Service (`audit.api.called`)
Tenant-aware routing	Checks that the module is available for the tenant before forwarding (`modules:active:{tenantId}` Valkey cache). Unknown module → `404 Not Found`
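The tenant-aware routing check reduces to a membership test against the `modules:active:{tenantId}` entry. A sketch in which an in-memory map stands in for Valkey. `moduleCache`, `activeModulesKey` and `checkModule` are hypothetical, the tenant IDs are made up, and the real cache's value format (set, list, or serialized blob) is not specified here.

```go
package main

import (
	"fmt"
	"net/http"
)

// activeModulesKey builds the cache key named in the table.
func activeModulesKey(tenantID string) string {
	return "modules:active:" + tenantID
}

// moduleCache is an in-memory stand-in for Valkey: key -> set of active modules.
type moduleCache map[string]map[string]bool

// checkModule returns 200 when the Gateway may forward the request and 404 when
// the module is not active for this tenant. Missing keys read as empty sets.
func checkModule(cache moduleCache, tenantID, module string) int {
	if cache[activeModulesKey(tenantID)][module] {
		return http.StatusOK
	}
	return http.StatusNotFound
}

func main() {
	cache := moduleCache{activeModulesKey("tenant-a"): {"crm": true}}
	fmt.Println(checkModule(cache, "tenant-a", "crm"))     // 200
	fmt.Println(checkModule(cache, "tenant-a", "billing")) // 404
	fmt.Println(checkModule(cache, "tenant-b", "crm"))     // 404
}
```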

gRPC Call Policy

Parameter	Value
gRPC deadline	5 s (`GRPC_CALL_TIMEOUT_MS`, default `5000`). Timeout → `504 Gateway Timeout`
Circuit breaker	5 consecutive failures in 30 s → circuit open for 30 s → `503 Service Unavailable` without attempting downstream call (`sony/gobreaker`)
Per-tenant soft breaker	Error rate > 50% over 60 s → isolates that tenant. One tenant's failures cannot open the circuit for all
Retry	1 retry with 100 ms backoff for idempotent `GET` requests. No retry for `POST`/`PATCH`/`DELETE`
Partial failure (batch/stream)	`207 Multi-Status` with per-item RFC 9457 error detail

Rate Limiting

Limit	Default	Configuration
Default per authenticated user	1000 req / min	Envoy `RateLimit` filter
Admin role override	5000 req / min	Role-based override
Per-tenant global ceiling	Configurable per Billing plan	Billing → Gateway sync
Hot-key protection (> 10K RPS)	Envoy local token-bucket intercepts burst before Valkey lookup	—

429 Response Format

HTTP/1.1 429 Too Many Requests
Retry-After: 42
Content-Type: application/problem+json

{
  "type":     "https://septemcore.com/problems/rate-limit-exceeded",
  "title":    "Too Many Requests",
  "status":   429,
  "detail":   "Rate limit of 1000 requests/min exceeded. Retry after 42 seconds.",
  "instance": "/api/v1/data/crm/contacts",
  "traceId":  "4bf92f3577b34da6a3ce929d0e0e4736"
}

The Retry-After header is always present on 429 responses. The Gateway does not queue the rejected request (unlike the Notify Service, which queues rate-limited messages), so the caller must implement its own retry logic.

OpenAPI Version Snapshots

When an API version is deprecated, the Gateway saves an immutable snapshot of the OpenAPI specification to S3:

s3://api-specs/v{N}/openapi.yaml

This snapshot is served at GET /api-docs/v{N} indefinitely — clients migrating from a deprecated version can always access the old specification.

Error Reference

Error type	Status	Trigger
`problems/rate-limit-exceeded`	`429`	Per-user or per-tenant rate limit hit
`problems/unauthorized`	`401`	Missing or expired JWT
`problems/forbidden`	`403`	Valid JWT but insufficient permission
`problems/module-not-found`	`404`	Module not installed for this tenant
`problems/gateway-timeout`	`504`	gRPC deadline (5 s) exceeded
`problems/service-unavailable`	`503`	Circuit breaker open for this service
`problems/validation-failed`	`400`	Request body failed OpenAPI schema validation
