Integration Hub — Overview

The Integration Hub is the platform's single outbound gateway for all calls to external APIs — payment processors, SMS providers, email services, analytics platforms, advertising networks, and custom third-party services. Every module that needs to reach an external API does so only through the Integration Hub — direct outbound calls from modules are prohibited.

This mirrors the inbound architecture: the API Gateway is the single point of entry for all external traffic into the platform; the Integration Hub is the single point of exit.


Technical Stack

| Component | Technology | Detail |
| --- | --- | --- |
| Language | Go | net/http client + chi router |
| Circuit Breaker | sony/gobreaker | Per-provider state machine (CLOSED / HALF_OPEN / OPEN) |
| Database | PostgreSQL | Provider registry, DLQ, configuration |
| Cache | Valkey | Circuit breaker state, outgoing rate limiting counters |
| DLQ queue | RabbitMQ | Retry queue for failed requests |
| gRPC | Protobuf definition | Internal service API |
| OpenAPI | Specification file | REST API schema |

Paths & Configuration

gRPC:

proto/platform/integration_hub/v1/integration_hub_service.proto

OpenAPI:

services/integration-hub/api/openapi.yaml

Internal Packages

| Package | Responsibility |
| --- | --- |
| internal/config | Environment configuration |
| internal/dlq | Dead Letter Queue: persist, paginate, retry |
| internal/handler | HTTP request handlers |
| internal/integration | Provider call logic, circuit breaker, retry, timeout |
| internal/kafka | Kafka event publisher |
| internal/rabbitmq | RabbitMQ DLQ retry queue consumer |
| internal/schema | Provider schema validation |
| internal/tenant | Tenant-scoped provider registry |

Why All Modules Use the Hub

External APIs are unreliable by nature. Without centralized management, every module would independently implement circuit breaking, retry, timeout, and credential management — duplicating logic and creating inconsistent failure behavior. The Integration Hub enforces these patterns uniformly:

| Problem | Hub solution |
| --- | --- |
| External API flapping | Circuit breaker isolates failures per provider |
| Cascading timeouts | Per-provider timeout cap (default 10s, max 30s) |
| Thundering herd on retry | Exponential backoff with ±500ms jitter |
| Credential exposure | Credentials encrypted at rest (AES-256-GCM) |
| Untracked failures | All failed calls go to DLQ (30-day retention) |
| No outgoing rate control | Per-provider rate limit (100 req/s by default) |

Resilience Patterns

Circuit Breaker (sony/gobreaker)

Each provider has its own circuit breaker instance, independent from all other providers:

CLOSED (normal operation)
  │ 5 consecutive failures within 30 seconds
  ▼
OPEN (all calls rejected immediately, no upstream call made)
  │ After 60 seconds
  ▼
HALF_OPEN (1 probe request allowed)
  ├─ Probe succeeds → CLOSED
  └─ Probe fails → OPEN (60s reset again)

| Parameter | Value |
| --- | --- |
| Failure threshold | 5 consecutive errors within 30 seconds |
| Open duration | 60 seconds |
| HALF_OPEN probes | 1 request |
| Error when OPEN | 503 Service Unavailable + X-Circuit-Breaker-Open: true header |
| State stored in | Valkey (shared across Hub instances); no split-brain on scale-out |

When the circuit is OPEN, the Hub returns an immediate 503 without making any outbound request. This protects the external provider from being hammered during a failure, and protects upstream module callers from accumulating latency.
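
The state machine above can be sketched in a few lines of Go. This is a minimal, single-goroutine illustration using only the standard library; it is not the sony/gobreaker implementation and does not store state in Valkey. The names `Breaker`, `Allow`, `Record`, and `simulate` are hypothetical. With gobreaker itself, the same thresholds would roughly correspond to the `Settings` fields `ReadyToTrip` (checking `Counts.ConsecutiveFailures`), `Interval`, `Timeout`, and `MaxRequests`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// State uses the names from the diagram above.
type State string

const (
	Closed   State = "CLOSED"
	Open     State = "OPEN"
	HalfOpen State = "HALF_OPEN"
)

const (
	failureThreshold = 5                // consecutive failures that trip the circuit
	failureWindow    = 30 * time.Second // window in which those failures must occur
	openDuration     = 60 * time.Second // time spent OPEN before a probe is allowed
)

// ErrOpen means the call was rejected without contacting the provider.
var ErrOpen = errors.New("circuit breaker open")

// Breaker is a hypothetical per-provider breaker (not safe for concurrent use).
type Breaker struct {
	state       State
	failures    int
	windowStart time.Time
	openedAt    time.Time
	probing     bool
}

func NewBreaker() *Breaker { return &Breaker{state: Closed} }

// Allow reports whether a call may go upstream at time now.
func (b *Breaker) Allow(now time.Time) error {
	if b.state == Open {
		if now.Sub(b.openedAt) < openDuration {
			return ErrOpen
		}
		b.state, b.probing = HalfOpen, false // open period over: admit one probe
	}
	if b.state == HalfOpen {
		if b.probing {
			return ErrOpen // a probe is already in flight
		}
		b.probing = true
	}
	return nil
}

// Record feeds the outcome of an allowed call back into the breaker.
func (b *Breaker) Record(now time.Time, err error) {
	switch {
	case err == nil: // any success closes the circuit and clears the count
		b.state, b.failures, b.probing = Closed, 0, false
	case b.state == HalfOpen: // failed probe: reopen for another 60s
		b.trip(now)
	default:
		if b.failures == 0 || now.Sub(b.windowStart) > failureWindow {
			b.failures, b.windowStart = 0, now // start a new counting window
		}
		b.failures++
		if b.failures >= failureThreshold {
			b.trip(now)
		}
	}
}

func (b *Breaker) trip(now time.Time) {
	b.state, b.openedAt, b.failures, b.probing = Open, now, 0, false
}

// simulate drives the breaker with a fake clock and returns the state after each step.
func simulate() []State {
	t0 := time.Unix(0, 0)
	b := NewBreaker()
	var states []State
	for i := 0; i < 5; i++ { // five failures, one second apart
		now := t0.Add(time.Duration(i) * time.Second)
		_ = b.Allow(now)
		b.Record(now, errors.New("upstream 5xx"))
		states = append(states, b.state)
	}
	_ = b.Allow(t0.Add(10 * time.Second)) // still inside the open period: rejected
	states = append(states, b.state)
	_ = b.Allow(t0.Add(65 * time.Second)) // open period elapsed: probe admitted
	states = append(states, b.state)
	b.Record(t0.Add(65*time.Second), nil) // probe succeeds
	return append(states, b.state)
}

func main() {
	fmt.Println(simulate())
}
```

Passing the clock in explicitly keeps the sketch deterministic; a shared deployment would additionally need the state to live in a store visible to every Hub instance, as the table above notes.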

Retry + Exponential Backoff

Retries are applied after a failed call on a CLOSED or HALF_OPEN circuit:

Attempt 1: fails
  → wait 1s ± 500ms jitter
Attempt 2: fails
  → wait 2s ± 500ms jitter
Attempt 3: fails
  → wait 4s ± 500ms jitter
Attempt 4: fails
  → wait 8s ± 500ms jitter
Attempt 5: fails
  → max retries exhausted → send to DLQ

| Parameter | Value |
| --- | --- |
| Max retries | 5 |
| Base delay | 1 second |
| Multiplier | 2× per attempt |
| Jitter | ±500ms (prevents synchronized retry storms) |
| Idempotency | UUID per outgoing request (CONVENTIONS §24), so retries are safe |

Retries apply only to failures that are safe to retry (network timeouts and 5xx responses). 4xx responses from the external provider are not retried, because they indicate a client-side error that retrying will not fix.
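
The backoff schedule and the retry rule can be sketched as follows. This is an illustrative sketch, not the Hub's code: the function names are hypothetical, the jitter is assumed to be uniform over ±500ms, and the loop assumes five attempts in total (matching the trace above, which shows four waits). The sleep function is injected so the example finishes instantly.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	maxAttempts = 5 // assumption: five attempts in total, as in the trace above
	baseDelay   = time.Second
	jitter      = 500 * time.Millisecond
)

// backoff returns the wait after the given failed attempt (1-based):
// baseDelay * 2^(attempt-1), shifted by a uniform offset in [-jitter, +jitter].
func backoff(attempt int) time.Duration {
	delay := baseDelay << uint(attempt-1)
	offset := time.Duration(rand.Int63n(int64(2*jitter)+1)) - jitter
	return delay + offset
}

// retryable reports whether a result should be retried: transport errors
// (including timeouts) and 5xx responses are, 4xx responses are not.
func retryable(status int, err error) bool {
	return err != nil || status >= 500
}

// callWithRetry runs call until it yields a non-retryable result or the
// attempts are exhausted, and returns the last result plus the attempt count.
func callWithRetry(call func() (int, error), sleep func(time.Duration)) (status, attempts int, err error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err = call()
		if !retryable(status, err) {
			return status, attempt, err
		}
		if attempt < maxAttempts {
			sleep(backoff(attempt))
		}
	}
	// Exhausted: this is where the Hub would hand the call to the DLQ.
	return status, maxAttempts, err
}

// simulate calls a fake provider that always answers with status and
// returns the number of attempts and the waits (in ms) between them.
func simulate(status int) (int, []int64) {
	var waits []int64
	_, attempts, _ := callWithRetry(
		func() (int, error) { return status, nil },
		func(d time.Duration) { waits = append(waits, d.Milliseconds()) },
	)
	return attempts, waits
}

func main() {
	attempts, waits := simulate(503)
	fmt.Println("503:", attempts, "attempts, waits in ms:", waits)
	attempts, waits = simulate(404)
	fmt.Println("404:", attempts, "attempt, waits in ms:", waits)
}
```

A provider that keeps returning 503 is tried five times with waits near 1s, 2s, 4s, and 8s, while a 404 is returned to the caller after the first attempt.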

Timeout

| Parameter | Value |
| --- | --- |
| Default timeout | 10 seconds |
| Maximum timeout | 30 seconds (per-provider override) |
| Configuration | Provider config.timeout_ms field |
| Timeout response | 504 Gateway Timeout to the calling module |
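
One way to apply these rules is to clamp the configured value and run the call under a context deadline. The sketch below assumes that a missing or non-positive `timeout_ms` falls back to the 10-second default and that larger values are capped at 30 seconds; the function names and the fake provider are hypothetical. With net/http, the same context would be attached to the outgoing request via `http.NewRequestWithContext`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const (
	defaultTimeout = 10 * time.Second
	maxTimeout     = 30 * time.Second
)

// effectiveTimeout maps a provider's config.timeout_ms to the timeout used.
// Assumption: values <= 0 mean "not set"; values above the maximum are capped.
func effectiveTimeout(timeoutMs int) time.Duration {
	if timeoutMs <= 0 {
		return defaultTimeout
	}
	d := time.Duration(timeoutMs) * time.Millisecond
	if d > maxTimeout {
		return maxTimeout
	}
	return d
}

// withTimeout runs call under a deadline and maps an overrun to 504.
func withTimeout(timeout time.Duration, call func(ctx context.Context) (int, error)) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	status, err := call(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		return 504, nil // Gateway Timeout to the calling module
	}
	return status, err
}

// slowProvider stands in for an upstream that needs latencyMs to answer.
func slowProvider(latencyMs int) func(ctx context.Context) (int, error) {
	return func(ctx context.Context) (int, error) {
		select {
		case <-time.After(time.Duration(latencyMs) * time.Millisecond):
			return 200, nil
		case <-ctx.Done():
			return 0, ctx.Err()
		}
	}
}

func main() {
	fmt.Println(effectiveTimeout(0), effectiveTimeout(2500), effectiveTimeout(45000))
	status, err := withTimeout(20*time.Millisecond, slowProvider(200))
	fmt.Println(status, err)
}
```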

Outgoing Rate Limiting

Module → Hub: POST /api/v1/integrations/call { providerId: "..." }

Hub: Valkey incr outgoing_rate:{providerId} per second
  → under limit (100 req/s) → forward to external API
  → over limit → 429 Too Many Requests (caller must back off)

| Env variable | Default |
| --- | --- |
| INTEGRATION_RATE_LIMIT_PER_SEC | 100 |
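
The counter logic amounts to a fixed one-second window per provider. The sketch below uses an in-memory map as a stand-in for Valkey so that it runs without external services; the `Limiter` type, the idea of appending the Unix second to the key, and the fallback to 100 for an unset or invalid variable are assumptions, not the Hub's actual key layout. In Valkey, expired windows would be dropped by a key TTL rather than kept in a map.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// Limiter is a hypothetical fixed-window counter: one counter per provider
// per one-second window. The map stands in for Valkey's INCR on a key.
type Limiter struct {
	mu     sync.Mutex
	limit  int64
	counts map[string]int64
}

func NewLimiter(limit int64) *Limiter {
	return &Limiter{limit: limit, counts: make(map[string]int64)}
}

// Allow increments the counter for (providerID, unixSecond) and reports
// whether the call is still within the limit. A false result maps to 429.
func (l *Limiter) Allow(providerID string, unixSecond int64) bool {
	key := fmt.Sprintf("outgoing_rate:%s:%d", providerID, unixSecond)
	l.mu.Lock()
	defer l.mu.Unlock()
	l.counts[key]++
	return l.counts[key] <= l.limit
}

// limitFromEnv reads INTEGRATION_RATE_LIMIT_PER_SEC, defaulting to 100.
func limitFromEnv() int64 {
	v, err := strconv.ParseInt(os.Getenv("INTEGRATION_RATE_LIMIT_PER_SEC"), 10, 64)
	if err != nil || v <= 0 {
		return 100
	}
	return v
}

func main() {
	limit := limitFromEnv()
	l := NewLimiter(limit)
	rejected := 0
	for i := int64(0); i < limit+1; i++ { // one call more than the limit, same second
		if !l.Allow("provider-a", 1700000000) {
			rejected++
		}
	}
	fmt.Println("limit:", limit, "rejected in same second:", rejected)
	fmt.Println("next second allowed:", l.Allow("provider-a", 1700000001))
}
```

A fixed window is simple but allows short bursts across a window boundary; whether the Hub tolerates that or uses a sliding window is not stated here.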

Call Flow

Module: POST /api/v1/integrations/call
  { "providerId": "01j9pint...", "method": "POST", "path": "/v1/charges",
    "body": { ... }, "idempotencyKey": "uuid-..." }

Integration Hub:
  1. Resolve provider by ID → get baseUrl, credentials, config
  2. Check outgoing rate limit (Valkey)
  3. Check circuit breaker state
     ┌─ OPEN → 503, no upstream call
     └─ CLOSED / HALF_OPEN → continue
  4. Decrypt credentials (AES-256-GCM, Vault-managed key)
  5. Build outgoing HTTP request: baseUrl + path, inject auth headers
  6. Execute with timeout (provider config or default 10s)
  7. ┌─ Success (2xx) → return response to module
     └─ Failure → retry (up to 5 times, exponential backoff)
          → max retries exhausted → persist to DLQ
          → return 502 Bad Gateway to module
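
From a module's point of view, the flow above reduces to building the request body and interpreting the Hub's status code. The sketch below is illustrative only: the struct mirrors the JSON fields shown in the flow, while `encode`, `joinURL`, `adviceFor`, and all sample values (provider ID, idempotency key, base URL) are hypothetical placeholders. `joinURL` illustrates step 5 (baseUrl + path) on the Hub side.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CallRequest mirrors the JSON body of POST /api/v1/integrations/call.
type CallRequest struct {
	ProviderID     string          `json:"providerId"`
	Method         string          `json:"method"`
	Path           string          `json:"path"`
	Body           json.RawMessage `json:"body,omitempty"`
	IdempotencyKey string          `json:"idempotencyKey"`
}

// encode serializes the request as the module would send it to the Hub.
func encode(req CallRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

// joinURL combines a provider's base URL with the requested path,
// tolerating a trailing or leading slash on either side.
func joinURL(baseURL, path string) string {
	return strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(path, "/")
}

// adviceFor maps the Hub's documented statuses to a possible caller reaction.
func adviceFor(status int) string {
	switch {
	case status >= 200 && status < 300:
		return "success"
	case status == 429:
		return "rate limited: back off before calling again"
	case status == 502:
		return "retries exhausted: call was persisted to the DLQ"
	case status == 503:
		return "circuit open: provider is failing, no upstream call was made"
	case status == 504:
		return "provider timed out"
	default:
		return "unexpected status"
	}
}

func main() {
	payload, err := encode(CallRequest{
		ProviderID:     "provider-id-placeholder", // placeholder, not a real ID
		Method:         "POST",
		Path:           "/v1/charges",
		Body:           json.RawMessage(`{"amount":100}`),
		IdempotencyKey: "idempotency-key-placeholder",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(payload)
	fmt.Println(joinURL("https://api.example.com/", "/v1/charges"))
	for _, s := range []int{200, 429, 502, 503, 504} {
		fmt.Println(s, adviceFor(s))
	}
}
```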

REST API

Base Path

For layout brevity, the /api/v1 base path prefix is omitted from the endpoint table below.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /integrations/providers | Register a new provider |
| GET | /integrations/providers | List providers (paginated) |
| GET | /integrations/providers/:id | Get provider details |
| PATCH | /integrations/providers/:id | Update provider config or credentials |
| DELETE | /integrations/providers/:id | Soft delete provider |
| GET | /integrations/providers/:id/health | Provider health + circuit breaker state |
| POST | /integrations/call | Call a provider (via circuit breaker) |
| GET | /integrations/dlq | Dead Letter Queue (paginated) |
| POST | /integrations/dlq/:id/retry | Retry one DLQ entry |
| POST | /integrations/dlq/retry-all | Retry all DLQ entries |
| DELETE | /integrations/dlq/:id | Delete DLQ entry |

  • Providers — provider model, auth types, credentials, AES-256-GCM encryption, status values, CRUD examples, health check
  • Dead Letter Queue — DLQ schema, paginated retrieval, retry single and bulk, 30-day retention, idempotency