Deployment

Platform Kernel ships as a polyglot monorepo (Go services + TypeScript shell) and supports two deployment targets:

| Environment | Orchestrator | Use case |
| --- | --- | --- |
| Local / Dev | Docker Compose | Full-stack sandbox, one command |
| Staging / Prod | Kubernetes (k8s) | Multi-tenant SaaS, HPA, rolling updates |

Container Model — Dockerfile.monorepo

All 12 Go services share a single Dockerfile (Dockerfile.monorepo at monorepo root). The target service is selected via --build-arg SERVICE=<name>.

Build Stages

Key Build Properties

| Property | Value |
| --- | --- |
| Go version | 1.26 (from docker/versions.env) |
| Alpine version | 3.21 (from docker/versions.env) |
| Binary | Fully static, no libc (CGO_ENABLED=0) |
| Debug symbols | Stripped (-ldflags="-s -w") to minimize image size |
| User | appuser (UID 10001), non-root |
| files service | CGO_ENABLED=1 (requires libvips for bimg) |
| Go Workspace | GOWORK=off during build; workspace resolution via replace directives |
| Platform | linux/amd64 (CI) |

Image Tag Strategy

ghcr.io/{owner}/platform-kernel/{service}:{git-sha}
ghcr.io/{owner}/platform-kernel/{service}:latest # main branch only
ghcr.io/{owner}/platform-kernel/{service}:staging # staging tag
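
The tag scheme above can be sketched as a small helper. This is only an illustration: the function name `imageTags` and the `onMain` / `staging` inputs are hypothetical, and how CI decides that a build is a staging build is not specified here.

```go
package main

import "fmt"

// imageTags returns the tags to push for one service build.
// onMain and staging are hypothetical inputs that a CI job would
// derive from the Git ref; they are assumptions of this sketch.
func imageTags(owner, service, sha string, onMain, staging bool) []string {
	repo := fmt.Sprintf("ghcr.io/%s/platform-kernel/%s", owner, service)
	tags := []string{repo + ":" + sha} // every build gets an immutable SHA tag
	if onMain {
		tags = append(tags, repo+":latest") // main branch only
	}
	if staging {
		tags = append(tags, repo+":staging")
	}
	return tags
}

func main() {
	// "acme" and "3f2c1ab" are placeholder values.
	for _, t := range imageTags("acme", "iam", "3f2c1ab", true, false) {
		fmt.Println(t)
	}
}
```

Deployments that must be reproducible would reference the SHA tag, since `latest` and `staging` move over time.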

Local Development — Docker Compose

Stack Definition

The full development stack is defined in docker/docker-compose.yml. Versions are pinned in docker/versions.env (single source of truth for all environments):

# docker/versions.env (source of truth)
GO_VERSION=1.26
ALPINE_VERSION=3.21
POSTGRES_VERSION=17-alpine
VALKEY_VERSION=8.1-alpine
KAFKA_VERSION=3.9.0
RABBITMQ_VERSION=4.1-management-alpine
CLICKHOUSE_VERSION=25.3-alpine
SEAWEEDFS_VERSION=3.84
VAULT_VERSION=1.19
GO_FEATURE_FLAG_VERSION=v1.42.0
ENVOY_VERSION=v1.33-latest

Startup

# Start full kernel stack (all 12 services + infra)
docker compose \
--env-file docker/versions.env \
-f docker/docker-compose.yml \
up -d --wait

# Start only infrastructure (for local Go service development)
docker compose \
--env-file docker/versions.env \
-f docker/docker-compose.yml \
up -d --wait \
postgres valkey kafka clickhouse rabbitmq vault

Service Port Map (local)

| Service | Port | Notes |
| --- | --- | --- |
| API Gateway (Envoy) | 8443 | HTTPS, TLS termination |
| Gateway (Go) | 8080 | Internal gRPC |
| IAM | 8081 | gRPC + /health |
| PostgreSQL | 5432 | platform_kernel database |
| ClickHouse | 8123 / 9000 | HTTP + native |
| Kafka | 9092 | Plaintext |
| Valkey | 6379 | |
| Vault | 8200 | Dev mode |
| GoFeatureFlag | 1031 | |
| SeaweedFS | 8888 | |

Compose Profiles

| Compose file | Purpose |
| --- | --- |
| docker-compose.yml | Full stack |
| docker-compose.ci.yml | CI overlay: removes volumes, adds healthchecks |
| docker-compose.pact.yml | Pact Broker overlay: Postgres + pact-broker UI |
| docker-compose.kafka.yaml | Kafka-only |
| docker-compose.sandbox.yml | Tenant sandbox: isolated per-tenant env |
| docker-compose.gateway.yaml | Gateway + Envoy only |

Production — Kubernetes

Zero-Downtime Rolling Update

All Kernel services use the RollingUpdate deployment strategy with maxUnavailable: 0, so no old pod is terminated until its replacement is Ready.

# Example: IAM service Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-iam
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # One extra pod is started before an old one is removed
      maxUnavailable: 0  # Zero downtime: the old pod stays until the new one is ready
  template:
    spec:
      containers:
        - name: iam
          image: ghcr.io/{owner}/platform-kernel/iam:{sha}
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 5
      terminationGracePeriodSeconds: 30
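
For the rolling update to be truly zero-downtime, each service has to answer the two probes and stop advertising readiness before it exits. The following is a minimal sketch of that behavior using only the Go standard library. The `health` type, the `probe` and `drain` helpers, and the 25-second drain budget (chosen to fit inside the 30-second grace period) are assumptions for illustration, not the services' actual implementation.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// health tracks whether this process should receive traffic.
type health struct {
	ready atomic.Bool
}

// mux exposes the two probe endpoints referenced by the Deployment.
func (h *health) mux() *http.ServeMux {
	m := http.NewServeMux()
	// Liveness: the process is up. A failure makes the kubelet restart the container.
	m.HandleFunc("/health/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: 503 removes the pod from Service endpoints without a restart.
	m.HandleFunc("/health/ready", func(w http.ResponseWriter, _ *http.Request) {
		if h.ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	return m
}

// probe issues an in-memory GET against handler and returns the status code.
func probe(handler http.Handler, path string) int {
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

// drain marks the pod not ready, then waits up to grace for in-flight
// requests to finish. A real service would call it on SIGTERM.
func drain(srv *http.Server, h *health, grace time.Duration) error {
	h.ready.Store(false)
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	return srv.Shutdown(ctx)
}

func main() {
	h := &health{}
	m := h.mux()
	fmt.Println(probe(m, "/health/ready")) // 503: dependencies not yet initialized
	h.ready.Store(true)
	fmt.Println(probe(m, "/health/ready")) // 200: pod may receive traffic

	srv := &http.Server{Handler: m}
	_ = drain(srv, h, 25*time.Second)      // returns at once here: no open connections
	fmt.Println(probe(m, "/health/ready")) // 503: pod is draining
}
```

Keeping liveness and readiness separate means a slow dependency takes the pod out of rotation without triggering a restart loop.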

Zero-Downtime Kafka Consumer Rolling (Cooperative Sticky)

A standard (eager) Kafka rebalance pauses all consumers in the group during partition reassignment. A Cooperative Sticky rebalance pauses only the partitions being migrated, so the remaining partitions keep processing while a pod is replaced.

This is configured in the services/event-bus Go consumer:

// services/event-bus — reader configuration
r := kafka.NewReader(kafka.ReaderConfig{
	Brokers: cfg.Brokers,
	GroupID: cfg.GroupID,
	Topic:   cfg.Topic,
	GroupBalancers: []kafka.GroupBalancer{
		kafka.CooperativeStickyGroupBalancer{},
	},
})
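
The difference between the two rebalance styles can be illustrated with a toy simulation. This is not the Kafka group protocol and not the balancer used by the event-bus; `joinCooperative` is a hypothetical helper that only shows which partitions change owner when a consumer joins, which is the set a cooperative rebalance has to pause.

```go
package main

import (
	"fmt"
	"sort"
)

// joinCooperative returns the assignment after newMember joins and the
// partitions that changed owner. Existing members keep everything else
// (the "sticky" part), so only the moved partitions are interrupted.
func joinCooperative(current map[string][]int, newMember string) (map[string][]int, []int) {
	next := make(map[string][]int, len(current)+1)
	names := make([]string, 0, len(current))
	total := 0
	for name, parts := range current {
		next[name] = append([]int(nil), parts...) // copy; do not mutate the input
		names = append(names, name)
		total += len(parts)
	}
	sort.Strings(names) // deterministic donor order
	next[newMember] = nil

	quota := total / (len(current) + 1) // fair share for the new member
	var moved []int
	for len(next[newMember]) < quota {
		// Take one partition from whichever member currently owns the most.
		donor := names[0]
		for _, n := range names {
			if len(next[n]) > len(next[donor]) {
				donor = n
			}
		}
		last := len(next[donor]) - 1
		p := next[donor][last]
		next[donor] = next[donor][:last]
		next[newMember] = append(next[newMember], p)
		moved = append(moved, p)
	}
	return next, moved
}

func main() {
	current := map[string][]int{"a": {0, 1, 2}, "b": {3, 4, 5}}
	next, moved := joinCooperative(current, "c")
	fmt.Println("eager rebalance would pause 6 partitions")
	fmt.Println("cooperative rebalance pauses:", moved) // [2 5]
	fmt.Println(next["a"], next["b"], next["c"])        // [0 1] [3 4] [2 5]
}
```

In this example four of the six partitions are never interrupted, which is why the cooperative strategy suits rolling restarts.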

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-gateway
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale out when CPU > 70%

The HPA scales out when average CPU utilization across the pods exceeds 70% of their requested CPU (env HPA_CPU_THRESHOLD). Scale-down uses a 5-minute stabilization window to prevent flapping.
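
To make the scaling behavior concrete, here is a sketch of the replica calculation documented for the Kubernetes HPA: desired = ceil(current × currentUtilization / targetUtilization), skipped when the ratio is within a tolerance of 1 and clamped to the configured bounds. The function name `desiredReplicas` is hypothetical, and the 0.1 tolerance is the Kubernetes default, assumed unchanged in this cluster. Stabilization windows are not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA rule for a single utilization metric.
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	const tolerance = 0.1 // Kubernetes default; assumed, not taken from this repo
	ratio := currentUtil / targetUtil
	if math.Abs(ratio-1) <= tolerance {
		return current // close enough to target: no scaling
	}
	desired := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// Bounds and target taken from the platform-gateway HPA: 2..20 replicas, 70% CPU.
	fmt.Println(desiredReplicas(4, 90, 70, 2, 20))   // 6: ceil(4 * 90 / 70)
	fmt.Println(desiredReplicas(4, 72, 70, 2, 20))   // 4: inside the tolerance band
	fmt.Println(desiredReplicas(20, 140, 70, 2, 20)) // 20: clamped to maxReplicas
	fmt.Println(desiredReplicas(4, 20, 70, 2, 20))   // 2: ceil(1.14) = 2, also the floor
}
```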

Canary Deployment via Feature Flags

New features ship behind GoFeatureFlag gates. Traffic is split at the application level, so no Kubernetes traffic splitting is required.

Canary configuration lives in docker/feature-flags/:

# feature-flags/flags.yaml
feature-x:
  variations:
    enabled: true
    disabled: false
  defaultRule:
    percentage:
      enabled: 10   # 10% of tenants get the new behavior
      disabled: 90
  targeting: []

Rollout sequence:

  1. Deploy new code (Feature Flag OFF for all) — zero impact
  2. Enable for 10% → monitor error rates and latency
  3. Ramp to 50% → 100% → remove flag from code
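
A percentage rollout only works as a canary if each tenant lands in the same bucket on every request and stays enabled as the percentage grows. The sketch below shows one common way to get that property, by hashing the flag name and tenant ID into 100 buckets. It illustrates the idea only: `inRollout` is a hypothetical helper, and GoFeatureFlag's own hashing is internal to that library and may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout reports whether tenantID falls into the first `percentage`
// of 100 buckets for the given flag. The same inputs always give the
// same answer, and raising percentage never disables a tenant.
func inRollout(flag, tenantID string, percentage uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(flag + ":" + tenantID)) // salt with the flag name so flags bucket independently
	return h.Sum32()%100 < percentage
}

func main() {
	enabled := 0
	for i := 0; i < 1000; i++ {
		if inRollout("feature-x", fmt.Sprintf("tenant-%d", i), 10) {
			enabled++
		}
	}
	// Roughly 10% is expected; the exact count depends on the hash.
	fmt.Printf("%d of 1000 tenants enabled\n", enabled)
}
```

Because the bucket is derived from the tenant ID, a tenant that saw the new behavior at 10% continues to see it at 50% and 100%.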

Envoy xDS Dynamic Config

Envoy Gateway uses xDS (LDS/RDS/CDS/EDS) for zero-restart config updates. When a new module is installed, the Gateway Service pushes a new RDS route config to Envoy.

With ENVOY_DRAIN_TIMEOUT_SEC=10, in-flight requests on removed routes have 10 s to complete before the route is deleted.


CI/CD Pipeline

The pipeline is defined in .github/workflows/ci.yml. All jobs are GitHub Actions jobs running on ubuntu-latest, except the load test, which runs on a self-hosted VDS runner tagged [self-hosted, linux, load-test].

Pipeline Graph

Job Details

| Job | Trigger | Tool | SLA |
| --- | --- | --- | --- |
| Lint | Every push/PR | golangci-lint v2 · ESLint 9.x · markdownlint | < 2 min |
| Unit Tests (Go) | Every push/PR | go test -race -count=1, all 12 services | < 5 min |
| Unit Tests (TypeScript) | Every push/PR | Vitest + Playwright (Browser Mode) | < 4 min |
| Build (Go) | Every push/PR | go build -ldflags="-s -w", all 12 | < 3 min |
| Build (TypeScript) | Every push/PR | pnpm build | < 3 min |
| Integration Tests | Every PR (needs Build-Go) | Testcontainers (real PG 16, Kafka 3.9, Valkey 8.1) | < 8 min |
| Contract Tests | Every PR (needs Integration) | Pact v4 Go SDK + self-hosted Pact Broker | < 4 min |
| Security Scan | Every push/PR | Snyk + OWASP ZAP + Semgrep | < 6 min |
| Docker Build | Every PR (needs Build-Go) | BuildKit parallel --no-cache | < 10 min |
| Smoke Tests | Every PR (needs Docker) | curl /health/live /health/ready, all services | < 5 min |
| Load Test | push main only | k6: 10K RPS, p99 < 200 ms, 0 errors | < 15 min |
| CI Gate | After all above | Required status check for branch protection | instant |

Concurrency Policy

concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

  • PRs: a new push cancels the in-progress run (saves CI minutes)
  • main: runs are serialized and never cancelled, so there are no torn deploys

Integration Test Stack

Testcontainers spins up real infrastructure (not mocks) for integration tests via docker compose up -d --wait:

postgres (PG 16-alpine) · valkey (8.1-alpine) · kafka (3.9.0) ·
clickhouse (25.3-alpine) · rabbitmq (4.1-management) · vault (1.19)

Environment variables injected:

TEST_POSTGRES_URL=postgres://kernel:kernel_dev_password@localhost:5432/platform_kernel
TEST_KAFKA_BROKERS=localhost:9092
TEST_VALKEY_ADDR=localhost:6379
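
A test helper can read these variables once and fail fast with a clear message when one is missing. The following is a sketch under assumptions: the `testEnv` type and `loadTestEnv` function are hypothetical names, and treating TEST_KAFKA_BROKERS as a comma-separated list is an assumption (the example value contains a single broker).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// testEnv holds the connection settings injected for integration tests.
type testEnv struct {
	PostgresURL  string
	KafkaBrokers []string
	ValkeyAddr   string
}

// loadTestEnv reads the three TEST_* variables through lookup, which has
// the same signature as os.LookupEnv so tests can pass a fake.
func loadTestEnv(lookup func(string) (string, bool)) (testEnv, error) {
	var missing []string
	get := func(key string) string {
		v, ok := lookup(key)
		if !ok || v == "" {
			missing = append(missing, key)
		}
		return v
	}
	env := testEnv{
		PostgresURL:  get("TEST_POSTGRES_URL"),
		KafkaBrokers: strings.Split(get("TEST_KAFKA_BROKERS"), ","), // assumed comma-separated
		ValkeyAddr:   get("TEST_VALKEY_ADDR"),
	}
	if len(missing) > 0 {
		return testEnv{}, errors.New("missing test environment variables: " + strings.Join(missing, ", "))
	}
	return env, nil
}

func main() {
	env, err := loadTestEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("skipping integration tests:", err)
		return
	}
	fmt.Println("kafka brokers:", env.KafkaBrokers)
}
```

Reporting every missing variable in one error saves a round trip when a local environment is only partly configured.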

See Also