Deployment
Platform Kernel ships as a polyglot monorepo (Go services + TypeScript shell) and supports two deployment targets:
| Environment | Orchestrator | Use case |
|---|---|---|
| Local / Dev | Docker Compose | Full-stack sandbox — one command |
| Staging / Prod | Kubernetes (k8s) | Multi-tenant SaaS, HPA, rolling updates |
Container Model — Dockerfile.monorepo
All 12 Go services share a single Dockerfile (`Dockerfile.monorepo` at the monorepo root). The target service is selected via `--build-arg SERVICE=<name>`.
Build Stages
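The stage layout itself isn't reproduced in this document, but a shared per-service Dockerfile of this kind typically follows a two-stage shape. The sketch below is illustrative only — stage names, paths, and the `appuser` setup are assumptions, not the repo's actual `Dockerfile.monorepo`:

```dockerfile
# Illustrative sketch of a shared multi-stage build; stage names and
# paths are assumptions, not the actual Dockerfile.monorepo.
ARG GO_VERSION
ARG ALPINE_VERSION

FROM golang:${GO_VERSION}-alpine AS build
ARG SERVICE
WORKDIR /src
COPY . .
# GOWORK=off: resolve modules via replace directives, not the workspace
RUN CGO_ENABLED=0 GOWORK=off \
    go build -ldflags="-s -w" -o /out/${SERVICE} ./services/${SERVICE}

FROM alpine:${ALPINE_VERSION}
ARG SERVICE
RUN adduser -D -u 10001 appuser
USER appuser
COPY --from=build /out/${SERVICE} /app/service
ENTRYPOINT ["/app/service"]
```

A per-service build would then look like `docker build -f Dockerfile.monorepo --build-arg SERVICE=iam .`; the files service would deviate from this sketch, since it needs `CGO_ENABLED=1` and libvips installed in both stages.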
Key Build Properties
| Property | Value |
|---|---|
| Go version | 1.26 (from `docker/versions.env`) |
| Alpine version | 3.21 (from `docker/versions.env`) |
| Binary | Fully static, no libc (`CGO_ENABLED=0`) |
| Debug symbols | Stripped (`-ldflags="-s -w"`) — minimizes image size |
| User | `appuser`, UID 10001 — non-root |
| `files` service | `CGO_ENABLED=1` (requires libvips for bimg); the one exception to the static-binary rule |
| Go workspace | `GOWORK=off` during build — module resolution via `replace` directives |
| Platform | `linux/amd64` (CI) |
Image Tag Strategy
```
ghcr.io/{owner}/platform-kernel/{service}:{git-sha}
ghcr.io/{owner}/platform-kernel/{service}:latest    # main branch only
ghcr.io/{owner}/platform-kernel/{service}:staging   # staging tag
```
Local Development — Docker Compose
Stack Definition
The full development stack is defined in docker/docker-compose.yml. Versions
are pinned in docker/versions.env (single source of truth for all
environments):
```sh
# docker/versions.env (source of truth)
GO_VERSION=1.26
ALPINE_VERSION=3.21
POSTGRES_VERSION=17-alpine
VALKEY_VERSION=8.1-alpine
KAFKA_VERSION=3.9.0
RABBITMQ_VERSION=4.1-management-alpine
CLICKHOUSE_VERSION=25.3-alpine
SEAWEEDFS_VERSION=3.84
VAULT_VERSION=1.19
GO_FEATURE_FLAG_VERSION=v1.42.0
ENVOY_VERSION=v1.33-latest
```
Startup
```sh
# Start full kernel stack (all 12 services + infra)
docker compose \
  --env-file docker/versions.env \
  -f docker/docker-compose.yml \
  up -d --wait

# Start only infrastructure (for local Go service development)
docker compose \
  --env-file docker/versions.env \
  -f docker/docker-compose.yml \
  up -d --wait \
  postgres valkey kafka clickhouse rabbitmq vault
```
Service Port Map (local)
| Service | Port | Notes |
|---|---|---|
| API Gateway (Envoy) | 8443 | HTTPS, TLS termination |
| Gateway (Go) | 8080 | Internal gRPC |
| IAM | 8081 | gRPC + /health |
| PostgreSQL | 5432 | platform_kernel database |
| ClickHouse | 8123 / 9000 | HTTP + native |
| Kafka | 9092 | Plaintext |
| Valkey | 6379 | — |
| Vault | 8200 | Dev mode |
| GoFeatureFlag | 1031 | — |
| SeaweedFS | 8888 | — |
Compose Profiles
| Compose file | Purpose |
|---|---|
| docker-compose.yml | Full stack |
| docker-compose.ci.yml | CI overlay — removes volumes, adds healthchecks |
| docker-compose.pact.yml | Pact Broker overlay — Postgres + pact-broker UI |
| docker-compose.kafka.yaml | Kafka-only |
| docker-compose.sandbox.yml | Tenant sandbox — isolated per-tenant env |
| docker-compose.gateway.yaml | Gateway + Envoy only |
Production — Kubernetes
Zero-Downtime Rolling Update
All Kernel services use the `RollingUpdate` deployment strategy with `maxUnavailable: 0` — no old pod is terminated until its replacement is Ready.
```yaml
# Example: IAM service Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-iam
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # One extra pod spun up before old removed
      maxUnavailable: 0  # Zero downtime — old pod stays until new ready
  template:
    spec:
      containers:
        - name: iam
          image: ghcr.io/{owner}/platform-kernel/iam:{sha}
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 5
      terminationGracePeriodSeconds: 30
```
Zero-Downtime Kafka Consumer Rolling (Cooperative Sticky)
A standard (eager) Kafka rebalance pauses every consumer in the group during partition reassignment. The cooperative sticky protocol pauses only the partitions actually being migrated, so a rolling restart of consumer pods never stalls the whole group. Configured in the services/event-bus Go consumer:
```go
// services/event-bus — reader configuration
r := kafka.NewReader(kafka.ReaderConfig{
	Brokers: cfg.Brokers,
	GroupID: cfg.GroupID,
	Topic:   cfg.Topic,
	GroupBalancers: []kafka.GroupBalancer{
		kafka.CooperativeStickyGroupBalancer{},
	},
})
```
Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-gateway
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale out when CPU > 70%
```
The HPA scales out when average CPU exceeds 70% (configurable via the `HPA_CPU_THRESHOLD` env var). Scale-down uses a 5-minute stabilization window to prevent flapping.
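The 5-minute scale-down window is not something the metrics block expresses; in `autoscaling/v2` it would sit under the HPA's `behavior` field. A sketch, assuming the window is set declaratively per-HPA rather than cluster-wide:

```yaml
# Sketch: scale-down stabilization under spec.behavior (autoscaling/v2)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 min — ignore short CPU dips
```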
Canary Deployment via Feature Flags
New features ship behind GoFeatureFlag gates. Traffic is split at the application level — no Kubernetes traffic-splitting machinery is required. Canary configuration lives in docker/feature-flags/:
```yaml
# feature-flags/flags.yaml
feature-x:
  variations:
    enabled: true
    disabled: false
  defaultRule:
    percentage:
      enabled: 10   # 10% of tenants get the new behavior
      disabled: 90
  targeting: []
```
Rollout sequence:
- Deploy new code (Feature Flag OFF for all) — zero impact
- Enable for 10% → monitor error rates and latency
- Ramp to 50% → 100% → remove flag from code
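GoFeatureFlag does the bucketing itself, but the mechanics of percentage targeting can be illustrated with a small sketch. The hashing scheme and the `inCanary` helper below are illustrative, not GoFeatureFlag's actual implementation; the point is that deterministic hashing gives each tenant a stable bucket, so ramping 10% → 50% only ever adds tenants to the canary:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inCanary reports whether tenantID falls inside the rollout percentage.
// FNV-1a is deterministic, so a tenant's bucket never changes between
// evaluations: raising the percentage only adds tenants to the canary,
// it never flips an already-enabled tenant back out.
func inCanary(tenantID string, percentage uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	return h.Sum32()%100 < percentage
}

func main() {
	for _, p := range []uint32{0, 10, 50, 100} {
		fmt.Printf("tenant-42 at %3d%%: %v\n", p, inCanary("tenant-42", p))
	}
}
```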
Envoy xDS Dynamic Config
Envoy Gateway uses xDS (LDS/RDS/CDS/EDS) for zero-restart config updates. When a new module is installed, the Gateway Service pushes a new RDS route config:
`ENVOY_DRAIN_TIMEOUT_SEC=10` — in-flight requests on removed routes have 10 s to complete before the route is deleted.
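How that env var reaches Envoy isn't shown here. Assuming it maps onto Envoy's standard `--drain-time-s` startup flag, the wiring in the gateway pod might look like the following sketch (not the repo's actual manifest):

```yaml
# Sketch: passing ENVOY_DRAIN_TIMEOUT_SEC to Envoy's drain timer
args:
  - "--config-path"
  - "/etc/envoy/bootstrap.yaml"
  - "--drain-time-s"
  - "$(ENVOY_DRAIN_TIMEOUT_SEC)"  # Kubernetes expands env vars in args
```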
CI/CD Pipeline
The pipeline is defined in `.github/workflows/ci.yml`. All jobs run on GitHub Actions `ubuntu-latest` runners, except load tests, which run on a self-hosted VDS runner tagged `[self-hosted, linux, load-test]`.
Pipeline Graph
Job Details
| Job | Trigger | Tool | SLA |
|---|---|---|---|
| Lint | Every push/PR | golangci-lint v2 · ESLint 9.x · markdownlint | < 2 min |
| Unit Tests — Go | Every push/PR | go test -race -count=1 all 12 services | < 5 min |
| Unit Tests — TypeScript | Every push/PR | Vitest + Playwright (Browser Mode) | < 4 min |
| Build — Go | Every push/PR | go build -ldflags="-s -w" all 12 | < 3 min |
| Build — TypeScript | Every push/PR | pnpm build | < 3 min |
| Integration Tests | Every PR (needs Build-Go) | Testcontainers (real PG 16, Kafka 3.9, Valkey 8.1) | < 8 min |
| Contract Tests | Every PR (needs Integration) | Pact v4 Go SDK + self-hosted Pact Broker | < 4 min |
| Security Scan | Every push/PR | Snyk + OWASP ZAP + Semgrep | < 6 min |
| Docker Build | Every PR (needs Build-Go) | BuildKit parallel --no-cache | < 10 min |
| Smoke Tests | Every PR (needs Docker) | curl /health/live /health/ready all services | < 5 min |
| Load Test | push main only | k6 — 10K RPS, p99 < 200 ms, 0 errors | < 15 min |
| CI Gate | After all above | Required status check for branch protection | instant |
Concurrency Policy
```yaml
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```
- PRs: cancel in-progress on new push (saves CI minutes)
- main: serialize — no torn deploys
Integration Test Stack
Testcontainers spins up real infrastructure (not mocks) for integration tests via `docker compose up -d --wait`:

postgres (PG 16-alpine) · valkey (8.1-alpine) · kafka (3.9.0) · clickhouse (25.3-alpine) · rabbitmq (4.1-management) · vault (1.19)
Environment variables injected:
```sh
TEST_POSTGRES_URL=postgres://kernel:kernel_dev_password@localhost:5432/platform_kernel
TEST_KAFKA_BROKERS=localhost:9092
TEST_VALKEY_ADDR=localhost:6379
```
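Integration tests typically guard on these variables so the suite degrades gracefully when the infra stack isn't running. A minimal sketch — the `pgHost` helper is illustrative, not part of the repo:

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// pgHost extracts host:port from a Postgres DSN so a test can report
// (or skip on) the target instance without connecting first.
func pgHost(dsn string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}

func main() {
	dsn := os.Getenv("TEST_POSTGRES_URL")
	if dsn == "" {
		fmt.Println("TEST_POSTGRES_URL unset — integration tests would be skipped")
		return
	}
	host, err := pgHost(dsn)
	if err != nil {
		fmt.Println("invalid DSN:", err)
		return
	}
	fmt.Println("integration tests target", host)
}
```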
See Also
- Testing Strategy — 8-level testing pyramid
- Observability — OpenTelemetry → VictoriaMetrics
- Deduplication — 3-layer idempotency