Deployment

This page describes the deployment model: Docker Compose for development and Kubernetes for production. It covers the multi-stage Dockerfile.monorepo and Go Workspace builds, zero-downtime rolling updates, Kafka cooperative sticky rebalancing, canary releases via feature flags, and the CI/CD pipeline from commit to production.

Platform Kernel ships as a polyglot monorepo (Go services + TypeScript shell) and supports two deployment targets:

| Environment | Orchestrator | Use case |
| --- | --- | --- |
| Local / Dev | Docker Compose | Full-stack sandbox, started with one command |
| Staging / Prod | Kubernetes (k8s) | Multi-tenant SaaS, HPA, rolling updates |

Container Model — Dockerfile.monorepo

All 12 Go services share a single Dockerfile (Dockerfile.monorepo at monorepo root). The target service is selected via --build-arg SERVICE=<name>.

Build Stages

Key Build Properties

| Property | Value |
| --- | --- |
| Go version | 1.26 (from docker/versions.env) |
| Alpine version | 3.21 (from docker/versions.env) |
| Binary | Fully static, no libc (CGO_ENABLED=0) |
| Debug symbols | Stripped (-ldflags="-s -w") to minimize image size |
| User | appuser, UID 10001 (non-root) |
| files service | CGO_ENABLED=1 (requires libvips for bimg) |
| Go Workspace | GOWORK=off during build; workspace resolution via replace directives |
| Platform | linux/amd64 (CI) |

Image Tag Strategy

ghcr.io/{owner}/platform-kernel/{service}:{git-sha}
ghcr.io/{owner}/platform-kernel/{service}:latest          # main branch only
ghcr.io/{owner}/platform-kernel/{service}:staging         # staging tag
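
The tag strategy above can be sketched as a small Go helper that a release script might call. This is an illustration only: the function name imageRefs is hypothetical, and the Git refs that trigger the latest and staging tags (refs/heads/main and refs/tags/staging) are assumptions, since the source only says "main branch only" and "staging tag".

```go
package main

import "fmt"

// imageRefs returns the image references to push for one service build.
// The registry layout follows the tag strategy above; the Git refs used to
// decide on "latest" and "staging" are assumptions for illustration.
func imageRefs(owner, service, sha, gitRef string) []string {
	base := fmt.Sprintf("ghcr.io/%s/platform-kernel/%s", owner, service)
	refs := []string{base + ":" + sha} // every build gets an immutable SHA tag
	switch gitRef {
	case "refs/heads/main":
		refs = append(refs, base+":latest") // main branch only
	case "refs/tags/staging":
		refs = append(refs, base+":staging") // assumed trigger for the staging tag
	}
	return refs
}
```

A pull request build would get only the SHA tag, which keeps mutable tags from pointing at unreviewed code.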

Local Development — Docker Compose

Stack Definition

The full development stack is defined in docker/docker-compose.yml. Versions are pinned in docker/versions.env (single source of truth for all environments):

# docker/versions.env (source of truth)
GO_VERSION=1.26
ALPINE_VERSION=3.21
POSTGRES_VERSION=17-alpine
VALKEY_VERSION=8.1-alpine
KAFKA_VERSION=3.9.0
RABBITMQ_VERSION=4.1-management-alpine
CLICKHOUSE_VERSION=25.3-alpine
SEAWEEDFS_VERSION=3.84
VAULT_VERSION=1.19
GO_FEATURE_FLAG_VERSION=v1.42.0
ENVOY_VERSION=v1.33-latest
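
Because every environment reads its versions from this file, a tool that consumes it may want to fail fast when a key is missing. The following is a minimal sketch of such a check, assuming the file contains only KEY=VALUE lines, blank lines, and # comments as shown above. The names parseEnv, parseEnvString, and missingKeys are hypothetical and not part of the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blank lines and # comments.
func parseEnv(r io.Reader) (map[string]string, error) {
	out := make(map[string]string)
	sc := bufio.NewScanner(r)
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok || strings.TrimSpace(key) == "" {
			return nil, fmt.Errorf("line %d: expected KEY=VALUE, got %q", n, line)
		}
		out[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return out, sc.Err()
}

// parseEnvString is a convenience wrapper for in-memory content.
func parseEnvString(s string) (map[string]string, error) {
	return parseEnv(strings.NewReader(s))
}

// missingKeys reports which required keys are absent or empty.
func missingKeys(env map[string]string, required ...string) []string {
	var missing []string
	for _, k := range required {
		if env[k] == "" {
			missing = append(missing, k)
		}
	}
	return missing
}
```

For example, calling missingKeys(env, "GO_VERSION", "POSTGRES_VERSION") after parsing docker/versions.env would return an empty slice when both pins are present.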

Startup

# Start full kernel stack (all 12 services + infra)
docker compose \
  --env-file docker/versions.env \
  -f docker/docker-compose.yml \
  up -d --wait

# Start only infrastructure (for local Go service development)
docker compose \
  --env-file docker/versions.env \
  -f docker/docker-compose.yml \
  up -d --wait \
  postgres valkey kafka clickhouse rabbitmq vault

Service Port Map (local)

| Service | Port | Notes |
| --- | --- | --- |
| API Gateway (Envoy) | 8443 | HTTPS, TLS termination |
| Gateway (Go) | 8080 | Internal gRPC |
| IAM | 8081 | gRPC + /health |
| PostgreSQL | 5432 | platform_kernel database |
| ClickHouse | 8123 / 9000 | HTTP + native |
| Kafka | 9092 | Plaintext |
| Valkey | 6379 | |
| Vault | 8200 | Dev mode |
| GoFeatureFlag | 1031 | |
| SeaweedFS | 8888 | |

Compose Profiles

| Compose file | Purpose |
| --- | --- |
| docker-compose.yml | Full stack |
| docker-compose.ci.yml | CI overlay: removes volumes, adds healthchecks |
| docker-compose.pact.yml | Pact Broker overlay: Postgres + pact-broker UI |
| docker-compose.kafka.yaml | Kafka only |
| docker-compose.sandbox.yml | Tenant sandbox: isolated per-tenant environment |
| docker-compose.gateway.yaml | Gateway + Envoy only |

Production — Kubernetes

Zero-Downtime Rolling Update

All Kernel services use the RollingUpdate deployment strategy with maxUnavailable: 0, so no old pod is terminated until its replacement is Ready.

# Example: IAM service Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-iam
spec:
  replicas: 3
  selector:
    matchLabels:
      app: platform-iam    # Required by apps/v1; the label name is illustrative
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # One extra pod is started before an old one is removed
      maxUnavailable: 0    # Old pod stays until its replacement is Ready
  template:
    metadata:
      labels:
        app: platform-iam  # Must match spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 30   # Pod-level field, not a container field
      containers:
        - name: iam
          image: ghcr.io/{owner}/platform-kernel/iam:{sha}
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 5
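
The probes in the manifest expect two separate endpoints: /health/live should succeed whenever the process is running, while /health/ready should succeed only when the pod can take traffic. A minimal sketch of how a Go service might serve them follows. The health type and statusOf helper are hypothetical, and how the real services decide readiness is not described in the source.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health tracks readiness separately from liveness so that a pod can be
// taken out of rotation without being restarted.
type health struct {
	ready atomic.Bool
}

// routes registers the two probe endpoints used by the Deployment example.
func (h *health) routes() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // the process is up and serving HTTP
	})
	mux.HandleFunc("/health/ready", func(w http.ResponseWriter, _ *http.Request) {
		if !h.ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// statusOf issues an in-memory GET against the handler and returns the
// response status code. It is only a convenience for trying the handlers.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

A service would typically call ready.Store(true) once its dependencies are connected and ready.Store(false) on SIGTERM, so that Kubernetes stops routing to the pod while in-flight requests finish within the termination grace period.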

Zero-Downtime Kafka Consumer Rolling (Cooperative Sticky)

The default (eager) Kafka rebalance protocol stops every consumer in the group while partitions are reassigned. Cooperative sticky rebalancing revokes only the partitions that actually move, so the remaining partitions keep being consumed during a rolling update.

The balancer is configured in the Go consumer in services/event-bus:

// services/event-bus — reader configuration
r := kafka.NewReader(kafka.ReaderConfig{
    Brokers:        cfg.Brokers,
    GroupID:        cfg.GroupID,
    Topic:          cfg.Topic,
    GroupBalancers: []kafka.GroupBalancer{
        kafka.CooperativeStickyGroupBalancer{},
    },
})
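
To make the difference concrete, the sketch below models a sticky reassignment as a pure function: each surviving member keeps the partitions it already owns up to its fair share, and only the remainder changes owner. This is a simplified model for illustration, not the assignor used by any Kafka client; the function name rebalance and the integer partition IDs are assumptions.

```go
package main

import "sort"

// rebalance returns a balanced assignment of partitions 0..partitions-1 to
// members, keeping as many partitions as possible with their current owner.
// The second result lists the partitions that change owner; under an eager
// protocol every partition would be revoked instead.
func rebalance(current map[string][]int, members []string, partitions int) (map[string][]int, []int) {
	if len(members) == 0 {
		return map[string][]int{}, nil
	}
	sorted := append([]string(nil), members...)
	sort.Strings(sorted)

	base, extra := partitions/len(sorted), partitions%len(sorted)
	next := make(map[string][]int, len(sorted))
	kept := make([]bool, partitions)

	// Pass 1: each surviving member keeps what it owns, up to its quota.
	for _, m := range sorted {
		owned := append([]int(nil), current[m]...)
		sort.Ints(owned)
		limit := base
		if extra > 0 && len(owned) > base {
			limit, extra = base+1, extra-1
		}
		if len(owned) > limit {
			owned = owned[:limit]
		}
		for _, p := range owned {
			kept[p] = true
		}
		next[m] = owned
	}

	// Anything not kept (excess, or owned by departed members) must move.
	var moved []int
	for p := 0; p < partitions; p++ {
		if !kept[p] {
			moved = append(moved, p)
		}
	}

	// Pass 2: fill members up to base first, then hand out the remainder.
	pool := moved
	for _, limit := range []int{base, base + 1} {
		for _, m := range sorted {
			for len(pool) > 0 && len(next[m]) < limit {
				next[m] = append(next[m], pool[0])
				pool = pool[1:]
			}
		}
	}
	return next, moved
}
```

With six partitions split between two consumers, adding a third consumer moves only two partitions in this model; the other four are never paused.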

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-gateway
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale out when CPU > 70%

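The HPA scales out when average CPU utilization across the gateway pods exceeds 70% of their CPU requests (env HPA_CPU_THRESHOLD). Scale-down uses a 5-minute stabilization window (300 s, the Kubernetes default) to prevent flapping.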
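
The replica count the autoscaler aims for follows the rule documented for Kubernetes: desired = ceil(current × currentUtilization / targetUtilization), clamped to the configured bounds. The sketch below applies that rule to the values in the manifest above. The function name is hypothetical, and the 10% tolerance is the Kubernetes default, assumed unchanged here.

```go
package main

import "math"

// desiredReplicas applies the HPA scaling rule
//   desired = ceil(current * currentUtil / targetUtil)
// skips changes inside the tolerance band, and clamps the result to
// [minReplicas, maxReplicas].
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	const tolerance = 0.10 // Kubernetes default; assumed not overridden
	ratio := currentUtil / targetUtil
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}
```

For example, four gateway pods averaging 90% CPU against the 70% target give ceil(4 × 90 / 70) = 6 replicas, while 72% stays inside the tolerance band and leaves the count at four.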

Canary Deployment via Feature Flags

New features ship behind GoFeatureFlag gates. Traffic is split at the application level, so no Kubernetes traffic splitting is required.

Canary configuration in docker/feature-flags/:

# feature-flags/flags.yaml
feature-x:
  variations:
    enabled: true
    disabled: false
  defaultRule:
    percentage:
      enabled: 10   # 10% of tenants get the new behavior
      disabled: 90
  targeting: []

Rollout sequence:

  1. Deploy new code (Feature Flag OFF for all) — zero impact
  2. Enable for 10% → monitor error rates and latency
  3. Ramp to 50% → 100% → remove flag from code
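
A percentage rollout only behaves like a canary if each tenant gets a stable answer and stays enabled as the percentage grows. The sketch below shows one common way to get that property with a deterministic hash bucket. It is an illustration of the idea only and does not reproduce GoFeatureFlag's own hashing; the function names and the use of FNV-1a are assumptions.

```go
package main

import "hash/fnv"

// bucket maps a (flag, tenant) pair to a stable value in [0, 100).
func bucket(flag, tenantID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(flag))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(tenantID))
	return h.Sum32() % 100
}

// enabled reports whether the tenant falls inside the rollout percentage.
// Because the bucket is fixed, a tenant enabled at 10% stays enabled at 50%.
func enabled(flag, tenantID string, percent uint32) bool {
	return bucket(flag, tenantID) < percent
}
```

Including the flag name in the hash means different flags select different tenant subsets, so the same tenants are not always first to receive every new feature.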

Envoy xDS Dynamic Config

Envoy Gateway uses xDS (LDS/RDS/CDS/EDS) for zero-restart config updates. When a new module is installed, the Gateway Service pushes a new RDS route config.

ENVOY_DRAIN_TIMEOUT_SEC=10 — in-flight requests on removed routes have 10 s to complete before the route is deleted.


CI/CD Pipeline

The pipeline is defined in .github/workflows/ci.yml. All jobs run on GitHub Actions using ubuntu-latest runners, except load tests, which run on a self-hosted VDS runner labeled [self-hosted, linux, load-test].

Pipeline Graph

Job Details

| Job | Trigger | Tool | SLA |
| --- | --- | --- | --- |
| Lint | Every push/PR | golangci-lint v2 · ESLint 9.x · markdownlint | < 2 min |
| Unit Tests (Go) | Every push/PR | go test -race -count=1 across all 12 services | < 5 min |
| Unit Tests (TypeScript) | Every push/PR | Vitest + Playwright (Browser Mode) | < 4 min |
| Build (Go) | Every push/PR | go build -ldflags="-s -w" for all 12 services | < 3 min |
| Build (TypeScript) | Every push/PR | pnpm build | < 3 min |
| Integration Tests | Every PR (needs Build-Go) | Testcontainers (real PG 17, Kafka 3.9, Valkey 8.1) | < 8 min |
| Contract Tests | Every PR (needs Integration) | Pact v4 Go SDK + self-hosted Pact Broker | < 4 min |
| Security Scan | Every push/PR | Snyk + OWASP ZAP + Semgrep | < 6 min |
| Docker Build | Every PR (needs Build-Go) | BuildKit, parallel builds with --no-cache | < 10 min |
| Smoke Tests | Every PR (needs Docker) | curl /health/live and /health/ready on all services | < 5 min |
| Load Test | Push to main only | k6: 10K RPS, p99 < 200 ms, 0 errors | < 15 min |
| CI Gate | After all of the above | Required status check for branch protection | Instant |

Concurrency Policy

concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

  • PRs: a new push cancels the in-progress run (saves CI minutes)
  • main: runs are serialized and never cancelled, so there are no torn deploys
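
The two expressions in the concurrency block can be read as a small function of the workflow name and the Git ref. The Go sketch below mirrors them for illustration; concurrencyFor is a hypothetical name, and the example refs in the usage note are assumptions about typical GitHub ref values.

```go
package main

// concurrencyFor mirrors the workflow's concurrency block: runs that share a
// group queue behind each other, and only non-main refs cancel a run that is
// already in progress.
func concurrencyFor(workflow, ref string) (group string, cancelInProgress bool) {
	group = "ci-" + workflow + "-" + ref
	cancelInProgress = ref != "refs/heads/main"
	return group, cancelInProgress
}
```

A pull request ref such as refs/pull/42/merge yields cancelInProgress = true, while refs/heads/main yields false, which is what keeps a deploy on main from being interrupted halfway.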

Integration Test Stack

Testcontainers spins up real infrastructure (not mocks) for integration tests via docker compose up -d --wait:

postgres (17-alpine) · valkey (8.1-alpine) · kafka (3.9.0) ·
clickhouse (25.3-alpine) · rabbitmq (4.1-management-alpine) · vault (1.19)

Environment variables injected:

TEST_POSTGRES_URL=postgres://kernel:kernel_dev_password@localhost:5432/platform_kernel
TEST_KAFKA_BROKERS=localhost:9092
TEST_VALKEY_ADDR=localhost:6379
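
An integration test suite might read these variables once at startup and fail with a clear message when one is missing. The following is a minimal sketch of that idea, assuming TEST_KAFKA_BROKERS may hold a comma-separated list. The testConfig type and loadTestConfig function are hypothetical, not taken from the repository.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// testConfig holds the connection settings injected by the CI job.
type testConfig struct {
	PostgresURL  *url.URL
	KafkaBrokers []string
	ValkeyAddr   string
}

// loadTestConfig reads the TEST_* variables through getenv (pass os.Getenv
// in real code) and validates that each one is present and well formed.
func loadTestConfig(getenv func(string) string) (testConfig, error) {
	var cfg testConfig

	raw := getenv("TEST_POSTGRES_URL")
	if raw == "" {
		return cfg, fmt.Errorf("TEST_POSTGRES_URL is not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return cfg, fmt.Errorf("TEST_POSTGRES_URL: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return cfg, fmt.Errorf("TEST_POSTGRES_URL: unexpected scheme %q", u.Scheme)
	}
	cfg.PostgresURL = u

	brokers := getenv("TEST_KAFKA_BROKERS")
	if brokers == "" {
		return cfg, fmt.Errorf("TEST_KAFKA_BROKERS is not set")
	}
	cfg.KafkaBrokers = strings.Split(brokers, ",") // assumed comma-separated

	cfg.ValkeyAddr = getenv("TEST_VALKEY_ADDR")
	if cfg.ValkeyAddr == "" {
		return cfg, fmt.Errorf("TEST_VALKEY_ADDR is not set")
	}
	return cfg, nil
}
```

Taking getenv as a parameter keeps the loader easy to exercise with a plain map, without touching the process environment.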
