Kubernetes Deployment

Production Kubernetes deployment of Platform-Kernel. StatefulSets for PostgreSQL and ClickHouse, Deployments for stateless Go services, Istio mTLS, Vault Agent sidecar, and horizontal scaling configuration.

This page documents the production Kubernetes deployment architecture for Platform-Kernel. As of April 2026, the project does not ship first-party Helm charts — operators must adapt the Docker Compose stack to their cluster using the patterns described here. Each Go service image is built from the project's multi-stage Dockerfile and pushed to GHCR (see CI pipeline).

Recommended toolchain:

Kubernetes 1.36 (April 2026 GA)
Helm 4.1.4
Istio 1.29.2 (service mesh, mTLS enforcement)
cert-manager 1.17+ (TLS certificate lifecycle)

Architecture Overview

Namespace Layout

kubectl create namespace platform-infra     # Stateful: PG, CH, Kafka, Vault
kubectl create namespace platform-services  # Go services
kubectl create namespace platform-ingress   # Envoy (Ingress)
kubectl create namespace istio-system       # Istio control plane
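
Istio only injects sidecars into namespaces that opt in, so the STRICT mTLS policy described later on this page presupposes that platform-services is labelled for injection. The following is a minimal sketch of the declarative equivalent of the kubectl command above. It assumes the default (non-revisioned) injection label; revision-based Istio installs use an istio.io/rev label instead.

```yaml
# Declarative form of `kubectl create namespace platform-services`.
# Assumption: automatic sidecar injection via the default injection label.
apiVersion: v1
kind: Namespace
metadata:
  name: platform-services
  labels:
    istio-injection: enabled
```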

Docker Images

Each Go service is built using a two-stage Dockerfile:

Builder stage: golang:1.26.1-alpine — compiles a static binary with CGO_ENABLED=0 for cross-platform compatibility
Runtime stage: alpine:3.21 — minimal image, non-root user appuser (UID 10001), only ca-certificates, wget, tzdata

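# Shared build pattern (all 12 Go services):
# Global ARGs must be declared before the first FROM to be usable in FROM lines.
# Both values are supplied via --build-arg.
ARG GO_VERSION
ARG ALPINE_VERSION

FROM golang:${GO_VERSION}-alpine AS builder
ARG TARGETOS TARGETARCH
WORKDIR /src
# Copy the monorepo (build context = repo root, required for go.work)
COPY . .
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -ldflags="-s -w" -o /service ./cmd/<service>

FROM alpine:${ALPINE_VERSION}
# Runtime packages plus the non-root user (UID 10001)
RUN apk add --no-cache ca-certificates wget tzdata \
    && adduser -D -u 10001 appuser
COPY --from=builder /service /service
USER appuser
EXPOSE 8080 50050
ENTRYPOINT ["/service"]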

Images are published to GHCR:

ghcr.io/<org>/platform-kernel/<service>:<git-sha>

The CI pipeline (ci.yml) builds and pushes on every merge to main.

CI and Image Registry

The ci.yml pipeline (GitHub Actions) builds production images using the docker-build job. Required secrets and variables:

Secret / Variable	Description
`REGISTRY_TOKEN`	GHCR push credentials
`SONAR_TOKEN`	SonarQube 25.1 SAST upload
`STAGING_SSH_KEY`	SSH key for staging deploy
`STAGING_HOST`	Staging server hostname
`STAGING_USER`	SSH user on staging
`STAGING_WORK_DIR`	Working directory on staging

Build args passed to every docker build:

docker build \
  --build-arg GO_VERSION=1.26 \
  --build-arg ALPINE_VERSION=3.21 \
  -t ghcr.io/<org>/platform-kernel/iam:<sha> \
  -f services/iam/Dockerfile \
  .   # Build context = monorepo root (required for go.work)

Stateless Services — Deployment Pattern

All 12 Go services share the same Deployment pattern. Example for the IAM service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iam
  namespace: platform-services
spec:
  replicas: 3
  selector:
    matchLabels:
      app: iam
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0      # Zero-downtime rolling update
  template:
    metadata:
      labels:
        app: iam
    spec:
      serviceAccountName: platform-iam
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
      containers:
        - name: iam
          image: ghcr.io/<org>/platform-kernel/iam:<sha>
          ports:
            - containerPort: 8080  # HTTP health + REST
            - containerPort: 50050 # gRPC
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: platform-db-secret
                  key: iam-dsn
            - name: VAULT_ADDR
              value: "https://vault.platform-infra.svc.cluster.local:8200"  # HTTPS: the Vault listener has TLS enabled
            - name: VAULT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: vault-token
                  key: token
            - name: IAM_GRPC_PORT
              value: "50050"
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 128Mi
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5

Key principles for all services:

maxUnavailable: 0 — zero downtime rolling update
Non-root user (UID 10001) enforced in both Dockerfile and securityContext
Memory limit matches Docker Compose deploy.resources.limits
Health probes use /health/live and /health/ready endpoints
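
The Deployment above exposes ports 8080 and 50050, but other services can only reach it through a Service object, which this page does not show. The following is a minimal sketch of one, not taken from the project sources. The Service name and the port names are assumptions; the names follow Istio's protocol-by-port-name convention so that the mesh treats 50050 as gRPC.

```yaml
# Hypothetical Service for the iam Deployment; not part of the project sources.
apiVersion: v1
kind: Service
metadata:
  name: iam
  namespace: platform-services
spec:
  selector:
    app: iam              # matches the pod label in the Deployment
  ports:
    - name: http          # REST + health probes
      port: 8080
      targetPort: 8080
    - name: grpc          # the "grpc" name lets Istio select the gRPC protocol
      port: 50050
      targetPort: 50050
```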

StatefulSets — PostgreSQL

PostgreSQL 17 requires a StatefulSet with persistent volumes:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: platform-infra
spec:
  serviceName: postgres
  # 1 primary + 1 replica. The stock postgres image does not configure
  # streaming replication; set it up separately (e.g. via an operator).
  replicas: 2
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:17-alpine
          env:
            - name: POSTGRES_DB
              value: platform_kernel
            - name: POSTGRES_USER
              value: kernel
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          resources:
            limits:
              memory: 512Mi
              cpu: "2"
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "kernel", "-d", "platform_kernel"]
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd   # NVMe StorageClass required
        resources:
          requests:
            storage: 200Gi

RLS note: PostgreSQL Row-Level Security policies are applied by the application's goose migrations. The Kubernetes deployment needs no special PostgreSQL configuration for RLS.
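
The StatefulSet sets serviceName: postgres, which refers to a headless Service that must exist for the pods to get stable DNS names. The following is a minimal sketch of that Service, assuming the default PostgreSQL port 5432; the port name is illustrative.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres            # must match spec.serviceName of the StatefulSet
  namespace: platform-infra
spec:
  clusterIP: None           # headless: DNS returns per-pod records
  selector:
    app: postgres
  ports:
    - name: tcp-postgres    # illustrative port name
      port: 5432
```

With this in place, each pod is addressable as postgres-0.postgres.platform-infra.svc.cluster.local, postgres-1.postgres.platform-infra.svc.cluster.local, and so on. The same pattern applies to the other StatefulSets on this page.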

StatefulSets — ClickHouse

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhouse
  namespace: platform-infra
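spec:
  serviceName: clickhouse
  replicas: 2  # 1 shard, 2 replicas
  selector:                 # required for apps/v1 StatefulSets
    matchLabels:
      app: clickhouse
  template:
    metadata:
      labels:
        app: clickhouse
    spec: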
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:25.3-alpine
          resources:
            limits:
              memory: 2Gi
              cpu: "4"
          volumeMounts:
            - name: clickhouse-data
              mountPath: /var/lib/clickhouse
  volumeClaimTemplates:
    - metadata:
        name: clickhouse-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 500Gi

Kafka — KRaft Mode (No ZooKeeper)

Kafka 3.9.0 runs in KRaft mode — no separate ZooKeeper cluster is required. Deploy as a 3-node StatefulSet for HA:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: platform-infra
spec:
  serviceName: kafka
  replicas: 3
  selector:                 # required for apps/v1 StatefulSets
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: apache/kafka:3.9.0
          env:
            - name: KAFKA_PROCESS_ROLES
              value: broker,controller
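            # Pod hostnames resolve through the headless Service "kafka".
            # Each pod also needs a KAFKA_NODE_ID matching its ordinal (0, 1, 2).
            - name: KAFKA_CONTROLLER_QUORUM_VOTERS
              value: "0@kafka-0.kafka:9093,1@kafka-1.kafka:9093,2@kafka-2.kafka:9093"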
            - name: KAFKA_LOG_DIRS
              value: /var/lib/kafka/data
            - name: KAFKA_NUM_PARTITIONS
              value: "6"
            - name: KAFKA_DEFAULT_REPLICATION_FACTOR
              value: "3"
            - name: KAFKA_MESSAGE_MAX_BYTES
              value: "1048576"       # 1 MB max event payload
            - name: KAFKA_LOG_RETENTION_HOURS
              value: "168"           # 7 days
            - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE
              value: "false"
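          resources:
            limits:
              memory: 512Mi
              cpu: "2"
          volumeMounts:
            - name: kafka-data          # PVC from volumeClaimTemplates below
              mountPath: /var/lib/kafka/data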
  volumeClaimTemplates:
    - metadata:
        name: kafka-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 200Gi
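
A three-voter KRaft quorum keeps working with one node down but not with two, so voluntary disruptions such as node drains should be limited to one broker at a time. The following is a minimal sketch of a PodDisruptionBudget that expresses this. The object name is hypothetical, and the selector assumes the Kafka pods carry an app: kafka label.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb            # hypothetical name
  namespace: platform-infra
spec:
  maxUnavailable: 1          # 3 voters tolerate the loss of 1 while keeping quorum
  selector:
    matchLabels:
      app: kafka             # assumes the Kafka pods carry this label
```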

HashiCorp Vault — HA Raft Mode

In production, Vault runs in HA mode with integrated Raft storage (3 nodes). In dev, Vault runs as a single node with server -dev.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
  namespace: platform-infra
spec:
  serviceName: vault
  replicas: 3
  selector:                 # required for apps/v1 StatefulSets
    matchLabels:
      app: vault
  template:
    metadata:
      labels:
        app: vault
    spec:
      containers:
        - name: vault
          image: hashicorp/vault:1.19
          args: ["server"]
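          env:
            # Downward API values; they must be declared before VAULT_LOCAL_CONFIG
            # so that $(POD_IP) and $(POD_NAME) are expanded.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP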
            - name: VAULT_LOCAL_CONFIG
              value: |
                ui            = true
                cluster_addr  = "https://$(POD_IP):8201"
                api_addr      = "https://$(POD_IP):8200"
                storage "raft" {
                  path    = "/vault/data"
                  node_id = "$(POD_NAME)"
                }
                listener "tcp" {
                  address     = "0.0.0.0:8200"
                  tls_disable = false
                  tls_cert_file = "/vault/tls/tls.crt"
                  tls_key_file  = "/vault/tls/tls.key"
                }
          securityContext:
            capabilities:
              add: ["IPC_LOCK"]   # Required for Vault memory locking
          ports:
            - containerPort: 8200
            - containerPort: 8201  # Cluster/raft

Vault roles used by Platform-Kernel:

Role	Policies
`platform-iam`	`read jwt-signing-keys`, `write token-store`
`platform-domain-resolver`	`read tls-certs`, `write acme-challenges`
`platform-services`	`read db-creds` (dynamic secrets)

JWT signing keys (ES256, P-256) are stored in Vault KV v2 and rotated every 90 days using Vault's credential rotation with a dual-key strategy (the old key remains valid during the rotation window).

Istio Service Mesh

Istio 1.29.2 enforces mutual TLS (mTLS) between all services in the platform-services namespace:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: platform-services
spec:
  mtls:
    mode: STRICT

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: platform-services-mtls
  namespace: platform-services
spec:
  host: "*.platform-services.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

This replaces the services/shared/mtls Go package's self-managed certificate handling in production. In development (Docker Compose), shared/mtls handles mTLS without a service mesh.

Horizontal Pod Autoscaling

CPU- and memory-based HPA for stateless Go services (gateway shown):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
  namespace: platform-services
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75

Apply the same pattern to iam, data-layer, money — the services under highest load.
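
As an illustration of applying the pattern to another service, here is a sketch of the equivalent object for iam. The replica bounds and utilization targets are simply copied from the gateway example and are assumptions, not tuned values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: iam-hpa
  namespace: platform-services
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: iam
  minReplicas: 3             # matches replicas: 3 in the iam Deployment
  maxReplicas: 20            # assumption: same ceiling as the gateway
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```

Utilization is measured against the container's resource requests, not its limits. With the requests from the iam Deployment (100m CPU, 64Mi memory), these targets correspond to an average of roughly 60m CPU and 48Mi memory per pod.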

Zero-Downtime Deployment

Platform-Kernel's zero-downtime strategy uses:

maxUnavailable: 0 in all Deployment rolling update strategies
Kafka cooperative sticky rebalancing — consumers continue processing during rolling restarts without full partition rebalance
Kafka static group membership (group.instance.id = $POD_NAME) — prevents unnecessary consumer group rebalances when pods restart
Readiness gating — kubectl rollout status waits for all pods to pass /health/ready before marking the rollout complete
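
The static group membership item relies on each consumer knowing its own pod name. One way to provide it is the Downward API, sketched below as a container env fragment. The variable name KAFKA_GROUP_INSTANCE_ID is hypothetical; how the Go services actually read group.instance.id is not documented on this page.

```yaml
# Container env fragment for a Kafka-consuming service.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name      # Downward API: the pod's own name
  - name: KAFKA_GROUP_INSTANCE_ID     # hypothetical variable name
    value: "$(POD_NAME)"              # expanded because POD_NAME is defined above
```

Note that Deployment pod names change on every rollout, so an instance ID derived from the pod name is stable only across in-place container restarts.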

# Deploy a new image version:
kubectl set image deployment/iam \
  iam=ghcr.io/<org>/platform-kernel/iam:<new-sha> \
  -n platform-services

# Monitor rollout:
kubectl rollout status deployment/iam -n platform-services
# → deployment "iam" successfully rolled out

# Rollback if needed:
kubectl rollout undo deployment/iam -n platform-services

Secret Management

All secrets are stored in Vault and injected via the Vault Agent sidecar or Vault Secrets Operator (VSO):

# VSO syncs the Vault KV v2 secret into the Kubernetes Secret jwt-keys:
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: platform-jwt-keys
  namespace: platform-services
spec:
  type: kv-v2
  mount: secret
  path: platform/iam/jwt-keys
  destination:
    name: jwt-keys
    create: true
  refreshAfter: 1h

Do not store JWT_PRIVATE_KEY or JWT_PUBLIC_KEY in plain Kubernetes Secrets without encryption at rest.
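
The VSO manifest covers one of the two injection paths named above. The Vault Agent sidecar path is usually driven by pod annotations that the Vault Agent Injector reads. The fragment below is a sketch under two assumptions: that the injector is installed in the cluster, and that a Kubernetes auth role named platform-iam is bound to the platform-iam ServiceAccount.

```yaml
# Pod template fragment for the iam Deployment (spec.template.metadata).
# Assumes the Vault Agent Injector is installed and the platform-iam role
# exists in Vault's Kubernetes auth method.
metadata:
  labels:
    app: iam
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "platform-iam"
    # Renders the secret to /vault/secrets/jwt-keys inside the pod.
    # KV v2 API paths include the "data/" segment after the mount name.
    vault.hashicorp.com/agent-inject-secret-jwt-keys: "secret/data/platform/iam/jwt-keys"
```

With this approach the key material is written to an in-memory volume in the pod and never stored as a Kubernetes Secret.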

Node Affinity

Schedule stateful and stateless workloads onto separate nodes to prevent resource contention:

# For Go service Deployments:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values: ["stateless"]

# For StatefulSets (PG, CH, Kafka):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values: ["stateful"]

Label your nodes accordingly:

kubectl label node <worker-1> node-type=stateless
kubectl label node <worker-2> node-type=stateless
kubectl label node <storage-1> node-type=stateful
kubectl label node <storage-2> node-type=stateful