
Kubernetes Deployment

This page documents the production Kubernetes deployment architecture for Platform-Kernel. As of April 2026, the project does not ship first-party Helm charts — operators must adapt the Docker Compose stack to their cluster using the patterns described here. Each Go service image is built from the project's multi-stage Dockerfile and pushed to GHCR (see CI pipeline).

Recommended toolchain:

  • Kubernetes 1.36 (April 2026 GA)
  • Helm 4.1.4
  • Istio 1.29.2 (service mesh, mTLS enforcement)
  • cert-manager 1.17+ (TLS certificate lifecycle)

Architecture Overview


Namespace Layout

kubectl create namespace platform-infra # Stateful: PG, CH, Kafka, Vault
kubectl create namespace platform-services # Go services
kubectl create namespace platform-ingress # Envoy (Ingress)
kubectl create namespace istio-system # Istio control plane

Docker Images

Each Go service is built using a two-stage Dockerfile:

  1. Builder stage: golang:1.26.1-alpine — compiles a static binary with CGO_ENABLED=0 for cross-platform compatibility
  2. Runtime stage: alpine:3.21 — minimal image, non-root user appuser (UID 10001), only ca-certificates, wget, tzdata

# Shared build pattern (all 12 Go services):
FROM golang:${GO_VERSION}-alpine AS builder
ARG TARGETOS TARGETARCH
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -ldflags="-s -w" -o /service ./cmd/<service>

FROM alpine:${ALPINE_VERSION}
RUN adduser -D -u 10001 appuser
COPY --from=builder /service /service
USER appuser
EXPOSE 8080 50050
ENTRYPOINT ["/service"]

Images are published to GHCR:

ghcr.io/<org>/platform-kernel/<service>:<git-sha>

The CI pipeline (ci.yml) builds and pushes on every merge to main.
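The shape of that job can be sketched as follows. This is a hypothetical fragment, not the project's actual ci.yml — the action versions, job name, and inputs are assumptions; only the registry, secret name, and build args come from this page:

```yaml
jobs:
  docker-build:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.REGISTRY_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .                        # monorepo root (required for go.work)
          file: services/iam/Dockerfile
          build-args: |
            GO_VERSION=1.26
            ALPINE_VERSION=3.21
          push: true
          tags: ghcr.io/${{ github.repository }}/iam:${{ github.sha }}
```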


CI and Image Registry

The ci.yml pipeline (GitHub Actions) builds production images using the docker-build job. Required secrets and variables:

Secret / Variable   Description
REGISTRY_TOKEN      GHCR push credentials
SONAR_TOKEN         SonarQube 25.1 SAST upload
STAGING_SSH_KEY     SSH key for staging deploy
STAGING_HOST        Staging server hostname
STAGING_USER        SSH user on staging
STAGING_WORK_DIR    Working directory on staging

Build args passed to every docker build:

docker build \
  --build-arg GO_VERSION=1.26 \
  --build-arg ALPINE_VERSION=3.21 \
  -t ghcr.io/<org>/platform-kernel/iam:<sha> \
  -f services/iam/Dockerfile \
  .  # Build context = monorepo root (required for go.work)

Stateless Services — Deployment Pattern

All 12 Go services share the same Deployment pattern. Example for the IAM service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iam
  namespace: platform-services
spec:
  replicas: 3
  selector:
    matchLabels:
      app: iam
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Zero-downtime rolling update
  template:
    metadata:
      labels:
        app: iam
    spec:
      serviceAccountName: platform-iam
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
      containers:
        - name: iam
          image: ghcr.io/<org>/platform-kernel/iam:<sha>
          ports:
            - containerPort: 8080   # HTTP health + REST
            - containerPort: 50050  # gRPC
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: platform-db-secret
                  key: iam-dsn
            - name: VAULT_ADDR
              value: "http://vault.platform-infra.svc.cluster.local:8200"
            - name: VAULT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: vault-token
                  key: token
            - name: IAM_GRPC_PORT
              value: "50050"
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 128Mi
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
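Each Deployment is paired with a ClusterIP Service for in-cluster DNS. A minimal sketch for iam — the port names are illustrative, the selector and ports follow the manifest above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: iam
  namespace: platform-services
spec:
  selector:
    app: iam
  ports:
    - name: http
      port: 8080
    - name: grpc
      port: 50050
```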

Key principles for all services:

  • maxUnavailable: 0 — zero downtime rolling update
  • Non-root user (UID 10001) enforced in both Dockerfile and securityContext
  • Memory limit matches Docker Compose deploy.resources.limits
  • Health probes use /health/live and /health/ready endpoints

StatefulSets — PostgreSQL

PostgreSQL 17 requires a StatefulSet with persistent volumes:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: platform-infra
spec:
  serviceName: postgres
  replicas: 2  # 1 primary + 1 replica (streaming replication)
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:17-alpine
          env:
            - name: POSTGRES_DB
              value: platform_kernel
            - name: POSTGRES_USER
              value: kernel
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          resources:
            limits:
              memory: 512Mi
              cpu: "2"
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "kernel", "-d", "platform_kernel"]
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd  # NVMe StorageClass required
        resources:
          requests:
            storage: 200Gi

RLS note: PostgreSQL Row-Level Security is applied at the application level via goose migrations. Kubernetes does not require any special PostgreSQL configuration for RLS.
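For illustration, a goose migration enabling RLS typically looks like the following sketch. The table name, policy name, and the app.tenant_id session setting are hypothetical, not taken from the project's migrations:

```sql
-- +goose Up
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON accounts
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- +goose Down
DROP POLICY tenant_isolation ON accounts;
ALTER TABLE accounts DISABLE ROW LEVEL SECURITY;
```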


StatefulSets — ClickHouse

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhouse
  namespace: platform-infra
spec:
  serviceName: clickhouse
  replicas: 2  # 1 shard + 1 replica
  selector:
    matchLabels:
      app: clickhouse
  template:
    metadata:
      labels:
        app: clickhouse
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:25.3-alpine
          resources:
            limits:
              memory: 2Gi
              cpu: "4"
          volumeMounts:
            - name: clickhouse-data
              mountPath: /var/lib/clickhouse
  volumeClaimTemplates:
    - metadata:
        name: clickhouse-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 500Gi

Kafka — KRaft Mode (No ZooKeeper)

Kafka 3.9.0 runs in KRaft mode — no separate ZooKeeper cluster is required. Deploy as a 3-node StatefulSet for HA:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: platform-infra
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: apache/kafka:3.9.0
          env:
            - name: KAFKA_PROCESS_ROLES
              value: broker,controller
            - name: KAFKA_CONTROLLER_QUORUM_VOTERS
              value: "0@kafka-0:9093,1@kafka-1:9093,2@kafka-2:9093"
            - name: KAFKA_LOG_DIRS
              value: /var/lib/kafka/data
            - name: KAFKA_NUM_PARTITIONS
              value: "6"
            - name: KAFKA_DEFAULT_REPLICATION_FACTOR
              value: "3"
            - name: KAFKA_MESSAGE_MAX_BYTES
              value: "1048576"  # 1 MB max event payload
            - name: KAFKA_LOG_RETENTION_HOURS
              value: "168"  # 7 days
            - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE
              value: "false"
          resources:
            limits:
              memory: 512Mi
              cpu: "2"
          volumeMounts:
            - name: kafka-data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: kafka-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 200Gi
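The controller quorum voters address brokers as kafka-0, kafka-1, kafka-2, which requires a headless Service backing the StatefulSet so each pod gets a stable DNS name. A sketch (port names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: platform-infra
spec:
  clusterIP: None  # headless: one DNS record per pod
  selector:
    app: kafka
  ports:
    - name: client
      port: 9092
    - name: controller
      port: 9093
```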

HashiCorp Vault — HA Raft Mode

In production, Vault runs in HA mode with integrated Raft storage (3 nodes). In dev, Vault runs as a single node with server -dev.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
  namespace: platform-infra
spec:
  serviceName: vault
  replicas: 3
  selector:
    matchLabels:
      app: vault
  template:
    metadata:
      labels:
        app: vault
    spec:
      containers:
        - name: vault
          image: hashicorp/vault:1.19
          args: ["server"]
          env:
            # Downward API values referenced by $(...) in the config below
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: VAULT_LOCAL_CONFIG
              value: |
                ui = true
                cluster_addr = "https://$(POD_IP):8201"
                api_addr = "https://$(POD_IP):8200"
                storage "raft" {
                  path = "/vault/data"
                  node_id = "$(POD_NAME)"
                }
                listener "tcp" {
                  address = "0.0.0.0:8200"
                  tls_disable = false
                  tls_cert_file = "/vault/tls/tls.crt"
                  tls_key_file = "/vault/tls/tls.key"
                }
          securityContext:
            capabilities:
              add: ["IPC_LOCK"]  # Required for Vault memory locking
          ports:
            - containerPort: 8200
            - containerPort: 8201  # Cluster/raft
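Raft clusters are formed after the pods start: initialize the first node, then join the others. A sketch assuming a headless Service named vault exists in platform-infra (not shown on this page):

```shell
kubectl exec -n platform-infra vault-0 -- vault operator init
kubectl exec -n platform-infra vault-1 -- vault operator raft join \
    https://vault-0.vault.platform-infra.svc.cluster.local:8200
kubectl exec -n platform-infra vault-2 -- vault operator raft join \
    https://vault-0.vault.platform-infra.svc.cluster.local:8200
```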

Vault roles used by Platform-Kernel:

Role                      Policies
platform-iam              read jwt-signing-keys, write token-store
platform-domain-resolver  read tls-certs, write acme-challenges
platform-services         read db-creds (dynamic secrets)

JWT signing keys (ES256 P-256) are stored in Vault KV v2 and rotated every 90 days using Vault's credential rotation with the dual-key strategy (old key remains valid during rotation window).
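The dual-key write can be sketched with the Vault CLI. The field names active and previous are illustrative, not the project's actual schema; the KV path matches the one used elsewhere on this page:

```shell
# Promote the new key while keeping the old one readable during the window:
vault kv put secret/platform/iam/jwt-keys \
    previous="$(vault kv get -field=active secret/platform/iam/jwt-keys)" \
    active="$(cat new-es256-private.pem)"
# Token verifiers accept signatures from either key until the window closes.
```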


Istio Service Mesh

Istio 1.29.2 enforces mutual TLS (mTLS) between all services in the platform-services namespace:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: platform-services
spec:
  mtls:
    mode: STRICT
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: platform-services-mtls
  namespace: platform-services
spec:
  host: "*.platform-services.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

This replaces the services/shared/mtls Go package's self-managed certificate handling in production. In development (Docker Compose), shared/mtls handles mTLS without a service mesh.


Horizontal Pod Autoscaling

CPU-based HPA for stateless Go services:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
  namespace: platform-services
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75

Apply the same pattern to iam, data-layer, money — the services under highest load.


Zero-Downtime Deployment

Platform-Kernel's zero-downtime strategy uses:

  1. maxUnavailable: 0 in all Deployment rolling update strategies
  2. Kafka cooperative sticky rebalancing — consumers continue processing during rolling restarts without full partition rebalance
  3. Kafka static group membership (group.instance.id = $POD_NAME) — prevents unnecessary consumer group rebalances when pods restart
  4. Readiness gating — kubectl rollout status waits for all pods to pass /health/ready before marking the rollout complete

# Deploy a new image version:
kubectl set image deployment/iam \
  iam=ghcr.io/<org>/platform-kernel/iam:<new-sha> \
  -n platform-services

# Monitor rollout:
kubectl rollout status deployment/iam -n platform-services
# → deployment "iam" successfully rolled out

# Rollback if needed:
kubectl rollout undo deployment/iam -n platform-services
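The Kafka consumer settings behind points 2 and 3 above map to configuration like the following. Java client property names are shown for reference; the Go client used by the services exposes equivalent options under its own names:

```properties
# Static membership: pod restarts rejoin under the same identity,
# so no rebalance is triggered.
group.instance.id=${POD_NAME}
# Cooperative sticky: only reassigned partitions pause during a rebalance.
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
# Session timeout must exceed the expected restart duration.
session.timeout.ms=45000
```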

Secret Management

All secrets are stored in Vault and injected via the Vault Agent sidecar or Vault Secrets Operator (VSO):

# Kubernetes Secret referencing Vault (via VSO):
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: platform-jwt-keys
  namespace: platform-services
spec:
  type: kv-v2
  mount: secret
  path: platform/iam/jwt-keys
  destination:
    name: jwt-keys
    create: true
  refreshAfter: 1h

Do not store JWT_PRIVATE_KEY or JWT_PUBLIC_KEY in plain Kubernetes Secrets without encryption at rest.


Node Affinity

Separate stateful and stateless containers to prevent resource contention:

# For Go service Deployments:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values: ["stateless"]

# For StatefulSets (PG, CH, Kafka):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values: ["stateful"]

Label your nodes accordingly:

kubectl label node <worker-1> node-type=stateless
kubectl label node <worker-2> node-type=stateless
kubectl label node <storage-1> node-type=stateful
kubectl label node <storage-2> node-type=stateful

See Also