Vault Setup
Platform-Kernel integrates with HashiCorp Vault via the services/vault/
package, which implements the SecretProvider interface. There is no
fallback to environment variables for secrets (Architectural Decision
AD-1): if Vault is unreachable at startup, the service aborts after
VAULT_INIT_TIMEOUT_SEC seconds (120 by default).
Project-pinned version: Vault 1.19 (docker/versions.env).
Vault 2.0.0 note: Vault 2.0.0 was released April 2026 and introduces breaking changes in the Agent/SPIFFE integrations. The project remains on 1.19 until a migration guide is published. Do not upgrade without reviewing the Vault 2.0 changelog against
services/vault/vault_provider.go.
Architecture
Development — Dev Mode
In development, Vault runs as a single node in dev mode (no persistence, auto-unsealed, root token):
# docker/docker-compose.yml (excerpt)
vault:
  image: hashicorp/vault:1.19
  command: "server -dev"
  environment:
    VAULT_DEV_ROOT_TOKEN_ID: kernel-dev-root-token
    VAULT_DEV_LISTEN_ADDRESS: "0.0.0.0:8200"
  ports:
    - "8200:8200"
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8200/v1/sys/health"]
    interval: 10s
    timeout: 5s
    retries: 5
The dev root token (kernel-dev-root-token) is injected directly into
all Go services via VAULT_TOKEN. Dev-mode JWT keys are pre-seeded in
docker-compose.yml as base64-encoded ES256 key pairs.
Start Vault (dev):
docker compose \
  --env-file docker/versions.env \
  -f docker/docker-compose.yml \
  up -d --wait vault
# Verify:
curl -s http://localhost:8200/v1/sys/health | python3 -m json.tool
# → {"initialized":true,"sealed":false,"standby":false,...}
Production — Raft HA Mode
Production uses three-node Raft integrated storage (no Consul
required). Each node is a Vault StatefulSet pod in the
platform-infra namespace.
HCL Configuration
# /etc/vault/config.hcl (mounted as a ConfigMap)
ui           = true
cluster_addr = "https://NODE_IP:8201"
api_addr     = "https://NODE_IP:8200"
storage "raft" {
  path    = "/vault/data"
  node_id = "POD_NAME" # injected via Downward API env var
}
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_disable   = false
  tls_cert_file = "/vault/tls/tls.crt"
  tls_key_file  = "/vault/tls/tls.key"
}
seal "awskms" { # or "gcpckms" / "azurekeyvault"
  region     = "eu-central-1"
  kms_key_id = "arn:aws:kms:..."
}
Initialisation
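Run once per cluster on a fresh install:
export VAULT_ADDR=https://vault-0.vault.platform-infra.svc.cluster.local:8200
# Initialise (5 key shares, threshold 3).
# These flags apply to a Shamir seal. With a seal "awskms" stanza in
# config.hcl, Vault issues recovery keys instead: pass -recovery-shares and
# -recovery-threshold, and skip the manual unseal steps.
vault operator init \
  -key-shares=5 \
  -key-threshold=3 \
  -format=json > /secure/vault-init.json
# Store the keys in a separate HSM or secret manager, NEVER in git.
# Unseal vault-0 (Shamir seal only)
vault operator unseal <key-1>
vault operator unseal <key-2>
vault operator unseal <key-3>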
# Join vault-1 and vault-2 to the Raft cluster
VAULT_ADDR=https://vault-1.vault.platform-infra.svc.cluster.local:8200 \
vault operator raft join https://vault-0.vault.platform-infra.svc.cluster.local:8200
VAULT_ADDR=https://vault-2.vault.platform-infra.svc.cluster.local:8200 \
vault operator raft join https://vault-0.vault.platform-infra.svc.cluster.local:8200
Auto-Unseal (Production Requirement)
In production, Vault must be configured with auto-unsealing via a cloud KMS (AWS KMS, GCP Cloud KMS, or Azure Key Vault). Manual unseal from init keys is only for break-glass scenarios.
Secrets Engine Setup
Enable KV v2
vault secrets enable -path=secret kv-v2
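All Platform-Kernel KV secrets are stored under secret/data/platform/ (the KV v2 API path, which is also the form used in policies). The vault kv commands in this document take the logical path secret/platform/... and insert the data/ segment themselves.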
Secret Path Convention
| Secret | Vault path | Reader |
|---|---|---|
| IAM JWT signing key (ES256) | secret/data/platform/iam/jwt-keys | IAM |
| IAM OIDC RSA key (RS256) | secret/data/platform/iam/oidc-rsa-key | IAM |
| MFA encryption key (AES-256) | secret/data/platform/iam/mfa-key | IAM |
| Domain TLS certificate | secret/data/platform/domains/{domain}/cert | Domain Resolver |
| DB credentials (dynamic) | database/creds/platform-db-role | All DB services |
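The table lists API paths, while the vault kv commands elsewhere in this document use logical paths without the data/ segment. The following Go sketch shows the mapping for a KV v2 mount. The helper name kvV2Paths is hypothetical and is not taken from services/vault/.

```go
package main

import (
	"fmt"
	"strings"
)

// kvV2Paths maps a logical KV v2 path (as passed to `vault kv put`) to the
// API paths used in policies: one for secret data, one for version metadata.
// mount is the engine mount point ("secret" in this document).
func kvV2Paths(mount, logical string) (dataPath, metadataPath string) {
	mount = strings.Trim(mount, "/")
	logical = strings.Trim(logical, "/")
	return mount + "/data/" + logical, mount + "/metadata/" + logical
}

func main() {
	data, meta := kvV2Paths("secret", "platform/iam/jwt-keys")
	fmt.Println(data) // secret/data/platform/iam/jwt-keys
	fmt.Println(meta) // secret/metadata/platform/iam/jwt-keys
}
```

The dynamic database credentials in the last row come from a different secrets engine, so this mapping does not apply to them.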
JWT Signing Key Management
Key Generation (ES256 — ECDSA P-256)
# Generate private key
openssl ecparam -name prime256v1 -genkey -noout \
-out /tmp/iam-jwt-private.pem
# Extract public key
openssl ec -in /tmp/iam-jwt-private.pem \
-pubout -out /tmp/iam-jwt-public.pem
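# Store in Vault KV v2 as single-line base64
# (GNU base64 wraps output at 76 columns by default, hence the tr)
vault kv put secret/platform/iam/jwt-keys \
  private_key="$(base64 < /tmp/iam-jwt-private.pem | tr -d '\n')" \
  public_key="$(base64 < /tmp/iam-jwt-public.pem | tr -d '\n')"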
# Destroy local copies immediately
shred -u /tmp/iam-jwt-private.pem /tmp/iam-jwt-public.pem
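The same key material can be produced in-process, which avoids writing the private key to disk at all. The sketch below is a Go equivalent of the openssl commands above, using only the standard library. The function names generateES256 and parseES256Private are illustrative assumptions, not the project's API.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
)

// generateES256 creates a P-256 key pair and returns both halves as
// base64-encoded PEM: SEC1 "EC PRIVATE KEY" and PKIX "PUBLIC KEY".
func generateES256() (string, string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", "", fmt.Errorf("generate key: %w", err)
	}
	privDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return "", "", fmt.Errorf("marshal private key: %w", err)
	}
	pubDER, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return "", "", fmt.Errorf("marshal public key: %w", err)
	}
	privPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: privDER})
	pubPEM := pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: pubDER})
	return base64.StdEncoding.EncodeToString(privPEM),
		base64.StdEncoding.EncodeToString(pubPEM), nil
}

// parseES256Private reverses the encoding, as a reader of the secret would.
func parseES256Private(privB64 string) (*ecdsa.PrivateKey, error) {
	pemBytes, err := base64.StdEncoding.DecodeString(privB64)
	if err != nil {
		return nil, fmt.Errorf("decode base64: %w", err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "EC PRIVATE KEY" {
		return nil, errors.New("no EC PRIVATE KEY block found")
	}
	return x509.ParseECPrivateKey(block.Bytes)
}

func main() {
	priv, pub, err := generateES256()
	if err != nil {
		panic(err)
	}
	key, err := parseES256Private(priv)
	if err != nil {
		panic(err)
	}
	fmt.Println(key.Curve.Params().Name, len(pub) > 0) // P-256 true
}
```

The two returned strings correspond to the private_key and public_key fields written by vault kv put.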
Dual-Key Rotation (Zero Downtime)
The SecretProvider.RotateSecret method, defined in
services/vault/provider.go, implements atomic dual-key rotation:
t=0 Write new key pair → Vault (version N+1)
t=0 Old key (version N) stays readable for gracePeriod
t+0 IAM WatchRotation callback fires → begin signing with N+1
t+0 Gateway receives new public key → validates both N and N+1
t+Δ gracePeriod expires → old key (N) invalidated
The grace period defaults to 24 hours
(OIDC_SECRET_ROTATION_GRACE_HOURS = 24). Tokens already signed with the
old key keep validating until their exp, provided it falls within the
grace period.
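On the verifying side, the timeline above amounts to accepting two key versions until the grace period ends. The Go sketch below illustrates that rule. The KeyRing type and its methods are hypothetical and do not reflect the actual Gateway or SecretProvider code.

```go
package main

import (
	"fmt"
	"time"
)

// KeyRing tracks which key versions a verifier accepts during rotation.
type KeyRing struct {
	current     int       // version used for signing
	previous    int       // version still accepted during the grace period (0 = none)
	graceEndsAt time.Time // moment the previous version stops being accepted
}

// Rotate promotes newVersion to current and keeps the old version valid
// for the duration of grace.
func (k *KeyRing) Rotate(newVersion int, now time.Time, grace time.Duration) {
	k.previous = k.current
	k.current = newVersion
	k.graceEndsAt = now.Add(grace)
}

// Accepts reports whether a token signed with version may be validated at now.
func (k *KeyRing) Accepts(version int, now time.Time) bool {
	if version == k.current {
		return true
	}
	return version != 0 && version == k.previous && now.Before(k.graceEndsAt)
}

func main() {
	t0 := time.Date(2026, 1, 1, 2, 0, 0, 0, time.UTC)
	ring := &KeyRing{current: 1}
	ring.Rotate(2, t0, 24*time.Hour)

	fmt.Println(ring.Accepts(1, t0.Add(23*time.Hour))) // true: inside the grace period
	fmt.Println(ring.Accepts(1, t0.Add(25*time.Hour))) // false: grace period is over
	fmt.Println(ring.Accepts(2, t0.Add(25*time.Hour))) // true: current key
}
```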
Automated 90-Day Rotation
Schedule a CronJob (Kubernetes) or cron task (Docker Compose host) to rotate the JWT signing key every 90 days:
# kubernetes/cronjob-jwt-rotation.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: jwt-key-rotation
  namespace: platform-infra
spec:
  schedule: "0 2 1 */3 *" # 02:00 UTC on the 1st of every third month (≈ every 90 days)
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rotate
              image: hashicorp/vault:1.19
              command:
                - /bin/sh
                - -c
                - |
                  # Generate new ES256 key pair
                  # (needs openssl and shred; verify that the pinned image ships them)
                  openssl ecparam -name prime256v1 -genkey -noout -out /tmp/key.pem
                  openssl ec -in /tmp/key.pem -pubout -out /tmp/pub.pem
                  # Atomic KV v2 write (creates new version)
                  vault kv put secret/platform/iam/jwt-keys \
                    private_key="$(base64 < /tmp/key.pem | tr -d '\n')" \
                    public_key="$(base64 < /tmp/pub.pem | tr -d '\n')"
                  shred -u /tmp/key.pem /tmp/pub.pem
              env:
                - name: VAULT_ADDR
                  value: https://vault.platform-infra.svc.cluster.local:8200
                - name: VAULT_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: vault-rotation-token
                      key: token
WatchRotation polls Vault every VAULT_WATCH_INTERVAL_SEC seconds
(default 30). After a new version is written, the IAM callback
reloads the signing key without a restart:
// services/vault/vault_provider.go — WatchRotation flow
// Poll interval: VAULT_WATCH_INTERVAL_SEC (default 30s)
// On version change: invoke all registered callbacks with newData
// IAM callback: reload JWT private key in-memory
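The flow in the comments above can be sketched as a single polling step that compares versions and fans out to callbacks. This is an illustration under assumptions: the Secret and Watcher types, the fetch function, and the Poll method are hypothetical names, and the real vault_provider.go may be structured differently.

```go
package main

import "fmt"

// Secret is a hypothetical snapshot of one KV v2 secret version.
type Secret struct {
	Version int
	Data    map[string]string
}

// Watcher remembers the last seen version and notifies callbacks on change.
type Watcher struct {
	fetch     func() (Secret, error) // reads the current version from Vault
	last      int                    // last version delivered to callbacks
	callbacks []func(newData map[string]string)
}

// Poll performs one polling step. A caller would run it on a ticker every
// VAULT_WATCH_INTERVAL_SEC seconds.
func (w *Watcher) Poll() (changed bool, err error) {
	s, err := w.fetch()
	if err != nil {
		return false, err // keep the current key and retry on the next tick
	}
	if s.Version == w.last {
		return false, nil
	}
	w.last = s.Version
	for _, cb := range w.callbacks {
		cb(s.Data)
	}
	return true, nil
}

func main() {
	version := 1
	w := &Watcher{
		fetch: func() (Secret, error) {
			data := map[string]string{"private_key": fmt.Sprintf("key-v%d", version)}
			return Secret{Version: version, Data: data}, nil
		},
		last: 1,
	}
	w.callbacks = append(w.callbacks, func(d map[string]string) {
		fmt.Println("reloading signing key:", d["private_key"])
	})

	w.Poll()    // no output: version unchanged
	version = 2 // simulates a `vault kv put` that creates version 2
	w.Poll()    // prints: reloading signing key: key-v2
}
```

Returning early on a fetch error leaves the in-memory key untouched, so a brief Vault outage does not interrupt signing.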
AppRole Authentication (Production)
In production, services must NOT use the root token. Configure AppRole:
# Enable AppRole auth
vault auth enable approle
# Create IAM policy
vault policy write platform-iam - <<EOF
path "secret/data/platform/iam/*" {
  capabilities = ["read"]
}
path "secret/metadata/platform/iam/*" {
  capabilities = ["read", "list"]
}
EOF
# Create AppRole for IAM
vault write auth/approle/role/platform-iam \
  token_policies="platform-iam" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=0
# Get RoleID (store in ConfigMap)
vault read auth/approle/role/platform-iam/role-id
# Get SecretID (store in Kubernetes Secret)
vault write -f auth/approle/role/platform-iam/secret-id
Services authenticate at startup:
vault write auth/approle/login \
role_id=<ROLE_ID> \
secret_id=<SECRET_ID>
# → Returns a service token (TTL 1h, renewable)
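From a Go service, the same login is a POST to /v1/auth/approle/login on the Vault HTTP API. The sketch below uses only the standard library. The function appRoleLogin and the fakeVault transport are assumptions made for illustration (the fake lets the example run without a server), and the project's own provider may use the official Vault client instead.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// loginResponse holds the fields of the AppRole login response used here.
type loginResponse struct {
	Auth struct {
		ClientToken   string `json:"client_token"`
		LeaseDuration int    `json:"lease_duration"` // seconds
	} `json:"auth"`
}

// appRoleLogin exchanges a RoleID and SecretID for a service token and its TTL.
func appRoleLogin(ctx context.Context, c *http.Client, addr, roleID, secretID string) (string, time.Duration, error) {
	payload, err := json.Marshal(map[string]string{"role_id": roleID, "secret_id": secretID})
	if err != nil {
		return "", 0, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, addr+"/v1/auth/approle/login", bytes.NewReader(payload))
	if err != nil {
		return "", 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", 0, fmt.Errorf("approle login: status %d", resp.StatusCode)
	}
	var lr loginResponse
	if err := json.NewDecoder(resp.Body).Decode(&lr); err != nil {
		return "", 0, fmt.Errorf("approle login: decode: %w", err)
	}
	if lr.Auth.ClientToken == "" {
		return "", 0, fmt.Errorf("approle login: response carries no token")
	}
	return lr.Auth.ClientToken, time.Duration(lr.Auth.LeaseDuration) * time.Second, nil
}

// fakeVault is a stand-in transport so the sketch runs without a server.
type fakeVault struct{}

func (fakeVault) RoundTrip(r *http.Request) (*http.Response, error) {
	status, body := http.StatusNotFound, `{"errors":[]}`
	if r.Method == http.MethodPost && r.URL.Path == "/v1/auth/approle/login" {
		status = http.StatusOK
		body = `{"auth":{"client_token":"hvs.example","lease_duration":3600}}`
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(body)),
		Request:    r,
	}, nil
}

// demoLogin runs the login against the fake transport.
func demoLogin() (string, time.Duration, error) {
	client := &http.Client{Transport: fakeVault{}}
	return appRoleLogin(context.Background(), client, "http://vault:8200", "role-id", "secret-id")
}

func main() {
	token, ttl, err := demoLogin()
	if err != nil {
		panic(err)
	}
	fmt.Println(token, ttl) // hvs.example 1h0m0s
}
```

Against a real cluster, pass a client configured with the cluster CA and the https address, and renew or re-login before the returned TTL elapses.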
Vault Policies per Service
| Service | Policy path | Capabilities |
|---|---|---|
| platform-iam | secret/data/platform/iam/* | read |
| platform-domain-resolver | secret/data/platform/domains/* | read, create, update, delete |
| platform-services | database/creds/platform-db-role | read |
| jwt-rotation-job | secret/data/platform/iam/jwt-keys | create, update |
TLS Certificate Storage
Domain Resolver stores per-domain TLS certificates (issued by
Certbot v4.0.0 via Let's Encrypt) in Vault:
# Store certificate after ACME issuance
vault kv put secret/platform/domains/example.com/cert \
fullchain="$(cat /etc/letsencrypt/live/example.com/fullchain.pem)" \
privkey="$(cat /etc/letsencrypt/live/example.com/privkey.pem)" \
expires_at="2026-07-22T00:00:00Z"
Domain Resolver renews each certificate DOMAINS_SSL_RENEW_DAYS_BEFORE
days (30) before it expires and writes the new version atomically. Old
versions are retained for the grace period.
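The renewal window is a simple date comparison against the expires_at field stored with the certificate. A minimal Go sketch follows. The names parseExpiry and needsRenewal are hypothetical, and the sketch assumes expires_at is always RFC 3339, as in the example above.

```go
package main

import (
	"fmt"
	"time"
)

// parseExpiry parses the expires_at field stored alongside a certificate.
func parseExpiry(expiresAt string) (time.Time, error) {
	t, err := time.Parse(time.RFC3339, expiresAt)
	if err != nil {
		return time.Time{}, fmt.Errorf("parse expires_at: %w", err)
	}
	return t, nil
}

// needsRenewal reports whether now falls inside the renewal window, which
// opens renewDaysBefore days ahead of expiry.
func needsRenewal(expiry, now time.Time, renewDaysBefore int) bool {
	windowStart := expiry.AddDate(0, 0, -renewDaysBefore)
	return !now.Before(windowStart)
}

func main() {
	expiry, err := parseExpiry("2026-07-22T00:00:00Z")
	if err != nil {
		panic(err)
	}
	day31 := time.Date(2026, 6, 21, 0, 0, 0, 0, time.UTC) // 31 days before expiry
	day30 := time.Date(2026, 6, 22, 0, 0, 0, 0, time.UTC) // 30 days before expiry

	fmt.Println(needsRenewal(expiry, day31, 30)) // false
	fmt.Println(needsRenewal(expiry, day30, 30)) // true
}
```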
Vault Environment Variables
| Variable | Default | Description |
|---|---|---|
| VAULT_ADDR | http://vault:8200 | Vault HTTP/HTTPS address. |
| VAULT_TOKEN | (required) | Root token (development only) or AppRole service token. |
| VAULT_INIT_TIMEOUT_SEC | 120 | Max seconds to wait for Vault at startup. |
| VAULT_RETRY_INITIAL_MS | 250 | Initial exponential backoff (ms). |
| VAULT_RETRY_MAX_MS | 16000 | Max exponential backoff (ms). |
| VAULT_WATCH_INTERVAL_SEC | 30 | JWT key rotation poll interval (seconds). |
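To see how the three startup variables interact, the Go sketch below lists the waits between connection attempts for the default values. It assumes the delay doubles after every failed attempt and that retries stop once the next wait would exceed VAULT_INIT_TIMEOUT_SEC. The document only says "exponential", so the factor and the stop rule are assumptions, as is the function name backoffDelays.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelays lists the waits between attempts: start at initial, double
// after each failure (assumed factor), cap at maxDelay, and stop once the
// next wait would push the cumulative wait past timeout.
func backoffDelays(initial, maxDelay, timeout time.Duration) []time.Duration {
	var delays []time.Duration
	var total time.Duration
	d := initial
	for d > 0 && total+d <= timeout {
		delays = append(delays, d)
		total += d
		d *= 2
		if d > maxDelay {
			d = maxDelay
		}
	}
	return delays
}

func main() {
	delays := backoffDelays(250*time.Millisecond, 16*time.Second, 120*time.Second)
	fmt.Println(len(delays), delays[0], delays[len(delays)-1]) // 12 250ms 16s
}
```

Under these assumptions the defaults allow twelve waits totalling 111.75 seconds before the service gives up.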
Health and Status
# Cluster status (Raft)
vault operator raft list-peers
# Seal status
curl -s $VAULT_ADDR/v1/sys/health | python3 -m json.tool
# initialized: true, sealed: false, standby: false
# List KV versions for a secret
vault kv metadata get secret/platform/iam/jwt-keys
# Read current JWT keys (requires policy)
vault kv get -format=json secret/platform/iam/jwt-keys
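A service or probe that reads /v1/sys/health can classify a node from the three fields shown above. The Go sketch below is one way to do it. The describeHealth function and its state names are illustrative assumptions. In a three-node Raft cluster, the two non-leader nodes report standby: true.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// health holds the fields of the /v1/sys/health body used in this document.
type health struct {
	Initialized bool `json:"initialized"`
	Sealed      bool `json:"sealed"`
	Standby     bool `json:"standby"`
}

// describeHealth classifies a node from its health response body.
func describeHealth(body []byte) (string, error) {
	var h health
	if err := json.Unmarshal(body, &h); err != nil {
		return "", fmt.Errorf("decode health: %w", err)
	}
	switch {
	case !h.Initialized:
		return "uninitialized", nil
	case h.Sealed:
		return "sealed", nil
	case h.Standby:
		return "standby", nil
	default:
		return "active", nil
	}
}

func main() {
	state, err := describeHealth([]byte(`{"initialized":true,"sealed":false,"standby":false}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(state) // active
}
```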
See Also
- Configuration Reference — all VAULT_* environment variables
- Requirements — Vault version matrix
- Kubernetes Deployment — Vault Secrets Operator (VSO) and StatefulSet configuration
- Architecture → Security Deep Dive — mTLS, AES-256 field encryption, envelope encryption