Testing Strategy
Platform Kernel enforces an 8-level testing pyramid. Each level has a
distinct frequency, toolchain, and acceptance threshold. No level can be
skipped: the CI Gate (gate job in ci.yml) blocks merge until all required
levels are green.
Pyramid Overview
Level 1 — Unit Tests
Go Services
# Run across all 12 services (as in ci.yml)
for svc in services/audit services/billing services/data-layer \
services/domain-resolver services/event-bus services/files \
services/gateway services/iam services/integration-hub \
services/module-registry services/money services/notify; do
# The subshell keeps the working directory at the repo root between iterations
(cd "$svc" && go test ./... -race -count=1 \
-coverprofile="/tmp/cover-$(basename "$svc").out")
done
| Flag | Purpose |
|---|---|
| `-race` | Detect data races (Go race detector) |
| `-count=1` | Disable the test result cache so tests always rerun |
| `-coverprofile` | Per-service coverage artifact (uploaded to GitHub Actions) |
Coverage target: ≥ 80% per service (enforced by CI).
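One way such a threshold could be checked is sketched below. It assumes the gate reads the `total:` line printed by `go tool cover -func=<profile>`; the names `parseTotalCoverage` and `meetsTarget` are hypothetical, and the actual enforcement step in ci.yml may work differently.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// minCoverage mirrors the documented per-service target of 80%.
const minCoverage = 80.0

// parseTotalCoverage scans `go tool cover -func` output and returns the
// percentage found on the trailing "total:" line.
func parseTotalCoverage(report string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 3 && fields[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(fields[2], "%"), 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("no total line found in coverage report")
}

// meetsTarget reports whether a coverage percentage satisfies the target.
func meetsTarget(pct float64) bool { return pct >= minCoverage }

func main() {
	// Sample report; in CI this text would come from `go tool cover -func`.
	sample := "example.com/iam/user.go:12:\tCreate\t100.0%\n" +
		"total:\t\t\t(statements)\t83.4%\n"
	pct, err := parseTotalCoverage(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("coverage %.1f%%, meets target: %v\n", pct, meetsTarget(pct))
}
```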
TypeScript Packages
pnpm vitest run --coverage \
--exclude '**/e2e/**' \
--exclude '**/*.browser.test.tsx'
Browser-mode tests (*.browser.test.tsx) require Playwright Chromium and run
in a separate step:
pnpm --filter @platform/sdk-ui exec playwright install chromium --with-deps
Logger Pattern
All Go services use log/slog (stdlib) with JSON handler in production:
// Pattern used across all services (services/iam, services/vault, etc.)
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: slog.LevelInfo,
}))
slog.SetDefault(logger)
Tests use a discarding no-op logger to suppress output:
func noopLogger() *slog.Logger {
return slog.New(slog.NewTextHandler(io.Discard, nil))
}
Level 2 — Integration Tests
Integration tests run against real infrastructure via Testcontainers. No mocks for databases, message brokers, or secret stores.
Infrastructure Stack
Tag Convention
All integration tests are gated by a build tag to prevent accidental execution during unit test runs:
//go:build integration
package integration_test
Run command: go test -tags=integration ./tests/integration/... -timeout=120s
IAM Integration Test Structure
The IAM service (services/iam/tests/integration/) is the reference
implementation for integration test patterns across all services:
| Test file | What it tests |
|---|---|
| user_service_test.go | User CRUD, soft delete, restore (real PostgreSQL) |
| oauthapp_service_test.go | OAuth app lifecycle, token rotation |
| handler_routes_test.go | HTTP handler routes with full middleware stack |
| block44_final_verification_test.go | Tenant lifecycle state machine, idempotency |
Connection Configuration
# Injected via env in CI (ci.yml)
TEST_POSTGRES_URL=postgres://kernel:kernel_dev_password@localhost:5432/platform_kernel
TEST_KAFKA_BROKERS=localhost:9092
TEST_VALKEY_ADDR=localhost:6379
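A test helper could read these variables once and fail early when one is missing. The following is a minimal sketch, not the helper used in the repository: `TestInfra` and `loadTestInfra` are hypothetical names, and treating `TEST_KAFKA_BROKERS` as a comma-separated list is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// TestInfra groups the connection settings injected by CI.
type TestInfra struct {
	PostgresURL  string
	KafkaBrokers []string
	ValkeyAddr   string
}

// loadTestInfra reads the three TEST_* variables through getenv and returns
// an error naming every variable that is unset. Passing getenv as a parameter
// keeps the function testable without touching the process environment.
func loadTestInfra(getenv func(string) string) (TestInfra, error) {
	cfg := TestInfra{
		PostgresURL: getenv("TEST_POSTGRES_URL"),
		ValkeyAddr:  getenv("TEST_VALKEY_ADDR"),
	}
	// Assumption: multiple brokers are separated by commas.
	if b := getenv("TEST_KAFKA_BROKERS"); b != "" {
		cfg.KafkaBrokers = strings.Split(b, ",")
	}

	var missing []string
	if cfg.PostgresURL == "" {
		missing = append(missing, "TEST_POSTGRES_URL")
	}
	if len(cfg.KafkaBrokers) == 0 {
		missing = append(missing, "TEST_KAFKA_BROKERS")
	}
	if cfg.ValkeyAddr == "" {
		missing = append(missing, "TEST_VALKEY_ADDR")
	}
	if len(missing) > 0 {
		return TestInfra{}, fmt.Errorf("missing env vars: %s", strings.Join(missing, ", "))
	}
	return cfg, nil
}

func main() {
	cfg, err := loadTestInfra(os.Getenv)
	if err != nil {
		fmt.Println("integration tests would be skipped:", err)
		return
	}
	fmt.Printf("postgres=%s kafka=%v valkey=%s\n",
		cfg.PostgresURL, cfg.KafkaBrokers, cfg.ValkeyAddr)
}
```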
Level 3 — Contract Tests (Pact)
Pact v4 (Go SDK) verifies that the Gateway (consumer) and downstream services (providers) agree on the same API contract, independent of integration tests.
Consumer–Provider Map
CI Flow
Contract Location
services/gateway/tests/contract/pacts/
├── Gateway-IAM.json
├── Gateway-Billing.json
└── Gateway-DataLayer.json
Contracts are uploaded as GitHub Actions artifacts (90-day retention) after each successful consumer test run.
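For orientation, a pact file is plain JSON with top-level `consumer`, `provider`, and `interactions` fields. The sketch below reads those fields and prints a one-line summary; it uses only the standard library rather than the Pact SDK, the sample document is invented for illustration, and `summarizePact` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// pactFile models only the fields this sketch needs from a pact document.
type pactFile struct {
	Consumer struct {
		Name string `json:"name"`
	} `json:"consumer"`
	Provider struct {
		Name string `json:"name"`
	} `json:"provider"`
	Interactions []struct {
		Description string `json:"description"`
	} `json:"interactions"`
}

// summarizePact returns "<consumer>-<provider>.json: N interaction(s)".
func summarizePact(raw []byte) (string, error) {
	var p pactFile
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	if p.Consumer.Name == "" || p.Provider.Name == "" {
		return "", errors.New("pact file is missing a consumer or provider name")
	}
	return fmt.Sprintf("%s-%s.json: %d interaction(s)",
		p.Consumer.Name, p.Provider.Name, len(p.Interactions)), nil
}

func main() {
	// Invented sample; real contracts are generated by the consumer tests.
	sample := []byte(`{
	  "consumer": {"name": "Gateway"},
	  "provider": {"name": "IAM"},
	  "interactions": [{"description": "a request for an existing user"}]
	}`)
	summary, err := summarizePact(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(summary)
}
```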
Pact Broker
Self-hosted Pact Broker runs via docker-compose.pact.yml:
# docker/docker-compose.pact.yml (excerpt)
pact-broker:
image: pactfoundation/pact-broker:latest
ports:
- "9292:9292"
environment:
PACT_BROKER_DATABASE_URL: "postgres://pact:pact@postgres:5432/pact"
Access: http://localhost:9292 · credentials: pact / pact_dev
Level 4 — Load Tests (k6)
Load tests run only on push to main via a self-hosted VDS runner
tagged [self-hosted, linux, load-test].
Acceptance Criteria
| Metric | Target |
|---|---|
| Sustained RPS | 10,000 |
| p99 latency | < 200 ms |
| Error rate | 0% |
| Traffic mix | 80% GET / 20% POST |
Script Location
scripts/load-test/gateway-load.js
k6 Invocation (ci.yml)
k6 run \
--env BASE_URL="$BASE_URL" \
--env JWT_TOKEN="$STAGING_JWT" \
--out json=k6-results.json \
scripts/load-test/gateway-load.js
Results are uploaded as GitHub Actions artifacts (90-day retention) for historical trend analysis.
Level 5 — Chaos Engineering (Litmus)
Chaos tests run monthly against the staging environment.
| Experiment | Target | Expected outcome |
|---|---|---|
| Pod kill (IAM) | IAM pod | Gateway circuit breaker opens, 503 < 30s, auto-recovery |
| Network partition | Kafka broker | Consumer group rebalances (Cooperative Sticky), no loss |
| Disk full (ClickHouse) | ClickHouse data volume | CDC pipeline pauses, WAL slot bloat alert fires |
| Vault seal | Vault | Services enter degraded mode (cached DEKs), alert fires |
| PostgreSQL kill | Primary PG pod | Data Layer 503, Readiness probe fails → pod exits LB |
Litmus Workflow
Level 6 — Fuzz Tests (AFL)
Fuzz targets run monthly on the CI self-hosted runner.
| Target | Input |
|---|---|
| Protobuf parser | Malformed proto binary blobs |
| JSON deserializer | Malformed JSON payloads for all gRPC requests |
| JWT parser | Malformed JWT tokens (header + payload + signature) |
| OpenAPI validator | Malformed HTTP request bodies |
Go stdlib fuzzing (go test -fuzz) is used for Go targets. AFL is used for
boundary testing of C libraries (libvips in files service).
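A Go-native fuzz target for the JWT row could look like the sketch below. `splitJWT` is a hypothetical stand-in for the real parser (it only splits and base64url-decodes the three segments), and the fuzz body asserts nothing beyond the absence of panics and a basic structural invariant. Saved in a `_test.go` file, it can be run with `go test -fuzz=FuzzSplitJWT`.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
	"testing"
)

// splitJWT is a hypothetical stand-in for the real JWT parser: it splits the
// token on dots and base64url-decodes header, payload, and signature.
func splitJWT(token string) ([3][]byte, error) {
	var out [3][]byte
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return out, errors.New("jwt: want 3 dot-separated segments")
	}
	for i, p := range parts {
		b, err := base64.RawURLEncoding.DecodeString(p)
		if err != nil {
			return out, fmt.Errorf("jwt: segment %d: %w", i, err)
		}
		out[i] = b
	}
	return out, nil
}

// FuzzSplitJWT feeds arbitrary strings to the parser. Any panic fails the
// run; accepted tokens must contain exactly two separators.
func FuzzSplitJWT(f *testing.F) {
	f.Add("eyJhbGciOiJIUzI1NiJ9.e30.c2ln") // well-formed seed
	f.Add("not-a-token")
	f.Add("..")
	f.Fuzz(func(t *testing.T, token string) {
		if _, err := splitJWT(token); err != nil {
			return // rejecting malformed input is the expected outcome
		}
		if strings.Count(token, ".") != 2 {
			t.Errorf("accepted token with wrong segment count: %q", token)
		}
	})
}

func main() {
	segs, err := splitJWT("eyJhbGciOiJIUzI1NiJ9.e30.c2ln")
	fmt.Println(string(segs[0]), string(segs[1]), err)
}
```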
Level 7 — Security Tests
SAST / SCA — Every PR
| Tool | What it scans |
|---|---|
| Snyk | Go dependency CVEs (SCA), Docker image CVEs |
| Semgrep | Go source code patterns (SAST), hardcoded secrets |
| golangci-lint | gosec linter — SQL injection, path traversal, weak crypto |
Defined in .github/workflows/security-scan.yml.
Gate: 0 Critical, 0 High vulnerabilities required for merge to main.
DAST — Quarterly
| Tool | Target |
|---|---|
| OWASP ZAP | API Gateway (Envoy) — active scan against staging |
| Burp Suite | Manual penetration test (quarterly, by security team) |
OWASP ZAP is run in API scan mode against the Staging URL with the staging JWT.
Level 8 — Smoke Tests
Smoke tests run:
- Every PR in CI (after Docker Build): scripts/smoke-test.sh
- Every 1 minute in production: Kubernetes startupProbe + external uptime monitor
Health Endpoint Contract
Defined in services/iam/api/openapi.yaml and implemented identically across
all 12 services:
| Endpoint | Probe type | Returns |
|---|---|---|
| GET /health/live | Kubernetes livenessProbe | 200 {"status":"alive"} always if process alive |
| GET /health/ready | Kubernetes readinessProbe | 200 {"status":"ready"} if PG up; 503 if not |
| GET /health | Full report | JSON with status, service, version, uptime_seconds, checks |
smoke-test.sh Target List
The smoke test iterates over all 12 service /health/live endpoints to verify
every container started successfully after a Docker build:
#!/usr/bin/env bash
SERVICES=(iam gateway data-layer event-bus notify files money
audit module-registry billing integration-hub domain-resolver)
for svc in "${SERVICES[@]}"; do
curl -sf "http://${svc}:8080/health/live" \
|| { echo "❌ ${svc} health check failed"; exit 1; }
done
echo "✅ All smoke tests passed"
CI Pipeline — Complete Sequence
See Also
- Deployment — CI/CD pipeline, Docker Compose, K8s rolling
- Observability — metrics, tracing, health endpoints
- Security Deep Dive — encryption, key rotation