WebSocket Protocol Reference
The WebSocket API is the real-time delivery channel for the kernel.notify()
primitive. It operates on a persistent, authenticated connection and delivers
notification events pushed from the Notify Service over a tenant-isolated channel.
WebSocket is the only push channel — all other delivery mechanisms
(email, SMS, in-app push) are dispatched asynchronously via RabbitMQ. This page
documents the complete wire protocol as implemented in
services/notify/internal/ws/.
Connection URL
wss://api.septemcore.com/v1/ws
The endpoint is served by the Notify Service (NOTIFY_HTTP_PORT=8095)
behind the Envoy Gateway. The Gateway terminates TLS and forwards the upgrade
request with Upgrade: websocket intact.
| Parameter | Value |
|---|---|
| Protocol | WebSocket (RFC 6455) |
| Transport | TLS 1.3 (Envoy terminates) |
| Message encoding | JSON (UTF-8 text frames only) |
| Binary frames | Not supported — rejected silently |
| Sub-protocols | None |
| Max message size | 4096 bytes (MaxMessageSize = 4096) |
Connection Lifecycle
[Client] [Notify Service]
connect wss://api.septemcore.com/v1/ws
──────────────────────────────────────►
TCP + TLS handshake
WebSocket upgrade (101 Switching Protocols)
◄──────────────────────────────────────
◄── {"type":"auth_required"}
{"type":"auth","token":"<JWT>"}
─────────────────────────────►
JWT validation via IAM gRPC ValidateToken
◄── {"type":"auth_ok","connId":"ws-1745012345"}
{"type":"subscribe","channels":["notifications","alerts"]}
──────────────────────────────────────────────────────────►
◄── {"type":"subscribed"}
◄── {"type":"notification",...} (asynchronous)
{"type":"ping"}
───────────────►
◄── {"type":"pong"}
┌── server ping every 30s (WebSocket protocol Ping frame) ──┐
│ client must respond Pong within 10s │
│ 2 missed pongs → close(4408) │
└────────────────────────────────────────────────────────────┘
Handshake — Step by Step
| Step | Direction | Message Type | Notes |
|---|---|---|---|
| 1 | Server → Client | auth_required | Sent immediately on upgrade |
| 2 | Client → Server | auth | 10-second timeout from step 1. Includes <JWT> |
| 3 | Server → Client | auth_ok | JWT valid + tenantId/userId extracted. Includes connId |
| 3a | Server → Client | auth_error | Invalid token → connection closes (1008) |
| 4 | Client → Server | subscribe | Provides array of channels |
| 5 | Server → Client | subscribed | Subscription registered in Hub |
| 6 | Server → Client | notification | Asynchronous, pushed by Hub.Broadcast |
Auth timeout: If the client does not send {"type":"auth"} within 10 seconds of receiving auth_required, the server closes the connection with StatusPolicyViolation (1008).
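The handshake steps above can be read as a small client-side state machine. The sketch below is transport-agnostic: it only decides which JSON text frame to send in reply to each server message. The helper name `next_client_frame` and its return convention are illustrative assumptions, not part of the Notify Service.

```python
import json
from typing import Optional


def next_client_frame(server_frame: str, token: str, channels: list[str]) -> Optional[str]:
    """Return the JSON text frame the client should send next, or None.

    Hypothetical helper that follows steps 1-5 of the handshake table.
    """
    msg = json.loads(server_frame)
    kind = msg.get("type")
    if kind == "auth_required":
        # Step 2: must reach the server within 10 seconds of auth_required.
        return json.dumps({"type": "auth", "token": token})
    if kind == "auth_ok":
        # Step 4: subscribe once the connection is authenticated.
        return json.dumps({"type": "subscribe", "channels": channels})
    if kind == "auth_error":
        # Step 3a: the server closes with 1008 right after this message.
        raise PermissionError(msg.get("error", "authentication failed"))
    # subscribed, notification, pong and others need no reply.
    return None
```

A real client would call this from its receive loop and write any non-None result back to the socket.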
Message Reference
Client → Server Messages
auth
Authenticates the connection. Must be sent within 10 seconds of auth_required.
{
"type": "auth",
"token": "eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9..."
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "auth" |
| token | string | ✅ | JWT ES256 access token (without "Bearer " prefix) |
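Because the token must be sent without the "Bearer " prefix, a client that reuses an HTTP Authorization header value needs to strip it first. A minimal sketch, assuming a hypothetical helper named `build_auth_frame`:

```python
import json


def build_auth_frame(token: str) -> str:
    """Build the auth text frame, dropping an accidental "Bearer " prefix."""
    prefix = "Bearer "
    if token.startswith(prefix):
        token = token[len(prefix):]
    if not token:
        raise ValueError("token must not be empty")
    return json.dumps({"type": "auth", "token": token})
```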
subscribe
Subscribes to one or more notification channels.
{
"type": "subscribe",
"channels": ["notifications", "alerts", "system"]
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "subscribe" |
| channels | string[] | ✅ | Channel names. Max 50 per connection. |
Channel namespace: Channel names are automatically namespaced by the server
with the authenticated tenantId:
channels: ["notifications"]
→ namespaced: "{tenantId}:notifications"
Clients never see the namespaced form — they use bare channel names. Cross-tenant
delivery is impossible at the Hub level (Hub broadcasts only to tenantID-matching
connections).
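The namespacing rule can be modeled in a few lines. This is a sketch of the documented behavior, not the server code: `namespace_channels` is a hypothetical name, and applying the 50-channel limit per call (rather than cumulatively per connection) is a simplifying assumption.

```python
MAX_CHANNELS_PER_CONNECTION = 50  # limit stated in this reference


def namespace_channels(tenant_id: str, channels: list[str]) -> list[str]:
    """Map bare channel names to "{tenantId}:{channel}" keys."""
    if len(channels) > MAX_CHANNELS_PER_CONNECTION:
        raise ValueError("at most 50 channels per connection")
    return [f"{tenant_id}:{name}" for name in channels]
```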
ping
Client-initiated heartbeat. Server responds with pong. Optional — the
server also sends its own WebSocket protocol-level pings.
{"type": "ping"}
replay
Requests the messages missed since the last known message ID. Used on reconnect to recover messages that were published while the connection was down.
{
"type": "replay",
"lastMessageId": "msg-1745012300000000001"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "replay" |
| lastMessageId | string | ✅ | ID of the last message received before disconnect |
Replay is backed by Valkey (ws:replay:{tenantId}:{channel} key).
The server replays all messages stored after lastMessageId for all channels
of this tenant. The replay buffer is capped at 100 messages per channel; when it is full, the oldest messages are dropped first.
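To illustrate the capped buffer, here is an in-memory stand-in for a single replay list. It is not the Valkey-backed ReplayStore: the class name `ReplayBuffer` is hypothetical, and returning everything still buffered when the requested ID has already been evicted is an assumption about the server's behavior.

```python
from collections import deque

REPLAY_DEPTH = 100  # per-channel cap stated above


class ReplayBuffer:
    """In-memory model of one ws:replay:{tenantId}:{channel} list."""

    def __init__(self, depth: int = REPLAY_DEPTH) -> None:
        # appendleft puts the newest entry at the head; maxlen then
        # discards the oldest entry from the tail once the cap is hit.
        self._items: deque = deque(maxlen=depth)

    def push(self, message: dict) -> None:
        self._items.appendleft(message)

    def replay_after(self, last_message_id: str) -> list:
        """Return messages newer than last_message_id, oldest first."""
        newer = []
        for message in self._items:  # iterates newest to oldest
            if message["id"] == last_message_id:
                break
            newer.append(message)
        # Assumption: an evicted ID yields the whole remaining buffer.
        return list(reversed(newer))
```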
Server → Client Messages
auth_required
Sent immediately upon successful WebSocket upgrade. Signals that the connection
is unauthenticated and the client must send auth.
{"type": "auth_required"}
auth_ok
Sent after successful JWT validation. Includes the server-assigned connection ID.
{
"type": "auth_ok",
"connId": "ws-1745012345678901234"
}
| Field | Type | Description |
|---|---|---|
| connId | string | Unique connection identifier. Use in logs and support tickets. |
auth_error
Sent when JWT validation fails. Connection is closed immediately after.
{
"type": "auth_error",
"error": "invalid token"
}
subscribed
Confirms that the channel subscription was registered.
{"type": "subscribed"}
notification
Push message delivered to subscribed channels. The payload field is
channel-specific and defined by the originating service.
{
"id": "msg-1745012345678901234",
"channel": "notifications",
"type": "notification",
"payload": {
"title": "Payment received",
"body": "Your invoice #INV-2026-042 has been paid.",
"severity": "info",
"action_url": "https://api.septemcore.com/v1/billing/invoices/INV-2026-042"
},
"timestamp": "2026-04-22T10:56:00Z"
}
| Field | Type | Description |
|---|---|---|
| id | string | Unique message ID. Used in replay.lastMessageId. |
| channel | string | Bare channel name (without tenant prefix). |
| type | string | Always "notification" for pushed messages. |
| payload | object | Notification data. Shape defined per channel. |
| timestamp | string | ISO 8601 UTC. Server-set delivery time. |
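A client will typically decode this frame into a typed value before dispatching on channel. The sketch below assumes hypothetical names (`Notification`, `parse_notification`) and leaves payload as a plain dict, since its shape is defined per channel.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Notification:
    id: str
    channel: str
    payload: dict
    timestamp: datetime


def parse_notification(frame: str) -> Notification:
    """Decode a notification text frame into a Notification."""
    msg = json.loads(frame)
    if msg.get("type") != "notification":
        raise ValueError(f"not a notification frame: {msg.get('type')!r}")
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    stamp = datetime.fromisoformat(msg["timestamp"].replace("Z", "+00:00"))
    return Notification(msg["id"], msg["channel"], msg["payload"], stamp)
```

Keeping the last parsed `id` per connection gives the client the value it needs for replay.lastMessageId after a reconnect.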
pong
Response to a client ping.
{"type": "pong"}
replay_error
Sent when replay is unavailable (replay store not configured or internal error).
{
"type": "replay_error",
"error": "replay not available"
}
Heartbeat — Keepalive
The server sends a WebSocket protocol-level Ping frame every 30 seconds
(PingInterval = 30s). The client must respond with a Pong frame within
10 seconds (PongTimeout = 10s).
| Parameter | Value | Source |
|---|---|---|
| Ping interval | 30 seconds | ws.PingInterval = 30 * time.Second |
| Pong timeout | 10 seconds | ws.PongTimeout = 10 * time.Second |
| Max missed pongs | 2 | ws.MaxMissedPongs = 2 |
| Close code on timeout | 4408 | ws.CloseCodeMissedPong = 4408 |
After MaxMissedPongs consecutive missed pongs, the server closes the connection
with close code 4408. This is a custom application-level code in the 4000–4999
range (reserved for application use per RFC 6455).
The 2-missed-pong threshold means the effective heartbeat timeout is
PingInterval + MaxMissedPongs × PongTimeout = 30 + 2 × 10 = 50 seconds.
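The same arithmetic, written out with the documented constants so that a client-side watchdog could reuse it. The Python names are illustrative; only the values come from this page.

```python
from datetime import timedelta

PING_INTERVAL = timedelta(seconds=30)   # ws.PingInterval
PONG_TIMEOUT = timedelta(seconds=10)    # ws.PongTimeout
MAX_MISSED_PONGS = 2                    # ws.MaxMissedPongs


def effective_heartbeat_timeout(
    ping_interval: timedelta = PING_INTERVAL,
    pong_timeout: timedelta = PONG_TIMEOUT,
    max_missed: int = MAX_MISSED_PONGS,
) -> timedelta:
    """PingInterval + MaxMissedPongs x PongTimeout, as defined above."""
    return ping_interval + max_missed * pong_timeout
```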
Reconnect and Replay
Recommended reconnect strategy:
- Client detects connection close (any code).
- Wait min(2^attempt × 1s, 60s) (exponential backoff with a 60s cap).
- Re-establish the connection: full auth handshake → subscribe → send replay with the last known message.id.
- Server replays missing messages from the Valkey buffer.
{
"type": "replay",
"lastMessageId": "msg-1745009999000000001"
}
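The backoff step of the reconnect strategy can be sketched as follows. The function name `reconnect_delay` and the 0-based attempt counter are assumptions; the formula is the one given in the list above.

```python
def reconnect_delay(attempt: int) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Implements min(2^attempt x 1s, 60s).
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # 2**6 = 64 already exceeds the cap, so clamp the exponent to
    # avoid computing very large powers for long outages.
    return float(min(2 ** min(attempt, 6), 60))
```

The attempt counter should reset to 0 once a connection has completed the auth handshake again.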
Replay limitations:
| Parameter | Value |
|---|---|
| Replay buffer depth | 100 messages per channel (oldest dropped first) |
| Replay storage | Valkey LPUSH / LRANGE on ws:replay:{tenantId}:{channel} |
| Messages older than buffer | Not replayed — client must poll REST API |
| REST fallback | GET https://api.septemcore.com/v1/notifications?since=<ISO> |
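When the gap is older than the replay buffer, the client falls back to the REST endpoint listed above. The sketch below only builds the URL; the exact timestamp format the endpoint accepts is an assumption (second-precision UTC with a trailing "Z"), and `rest_fallback_url` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

NOTIFICATIONS_URL = "https://api.septemcore.com/v1/notifications"


def rest_fallback_url(since: datetime) -> str:
    """Build the polling URL for notifications newer than `since`."""
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    # Assumed format: ISO 8601 in UTC, e.g. 2026-04-22T10:56:00Z.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{NOTIFICATIONS_URL}?{urlencode({'since': stamp})}"
```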
Close Codes
| Code | Meaning | Who closes |
|---|---|---|
| 1000 | Normal closure | Either side |
| 1008 | Policy violation: auth timeout or invalid token | Server |
| 4408 | Heartbeat timeout (2 missed pongs) | Server |
Standard WebSocket close codes 1001–1007 may appear from Envoy (network
errors, protocol violations, etc.) and should be treated as transient — reconnect.
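A client can map close codes to a follow-up action. The reasons in this sketch come from the table above; the suggested actions are assumptions, in particular the idea of obtaining a fresh token after 1008 rather than retrying with the same one.

```python
CLOSE_REASONS = {
    1000: "normal closure",
    1008: "policy violation (auth timeout or invalid token)",
    4408: "heartbeat timeout (2 missed pongs)",
}


def describe_close(code: int) -> tuple[str, str]:
    """Return (reason, suggested action) for a WebSocket close code."""
    reason = CLOSE_REASONS.get(code, "transient or unknown")
    if code == 1008:
        # Assumption: retrying with the same token would fail again.
        action = "refresh-token-then-reconnect"
    else:
        action = "reconnect-with-backoff"
    return reason, action
```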
Tenant Isolation
All connections, channels, and message dispatch are fully tenant-isolated at the Hub level:
| Mechanism | Implementation |
|---|---|
| Connection scope | Each Conn carries TenantID extracted from JWT |
| Channel namespace | channel → {tenantID}:{channel} before Hub registration |
| Broadcast routing | Hub.Broadcast(tenantID, channel, data) only delivers to connections matching tenantID |
| Cross-tenant delivery | Architecturally impossible — Hub maps are keyed by tenantID |
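The routing rule in the table can be illustrated with a toy model. This is not the Go Hub in hub.go: the class below is a simplified Python stand-in in which each connection is represented by a plain list acting as its inbox.

```python
from collections import defaultdict


class Hub:
    """Toy model of tenant-keyed broadcast routing."""

    def __init__(self) -> None:
        # tenant_id -> channel -> list of per-connection inboxes
        self._subs = defaultdict(lambda: defaultdict(list))

    def subscribe(self, tenant_id: str, channel: str, inbox: list) -> None:
        self._subs[tenant_id][channel].append(inbox)

    def broadcast(self, tenant_id: str, channel: str, data: dict) -> int:
        """Deliver to subscribers of this tenant only; return the count."""
        # .get() avoids creating entries for unknown tenants or channels.
        inboxes = self._subs.get(tenant_id, {}).get(channel, [])
        for inbox in inboxes:
            inbox.append(data)
        return len(inboxes)
```

Because the outer map is keyed by tenant, a broadcast for one tenant never iterates over another tenant's connections, even when the bare channel names match.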
Limits
| Limit | Value | Source (env variable or constant) |
|---|---|---|
| Max connections per tenant | 1000 | NOTIFY_WS_MAX_CONNECTIONS_PER_TENANT=1000 |
| Max broadcast rate per tenant | 200 msg/sec | NOTIFY_WS_RATE_PER_TENANT=200 |
| Max message size (incoming) | 4096 bytes | ws.MaxMessageSize = 4096 |
| Per-connection send buffer | 256 messages | ws.SendBufferSize = 256 |
| Auth timeout | 10 seconds | Hardcoded in handler |
| Max channels per connection | 50 | Enforced at subscribe |
| Replay buffer depth | 100 messages / channel | ReplayStore.Push cap |
When maxConnsPerTenant is reached, the new connection is closed immediately after registration and no error message is sent. The client will see a clean close and should apply exponential backoff before reconnecting.
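Clients can guard against the incoming message size limit before sending. A minimal sketch, assuming a hypothetical helper named `encode_frame`; how the server reacts to an oversized frame is not documented on this page, so the check simply refuses to produce one.

```python
import json

MAX_MESSAGE_SIZE = 4096  # bytes, ws.MaxMessageSize


def encode_frame(message: dict) -> str:
    """Serialize a client message and enforce the 4096-byte limit."""
    text = json.dumps(message, separators=(",", ":"))
    # The limit is in bytes, so measure the UTF-8 encoding, not len(text).
    size = len(text.encode("utf-8"))
    if size > MAX_MESSAGE_SIZE:
        raise ValueError(f"frame is {size} bytes, limit is {MAX_MESSAGE_SIZE}")
    return text
```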
Implementation Notes
| Component | Path |
|---|---|
| WebSocket handler | services/notify/internal/ws/handler.go |
| Hub (connection manager) | services/notify/internal/ws/hub.go |
| Protocol constants | ws.PingInterval, ws.PongTimeout, ws.MaxMissedPongs, ws.CloseCodeMissedPong |
| Config | services/notify/internal/config/config.go |
| Integration tests | services/notify/tests/integration/websocket_test.go |
| WebSocket library | github.com/coder/websocket (formerly nhooyr.io/websocket; maintained as of April 2026) |