WebSocket Protocol

The Notify Service exposes a WebSocket endpoint for real-time browser notifications. This is the only platform endpoint that bypasses the API Gateway — persistent WebSocket connections are inefficient to proxy, so clients connect directly to the Notify Service. Authorization still goes through JWT validation on the Notify Service side.


Connection Endpoint

wss://notify.platform.io/ws

Use a single connection per session. Open one WebSocket connection when the user logs in and reuse it for all real-time notifications. Do not open connections per component or per page.


Connection Lifecycle

Step 1 — Open WebSocket

const ws = new WebSocket('wss://notify.platform.io/ws');

Step 2 — Authenticate (first message)

The very first message after connection must be the auth message. The server waits up to 5 seconds for authentication — if none arrives, it closes the connection with code 4401.

{
  "type": "auth",
  "token": "<JWT access token>"
}

The Notify Service validates the JWT: checks signature, expiry, and extracts tenantId and userId. All subsequent events on this connection are scoped to that tenant.

Invalid token:

{ "type": "error", "code": "4401", "message": "Authentication failed" }

The server then closes the connection immediately with WebSocket close code 4401.

Step 3 — Auth Confirmation

{
  "type": "auth_ok",
  "userId": "01j9pa5mz700000000000000",
  "tenantId": "01j9p3kz5f00000000000000"
}
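
The handshake in Steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the platform SDK: `SocketLike`, `sendAuth`, and `parseAuthReply` are hypothetical names, and the socket is reduced to a `send` method so the logic can run without a live connection.

```typescript
// Hypothetical minimal socket shape; a browser WebSocket satisfies it.
interface SocketLike {
  send(data: string): void;
}

type AuthReply =
  | { ok: true; userId: string; tenantId: string }
  | { ok: false; reason: string };

// Send the auth message. Per the protocol it must be the first frame.
function sendAuth(socket: SocketLike, token: string): void {
  socket.send(JSON.stringify({ type: "auth", token }));
}

// Interpret the server's first reply: either auth_ok or an error.
function parseAuthReply(raw: string): AuthReply {
  const msg = JSON.parse(raw) as {
    type?: string;
    userId?: string;
    tenantId?: string;
    message?: string;
  };
  if (
    msg.type === "auth_ok" &&
    typeof msg.userId === "string" &&
    typeof msg.tenantId === "string"
  ) {
    return { ok: true, userId: msg.userId, tenantId: msg.tenantId };
  }
  return { ok: false, reason: msg.message ?? "unexpected reply" };
}
```

In a browser, `sendAuth` would typically be called from the socket's `open` handler so that the frame goes out well inside the 5-second window.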

Step 4 — Subscribe to Channels

After authentication, subscribe to one or more named channels:

{ "type": "subscribe", "channel": "dashboard.metrics" }

The server acknowledges:

{ "type": "subscribe_ok", "channel": "dashboard.metrics" }

Multiple subscriptions are sent as separate messages. A connection supports up to 50 active channel subscriptions. The server stores subscriptions as a SET — a duplicate subscribe for the same channel is a no-op (no error, no duplicate delivery).

Unsubscribe

{ "type": "unsubscribe", "channel": "dashboard.metrics" }
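
A client can mirror the server's set semantics locally to avoid sending redundant frames and to know which channels to re-subscribe after a reconnect. The sketch below assumes a hypothetical `SubscriptionTracker` class; only the 50-channel limit and the message shapes come from the protocol.

```typescript
// Hypothetical minimal socket shape; a browser WebSocket satisfies it.
interface SocketLike {
  send(data: string): void;
}

const MAX_SUBSCRIPTIONS = 50; // per-connection limit from the protocol

class SubscriptionTracker {
  private readonly channels = new Set<string>();
  private readonly socket: SocketLike;

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  // Returns false when the limit would be exceeded; nothing is sent then.
  subscribe(channel: string): boolean {
    // Duplicates are a no-op on the server, so skip the frame entirely.
    if (this.channels.has(channel)) return true;
    if (this.channels.size >= MAX_SUBSCRIPTIONS) return false;
    this.channels.add(channel);
    this.socket.send(JSON.stringify({ type: "subscribe", channel }));
    return true;
  }

  unsubscribe(channel: string): void {
    if (!this.channels.delete(channel)) return; // was not subscribed
    this.socket.send(JSON.stringify({ type: "unsubscribe", channel }));
  }

  // Channels to re-subscribe after a reconnect.
  active(): string[] {
    return Array.from(this.channels);
  }
}
```

The sketch records a channel as active when the frame is sent. A stricter client could wait for `subscribe_ok` before doing so.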

Message Format

All WebSocket frames are JSON. Incoming notification messages:

{
  "type": "notification",
  "id": "01j9panot700000000000000",
  "channel": "dashboard.metrics",
  "payload": {
    "metric": "active_users",
    "value": 1423,
    "delta": "+12"
  },
  "timestamp": "2026-04-15T10:30:00.000Z"
}

Field | Required | Description
type | | One of: auth, auth_ok, subscribe, subscribe_ok, unsubscribe, notification, ping, pong, error
id | | UUID, used for reconnect replay deduplication
channel | | Present on notification, subscribe, unsubscribe, subscribe_ok
payload | | Notification data (arbitrary JSON object)
timestamp | | ISO 8601 UTC
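
On the client, the server-to-client frames can be modelled as a discriminated union keyed on `type`. This is a sketch under assumptions: the split into server-sent types is inferred from the flows described here, and `ServerMessage` and `parseServerMessage` are hypothetical names. The parser checks only the `type` field and does not validate the remaining fields.

```typescript
type ServerMessage =
  | { type: "auth_ok"; userId: string; tenantId: string }
  | { type: "subscribe_ok"; channel: string }
  | {
      type: "notification";
      id: string;
      channel: string;
      payload: Record<string, unknown>;
      timestamp: string;
    }
  | { type: "ping" }
  | { type: "error"; code: string; message: string };

const SERVER_TYPES = ["auth_ok", "subscribe_ok", "notification", "ping", "error"];

// Returns null for frames that are not JSON objects with a known type.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const type = (data as { type?: unknown }).type;
  if (typeof type !== "string" || SERVER_TYPES.indexOf(type) === -1) return null;
  return data as ServerMessage;
}
```

A `switch` on `msg.type` then narrows the union, so a handler for `notification` can read `msg.payload` without casts.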

Heartbeat

The server sends a ping every 30 seconds. The client must respond with a pong within 10 seconds:

← Server: { "type": "ping" }
→ Client: { "type": "pong" }

If the client misses 2 consecutive pongs, the server closes the connection with WebSocket close code 4408 (custom — timeout).

t=0s Server sends ping #1
t=10s No pong received — 1 miss
t=30s Server sends ping #2
t=40s No pong received — 2 misses → close 4408

Implement an auto-pong handler and never block the WebSocket message loop.
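
A minimal auto-pong handler might look like the sketch below. `handleHeartbeat` and `SocketLike` are hypothetical names; the idea is to call the handler first in the message callback, before any heavier processing, so the pong is never delayed by application work.

```typescript
// Hypothetical minimal socket shape; a browser WebSocket satisfies it.
interface SocketLike {
  send(data: string): void;
}

// Returns true when the frame was a ping and a pong was sent in reply.
function handleHeartbeat(socket: SocketLike, raw: string): boolean {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch (_err) {
    return false; // not JSON, so not a ping
  }
  if (typeof msg !== "object" || msg === null) return false;
  if ((msg as { type?: unknown }).type !== "ping") return false;
  socket.send(JSON.stringify({ type: "pong" }));
  return true;
}
```

If the handler returns `true`, the frame has been dealt with and can be skipped by the rest of the message pipeline.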


Reconnect with Exponential Backoff + Jitter

When the connection drops (server close, network interruption, or 4408 timeout), reconnect using exponential backoff with jitter:

reconnectDelay = min(maxDelay, baseDelay × 2^attempt + random(0, baseDelay)), with baseDelay = 1 s and maxDelay = 30 s

attempt 0: 1 × 2^0 + random(0, 1) = 1–2 s
attempt 1: 1 × 2^1 + random(0, 1) = 2–3 s
attempt 2: 1 × 2^2 + random(0, 1) = 4–5 s
attempt 3: 1 × 2^3 + random(0, 1) = 8–9 s
attempt 4: 1 × 2^4 + random(0, 1) = 16–17 s
attempt 5+: capped at 30 s

Jitter prevents thundering herd: when a Notify Service deploy restarts all connections simultaneously, randomised backoff spreads reconnects over several seconds instead of hitting the server with a spike.

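function reconnectDelay(attempt: number): number {
  const base = 1000; // base delay in ms
  const max = 30_000; // upper bound in ms
  // Exponential term plus up to one base interval of random jitter.
  const delay = base * Math.pow(2, attempt) + Math.random() * base;
  return Math.min(delay, max);
}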
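
The delay function needs an attempt counter that grows on each failure and resets once a connection authenticates again. One way to hold that state is sketched below. `ReconnectPolicy` and `backoffDelay` are hypothetical names, and the random source is injectable only so the behaviour can be checked deterministically.

```typescript
// Same formula as reconnectDelay, with the random source injectable.
function backoffDelay(attempt: number, random: () => number = Math.random): number {
  const base = 1000; // ms
  const max = 30000; // ms
  return Math.min(base * 2 ** attempt + random() * base, max);
}

class ReconnectPolicy {
  private attempt = 0;
  private readonly random: () => number;

  constructor(random: () => number = Math.random) {
    this.random = random;
  }

  // Delay to wait before the next reconnect attempt, in ms.
  nextDelay(): number {
    const delay = backoffDelay(this.attempt, this.random);
    this.attempt += 1;
    return delay;
  }

  // Call after a successful auth_ok so the next outage starts at attempt 0.
  reset(): void {
    this.attempt = 0;
  }
}
```

A client would call `nextDelay()` from its close handler, wait that long before opening a new socket, and call `reset()` when `auth_ok` arrives.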

Token Refresh on Reconnect

When the JWT expires during a long session, the server sends:

{ "type": "error", "code": "TOKEN_EXPIRED", "message": "Access token expired" }

The client must:

  1. Call POST https://api.septemcore.com/v1/auth/refresh with the refresh token to get a new access token.
  2. Reconnect the WebSocket.
  3. Send the new token in the auth message.

This flow keeps the session alive without requiring a page reload.
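
The three steps above can be sketched as a small handler. The refresh call and the reconnect routine are injected as hypothetical dependencies (`refreshAccessToken`, `reconnect`), because their real implementations depend on the application's auth client and socket wrapper.

```typescript
// Hypothetical dependencies supplied by the application.
interface RefreshDeps {
  // Expected to wrap the POST /v1/auth/refresh call and resolve to a new access token.
  refreshAccessToken(): Promise<string>;
  // Expected to open a new WebSocket and send the auth message with this token.
  reconnect(token: string): void;
}

// Detects the TOKEN_EXPIRED error frame.
function isTokenExpired(raw: string): boolean {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch (_err) {
    return false;
  }
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as { type?: unknown; code?: unknown };
  return m.type === "error" && m.code === "TOKEN_EXPIRED";
}

// Refresh first, then reconnect with the new token.
async function handleTokenExpired(deps: RefreshDeps): Promise<void> {
  const token = await deps.refreshAccessToken();
  deps.reconnect(token);
}
```

If the refresh call itself fails, the returned promise rejects. The caller decides whether to retry or send the user back to the login screen.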


Reconnect Replay Buffer

On reconnect, the client sends lastMessageId to recover missed messages:

{
"type": "auth",
"token": "<new JWT>",
"lastMessageId": "01j9panot700000000000000"
}

The Notify Service reads the Valkey-backed replay buffer:

LRANGE ws:replay:{tenantId}:{channel} 0 99

Parameter | Value
Buffer size | 100 messages per channel per tenant
TTL | 1 hour (WS_REPLAY_BUFFER_TTL_SEC=3600)
Env override | WS_REPLAY_BUFFER_SIZE=100
Persistence | Valkey AOF; buffer survives Notify Service restart
If lastMessageId older than buffer | Client receives only new events (best-effort)
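
On the client side, replay needs two things: the id of the last message seen, and a way to drop messages that arrive twice. The sketch below is one possible shape. `ReplayCursor` is a hypothetical name, and the default of 100 remembered ids simply mirrors the server buffer size.

```typescript
class ReplayCursor {
  private lastMessageId: string | undefined;
  private readonly seen = new Set<string>();
  private readonly limit: number;

  constructor(limit: number = 100) {
    this.limit = limit;
  }

  // Returns true for a new message (deliver it), false for a duplicate.
  accept(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    this.lastMessageId = id;
    if (this.seen.size > this.limit) {
      // Sets iterate in insertion order, so the first value is the oldest id.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }

  // Auth frame for a reconnect; lastMessageId is included once one is known.
  authMessage(token: string): string {
    if (this.lastMessageId === undefined) {
      return JSON.stringify({ type: "auth", token });
    }
    return JSON.stringify({ type: "auth", token, lastMessageId: this.lastMessageId });
  }
}
```

Each incoming notification passes through `accept(msg.id)` before it reaches the UI, and `authMessage` builds the first frame after every reconnect.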

Replay is an optimisation, not a guarantee. For guaranteed message recovery use the REST fallback:

GET https://api.septemcore.com/v1/notifications?since=2026-04-15T10:29:00Z
Authorization: Bearer <access_token>

This queries PostgreSQL — the authoritative notification history store.
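
Building the fallback request can be sketched as below. The helper names are hypothetical. The sketch assumes the `since` parameter accepts the millisecond-precision ISO 8601 string produced by `toISOString()`; this document does not confirm that.

```typescript
// URL for the REST fallback, with the timestamp URL-encoded.
function notificationsSinceUrl(since: Date): string {
  return (
    "https://api.septemcore.com/v1/notifications?since=" +
    encodeURIComponent(since.toISOString())
  );
}

// Authorization header carrying the current access token.
function authHeaders(accessToken: string): Record<string, string> {
  return { Authorization: "Bearer " + accessToken };
}
```

The two values can be passed to any HTTP client, for example as the URL and `headers` option of `fetch`.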


Channel Namespacing (Tenant Isolation)

All WebSocket channels are implicitly namespaced by tenantId:

Client subscribes to: "dashboard.metrics"
Server internal key: "{tenantId}:dashboard.metrics"
e.g. "01j9p3kz5f00000000000000:dashboard.metrics"

The SDK prefixes tenantId automatically from the JWT. A client in tenant A can never receive broadcasts intended for tenant B, even if both subscribe to the same channel name.
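
The effect of the prefix can be illustrated with a small in-memory registry. This is a sketch of the idea only, not the Notify Service implementation; `internalChannelKey` and `TenantScopedRegistry` are hypothetical names.

```typescript
type Listener = (payload: unknown) => void;

// Builds the internal key in the "{tenantId}:{channel}" form shown above.
function internalChannelKey(tenantId: string, channel: string): string {
  return `${tenantId}:${channel}`;
}

class TenantScopedRegistry {
  private readonly listeners = new Map<string, Listener[]>();

  subscribe(tenantId: string, channel: string, listener: Listener): void {
    const key = internalChannelKey(tenantId, channel);
    const current = this.listeners.get(key) ?? [];
    current.push(listener);
    this.listeners.set(key, current);
  }

  // Delivers only to listeners registered under the same tenant and channel.
  // Returns the number of listeners reached.
  publish(tenantId: string, channel: string, payload: unknown): number {
    const targets = this.listeners.get(internalChannelKey(tenantId, channel)) ?? [];
    targets.forEach((listener) => listener(payload));
    return targets.length;
  }
}
```

Because the tenant id is part of the lookup key, two tenants that subscribe to `dashboard.metrics` end up in separate listener lists.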


Connection Limits

Parameter | Value
Max connections per tenant | 1 000 (NOTIFY_WS_MAX_CONNECTIONS_PER_TENANT)
Max message size | 64 KB
Max subscriptions per connection | 50 channels
Rate limit per tenant | 200 messages/sec (NOTIFY_WS_RATE_PER_TENANT)
Rate limit exceeded | Throttle + warning message (connection stays open)

Close Codes Reference

Code | Meaning | Client action
4401 | JWT invalid or absent at auth step | Refresh token, reconnect
4408 | 2 consecutive pong timeouts | Reconnect with backoff
1000 | Normal server-initiated close (deploy, graceful shutdown) | Reconnect with backoff
1006 | Abnormal close (network drop) | Reconnect with backoff
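
A client can turn this table into a single dispatch function called from its close handler. The sketch below uses hypothetical names (`CloseAction`, `actionForCloseCode`). Its handling of codes not listed in the table is an assumption, not documented behaviour.

```typescript
type CloseAction = "refresh_token_then_reconnect" | "reconnect_with_backoff";

function actionForCloseCode(code: number): CloseAction {
  switch (code) {
    case 4401: // JWT invalid or absent at auth step
      return "refresh_token_then_reconnect";
    case 4408: // 2 consecutive pong timeouts
    case 1000: // normal server-initiated close
    case 1006: // abnormal close (network drop)
      return "reconnect_with_backoff";
    default:
      // Assumption: treat unlisted codes like a network drop.
      return "reconnect_with_backoff";
  }
}
```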