Retry & Failed Notifications

When a notification channel adapter fails to deliver (provider API timeout, SendGrid down, Telegram rate limit), the Notify Service retries automatically using exponential backoff via RabbitMQ. The calling module is never blocked — send() returns 202 Accepted immediately and retries happen asynchronously.


Retry Schedule

| Attempt | Delay (before retry) | Delay with ±10% jitter |
|---|---|---|
| 1 (first try) | none (immediate from queue) | n/a |
| 2 | 30 s | 27–33 s |
| 3 | 60 s | 54–66 s |
| 4 | 120 s | 108–132 s |
| 5 | 240 s | 216–264 s |
| Max retry (6th attempt) | 480 s | Not executed; marked failed |

Total elapsed time from the first attempt to the failed status, counting nominal delays only (no jitter, and excluding the time spent inside each attempt): 30 + 60 + 120 + 240 + 480 = 930 seconds (15.5 minutes).

Jitter (±10%) is applied to each delay to prevent multiple failing notifications from retrying simultaneously and amplifying provider load.
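The schedule above follows a simple doubling rule, which can be sketched as below. This is a minimal illustration, not the service's actual code: the function names (`backoffDelaySec`, `jitterRangeSec`, `jitteredDelaySec`) are hypothetical, and uniform jitter within the ±10% band is an assumption.

```typescript
// Sketch only: names and signatures are illustrative, not taken from the Notify Service.
const BASE_SEC = 30;        // default of NOTIFY_BACKOFF_BASE_SEC
const JITTER_PERCENT = 10;  // default of NOTIFY_BACKOFF_JITTER_PERCENT

/** Nominal delay before a given attempt; attempt 1 runs immediately. */
function backoffDelaySec(attempt: number, baseSec: number = BASE_SEC): number {
  if (attempt <= 1) return 0;
  return baseSec * Math.pow(2, attempt - 2); // 30, 60, 120, 240, 480
}

/** Lower and upper bounds of the delay once jitter is applied. */
function jitterRangeSec(
  delaySec: number,
  jitterPercent: number = JITTER_PERCENT,
): [number, number] {
  return [
    (delaySec * (100 - jitterPercent)) / 100,
    (delaySec * (100 + jitterPercent)) / 100,
  ];
}

/** Picks a delay uniformly inside the jitter band (uniformity is an assumption). */
function jitteredDelaySec(
  delaySec: number,
  jitterPercent: number = JITTER_PERCENT,
  random: () => number = Math.random,
): number {
  const [min, max] = jitterRangeSec(delaySec, jitterPercent);
  return min + random() * (max - min);
}

// Print the nominal schedule with its jitter bounds.
for (let attempt = 2; attempt <= 6; attempt++) {
  const delay = backoffDelaySec(attempt);
  const [min, max] = jitterRangeSec(delay);
  console.log(`attempt ${attempt}: ${delay} s (${min} to ${max} s with jitter)`);
}
```

Passing `random` as a parameter keeps the jitter deterministic in tests while defaulting to `Math.random` in normal use.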


Per-Attempt Timeout

Each delivery attempt has a 10-second timeout (NOTIFY_ADAPTER_TIMEOUT_SEC). If the adapter's send() method does not return within 10 seconds, the attempt is counted as failed and the next retry is scheduled.

This prevents a slow external API from holding a RabbitMQ worker indefinitely and blocking other notifications in the queue.

Attempt 1: send() called → timeout after 10s → retry 2 in ~30s
Attempt 2: send() called → timeout after 10s → retry 3 in ~60s

Attempt 5: send() called → success ✅ → status: delivered
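One common way to enforce a per-attempt timeout is to race the adapter call against a timer. The sketch below assumes an adapter whose send() returns a Promise; `attemptWithTimeout` and `AttemptOutcome` are hypothetical names, and the real service may implement the timeout differently.

```typescript
// Sketch only: a generic Promise.race timeout, not the Notify Service's implementation.
type AttemptOutcome = "delivered" | "timeout" | "adapter_error";

async function attemptWithTimeout(
  send: () => Promise<void>,
  timeoutMs: number, // e.g. NOTIFY_ADAPTER_TIMEOUT_SEC * 1000
): Promise<AttemptOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with "timeout" if the adapter has not settled in time.
  const timedOut = new Promise<AttemptOutcome>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  // Map both success and failure of the adapter to an outcome, so the race never rejects.
  const delivered = Promise.resolve()
    .then(send)
    .then(
      (): AttemptOutcome => "delivered",
      (): AttemptOutcome => "adapter_error",
    );

  try {
    return await Promise.race([delivered, timedOut]);
  } finally {
    // Stop the timer once the race settles. Note that the adapter call itself
    // is not cancelled here; it is only ignored once the timeout wins.
    clearTimeout(timer);
  }
}
```

Both "timeout" and "adapter_error" would count as a failed attempt and lead to the next retry being scheduled; only "delivered" ends the cycle.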

Failed Status

After 5 failed attempts, the notification enters the failed state:

After attempt 5 fails:
→ PostgreSQL: notifications_failed table
→ status: "failed"
→ failedAt: <timestamp>
→ reason: "max retries exceeded" | "adapter_error: <message>"
→ visible in Admin → Notifications → Failed (retained 180 days)

The failed status is terminal — the Notify Service does not retry automatically beyond 5 attempts. Only a manual retry via the API restarts delivery.


Monitoring Failed Notifications

GET https://api.septemcore.com/v1/notifications/failed
Authorization: Bearer <access_token>

Response:

{
  "data": [
    {
      "notificationId": "01j9panot700000000000000",
      "channel": "email",
      "userId": "01j9pa5mz700000000000000",
      "status": "failed",
      "reason": "SendGrid API returned 503",
      "attempts": 5,
      "failedAt": "2026-04-15T10:45:30.000Z"
    }
  ],
  "pagination": {
    "nextCursor": null,
    "hasMore": false
  }
}

Pagination is cursor-based (same as all list endpoints). Failed notifications are retained for 180 days.
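Walking every page of failed notifications could look like the sketch below. The response types mirror the example payload above, but the name of the query parameter that carries the cursor (`cursor` here) is an assumption, as are the helper names `listAllFailed` and `httpFetchPage`.

```typescript
// Sketch only: the `cursor` query parameter name is assumed, not documented above.
interface FailedNotification {
  notificationId: string;
  channel: string;
  userId: string;
  status: "failed";
  reason: string;
  attempts: number;
  failedAt: string; // ISO 8601 timestamp
}

interface FailedPage {
  data: FailedNotification[];
  pagination: { nextCursor: string | null; hasMore: boolean };
}

type FetchPage = (cursor: string | null) => Promise<FailedPage>;

/** Follows nextCursor until hasMore is false and returns all items. */
async function listAllFailed(fetchPage: FetchPage): Promise<FailedNotification[]> {
  const all: FailedNotification[] = [];
  let cursor: string | null = null;
  do {
    const page: FailedPage = await fetchPage(cursor);
    all.push(...page.data);
    cursor = page.pagination.hasMore ? page.pagination.nextCursor : null;
  } while (cursor !== null);
  return all;
}

/** HTTP-backed page fetcher using the endpoint shown above. */
function httpFetchPage(accessToken: string): FetchPage {
  return async (cursor) => {
    const url = new URL("https://api.septemcore.com/v1/notifications/failed");
    if (cursor !== null) url.searchParams.set("cursor", cursor); // assumed parameter name
    const res = await fetch(url.toString(), {
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    if (!res.ok) {
      throw new Error(`listing failed notifications returned HTTP ${res.status}`);
    }
    return (await res.json()) as FailedPage;
  };
}
```

Usage would be `await listAllFailed(httpFetchPage(token))`. Keeping the page fetcher injectable makes the pagination loop testable without network access.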


Manual Retry

Any failed notification can be re-queued manually:

POST https://api.septemcore.com/v1/notifications/01j9panot700000000000000/retry
Authorization: Bearer <access_token>

Response 202 Accepted:

{
  "notificationId": "01j9panot700000000000000",
  "status": "queued",
  "attempt": 6
}

Manual retry is not subject to the 5-attempt limit — it always re-enqueues regardless of previous attempts. The attempt counter continues incrementing for audit purposes.

SDK equivalent:

await kernel.notify().retry('01j9panot700000000000000');
// { notificationId: '...', status: 'queued', attempt: 6 }

No Automatic Channel Fallback

The Notify Service does not automatically switch channels when one is unavailable:

Email channel is down.
Module called: send({ channel: 'email', userId: '...' })
Notify retries email 5 times → failed.

Notify does NOT automatically switch to SMS.
Tenant chose email as the delivery channel — that choice is respected.

Why no fallback: Automatic fallback would deliver notifications via channels the user did not configure or expect (SMS when email is preferred). This creates compliance issues (unsubscribe preferences, GDPR consent) and poor UX (unexpected SMS from an app you use for email).

If a module needs multi-channel resilience, it should send to multiple channels explicitly:

// Send to both email and SMS for critical OTP codes
await Promise.all([
  kernel.notify().send({ userId, channel: 'email', body, priority: 'critical' }),
  kernel.notify().send({ userId, channel: 'sms', body, priority: 'critical' }),
]);

Each send tracks retries independently.


Environment Variables

| Variable | Default | Description |
|---|---|---|
| NOTIFY_MAX_RETRIES | 5 | Max delivery attempts per notification |
| NOTIFY_ADAPTER_TIMEOUT_SEC | 10 | Timeout per delivery attempt |
| NOTIFY_BACKOFF_BASE_SEC | 30 | Base delay for exponential backoff |
| NOTIFY_BACKOFF_JITTER_PERCENT | 10 | Jitter ±% applied to each delay |
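A service reading these variables might parse them as in the sketch below, falling back to the documented defaults. The `RetryConfig` shape and the `loadRetryConfig` helper are hypothetical, and rejecting non-integer values is an assumption about how strict the parsing should be.

```typescript
// Sketch only: config shape and validation rules are illustrative assumptions.
interface RetryConfig {
  maxRetries: number;
  adapterTimeoutSec: number;
  backoffBaseSec: number;
  backoffJitterPercent: number;
}

type Env = Record<string, string | undefined>;

/** Reads a non-negative integer from env, or returns the fallback when unset. */
function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

/** Builds the retry configuration using the defaults from the table above. */
function loadRetryConfig(env: Env): RetryConfig {
  return {
    maxRetries: readInt(env, "NOTIFY_MAX_RETRIES", 5),
    adapterTimeoutSec: readInt(env, "NOTIFY_ADAPTER_TIMEOUT_SEC", 10),
    backoffBaseSec: readInt(env, "NOTIFY_BACKOFF_BASE_SEC", 30),
    backoffJitterPercent: readInt(env, "NOTIFY_BACKOFF_JITTER_PERCENT", 10),
  };
}
```

In a Node.js process this would typically be called as `loadRetryConfig(process.env)`.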

Error Reference

| Scenario | HTTP | Code |
|---|---|---|
| Notification already delivered | 409 | NOTIFICATION_ALREADY_DELIVERED |
| Notification not found | 404 | not-found |
| Notification in queued state (retry not needed) | 409 | NOTIFICATION_NOT_FAILED |
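A caller of the manual retry endpoint could branch on these responses roughly as follows. This is a hedged sketch: `classifyRetryResponse` and `RetryResult` are hypothetical names, and how the error code is extracted from the response body is not specified here, so it is passed in as a plain argument.

```typescript
// Sketch only: maps the documented status/code pairs to a caller-side result.
type RetryResult =
  | "queued"
  | "already_delivered"
  | "not_failed"
  | "not_found"
  | "unexpected";

function classifyRetryResponse(status: number, code?: string): RetryResult {
  if (status === 202) return "queued"; // re-enqueued for delivery
  if (status === 404) return "not_found"; // no such notification
  if (status === 409 && code === "NOTIFICATION_ALREADY_DELIVERED") {
    return "already_delivered"; // nothing to retry
  }
  if (status === 409 && code === "NOTIFICATION_NOT_FAILED") {
    return "not_failed"; // still queued, retry not needed
  }
  return "unexpected"; // anything not listed in the error reference
}
```

Both 409 cases are safe to treat as no-ops on the caller side, since the notification is either delivered or still being processed.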