Retry & Failed Notifications
When a notification channel adapter fails to deliver (provider API
timeout, SendGrid down, Telegram rate limit), the Notify Service
retries automatically using exponential backoff via RabbitMQ.
The calling module is never blocked — send() returns 202 Accepted
immediately and retries happen asynchronously.
Retry Schedule
| Attempt | Delay (before retry) | Delay with ±10% jitter |
|---|---|---|
| 1 (first try) | — | — (immediate from queue) |
| 2 | 30 s | 27–33 s |
| 3 | 60 s | 54–66 s |
| 4 | 120 s | 108–132 s |
| 5 | 240 s | 216–264 s |
| Max retry (6th attempt) | 480 s | Not executed — marked failed |
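Total backoff delay from the first attempt to the failed status:
30 + 60 + 120 + 240 = 450 seconds (7.5 minutes), plus up to 10 seconds of timeout per attempt. The 480 s delay in the last row is not counted, because the sixth attempt is never executed.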
Jitter (±10%) is applied to each delay to prevent multiple failing notifications from retrying simultaneously and amplifying provider load.
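The schedule above follows a doubling rule: the delay before attempt n is the base delay multiplied by 2^(n-2), then shifted by up to ±10%. The following is a minimal sketch of that calculation, assuming uniformly distributed jitter. The function name `backoffDelaySec` and the injectable `rand` parameter are illustrative and not part of the Notify Service.

```typescript
// Hypothetical helper: delay in seconds to wait before a given attempt.
// attempt 2 -> base, attempt 3 -> 2 * base, attempt 4 -> 4 * base, ...
function backoffDelaySec(
  attempt: number,
  baseSec: number = 30,       // NOTIFY_BACKOFF_BASE_SEC
  jitterPercent: number = 10, // NOTIFY_BACKOFF_JITTER_PERCENT
  rand: () => number = Math.random, // injectable for deterministic tests
): number {
  if (attempt < 2) {
    return 0; // the first attempt runs immediately from the queue
  }
  const nominal = baseSec * Math.pow(2, attempt - 2);
  // Map rand() in [0, 1) to a factor in [1 - j, 1 + j).
  const jitter = jitterPercent / 100;
  const factor = 1 + (rand() * 2 - 1) * jitter;
  return nominal * factor;
}

// With rand fixed at 0.5 the jitter factor is exactly 1 (nominal delays).
const nominalDelays = [2, 3, 4, 5].map((n) => backoffDelaySec(n, 30, 10, () => 0.5));
console.log(nominalDelays); // [30, 60, 120, 240]
```

Passing the random source as a parameter keeps the calculation testable; a production implementation may draw jitter differently.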
Per-Attempt Timeout
Each delivery attempt has a 10-second timeout (NOTIFY_ADAPTER_TIMEOUT_SEC).
If the adapter's send() method does not return within 10 seconds, the
attempt is counted as failed and the next retry is scheduled.
This prevents a slow external API from holding a RabbitMQ worker indefinitely and blocking other notifications in the queue.
Attempt 1: send() called → timeout after 10s → retry 2 in ~30s
Attempt 2: send() called → timeout after 10s → retry 3 in ~60s
…
Attempt 5: send() called → success ✅ → status: delivered
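One common way to enforce a per-attempt timeout is to race the adapter call against a timer. The sketch below illustrates that idea only; `attemptWithTimeout` and the `AttemptResult` type are hypothetical names, and the actual worker may implement the timeout differently.

```typescript
// Hypothetical result of a single delivery attempt.
type AttemptResult =
  | { ok: true }
  | { ok: false; reason: string };

// Runs one delivery attempt and gives up after timeoutMs.
// `send` stands in for a channel adapter's send() method.
function attemptWithTimeout(
  send: () => Promise<void>,
  timeoutMs: number = 10000, // NOTIFY_ADAPTER_TIMEOUT_SEC * 1000
): Promise<AttemptResult> {
  return new Promise<AttemptResult>((resolve) => {
    // Whichever settles first wins; later resolve() calls are ignored.
    const timer = setTimeout(
      () => resolve({ ok: false, reason: `timeout after ${timeoutMs} ms` }),
      timeoutMs,
    );
    // Promise.resolve().then(send) also captures synchronous throws from send.
    Promise.resolve()
      .then(send)
      .then(
        () => {
          clearTimeout(timer);
          resolve({ ok: true });
        },
        (err: unknown) => {
          clearTimeout(timer);
          const message = err instanceof Error ? err.message : String(err);
          resolve({ ok: false, reason: `adapter_error: ${message}` });
        },
      );
  });
}
```

Note that this pattern stops waiting for the adapter but does not cancel the underlying request; cancelling it would need support from the adapter itself.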
Failed Status
After 5 failed attempts, the notification enters the failed state:
After attempt 5 fails:
→ PostgreSQL: notifications_failed table
→ status: "failed"
→ failedAt: <timestamp>
→ reason: "max retries exceeded" | "adapter_error: <message>"
→ visible in Admin → Notifications → Failed (retained 180 days)
The failed status is terminal — the Notify Service does not retry
automatically beyond 5 attempts. Only a manual retry via the API
restarts delivery.
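The terminal rule can be expressed as a small decision function: after a failed attempt, either schedule the next one or mark the notification as failed. This is a sketch under the defaults documented on this page; `afterFailedAttempt` and `NextStep` are hypothetical names, and jitter is omitted for brevity.

```typescript
// Hypothetical shape of the decision taken after an attempt fails.
type NextStep =
  | { action: "retry"; nextAttempt: number; delaySec: number }
  | { action: "fail"; status: "failed"; failedAt: string; reason: string };

function afterFailedAttempt(
  attempt: number,         // the attempt that just failed (1-based)
  now: Date,
  maxAttempts: number = 5, // NOTIFY_MAX_RETRIES
  baseSec: number = 30,    // NOTIFY_BACKOFF_BASE_SEC
): NextStep {
  if (attempt >= maxAttempts) {
    // Terminal: no further automatic retries are scheduled.
    return {
      action: "fail",
      status: "failed",
      failedAt: now.toISOString(),
      reason: "max retries exceeded",
    };
  }
  // Nominal delay before the next attempt: 30, 60, 120, 240 s.
  const delaySec = baseSec * Math.pow(2, attempt - 1);
  return { action: "retry", nextAttempt: attempt + 1, delaySec };
}

console.log(afterFailedAttempt(4, new Date())); // retry: attempt 5 after 240 s
console.log(afterFailedAttempt(5, new Date())); // fail: "max retries exceeded"
```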
Monitoring Failed Notifications
GET https://api.septemcore.com/v1/notifications/failed
Authorization: Bearer <access_token>
{
  "data": [
    {
      "notificationId": "01j9panot700000000000000",
      "channel": "email",
      "userId": "01j9pa5mz700000000000000",
      "status": "failed",
      "reason": "SendGrid API returned 503",
      "attempts": 5,
      "failedAt": "2026-04-15T10:45:30.000Z"
    }
  ],
  "pagination": {
    "nextCursor": null,
    "hasMore": false
  }
}
Pagination is cursor-based (same as all list endpoints). Failed notifications are retained for 180 days.
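To collect every failed notification, a client can follow `nextCursor` until `hasMore` is false. The sketch below assumes the response shape shown above. `listAllFailed` and the caller-supplied `fetchPage` callback are hypothetical; how the cursor is passed to the endpoint (for example as a query parameter) is not specified on this page, so that detail is left to the callback.

```typescript
interface FailedNotification {
  notificationId: string;
  channel: string;
  userId: string;
  status: string;
  reason: string;
  attempts: number;
  failedAt: string;
}

interface FailedPage {
  data: FailedNotification[];
  pagination: { nextCursor: string | null; hasMore: boolean };
}

// Walks all pages of the failed-notifications list.
// fetchPage(cursor) is expected to call GET /v1/notifications/failed
// and return the parsed JSON body; null means "first page".
async function listAllFailed(
  fetchPage: (cursor: string | null) => Promise<FailedPage>,
  maxPages: number = 100, // safety cap against an endless cursor loop
): Promise<FailedNotification[]> {
  const all: FailedNotification[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const res: FailedPage = await fetchPage(cursor);
    all.push(...res.data);
    if (!res.pagination.hasMore || res.pagination.nextCursor === null) {
      break;
    }
    cursor = res.pagination.nextCursor;
  }
  return all;
}
```

Keeping the HTTP call behind `fetchPage` lets the same loop work with `fetch`, an SDK method, or a test stub.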
Manual Retry
Any failed notification can be re-queued manually:
POST https://api.septemcore.com/v1/notifications/01j9panot700000000000000/retry
Authorization: Bearer <access_token>
Response 202 Accepted:
{
  "notificationId": "01j9panot700000000000000",
  "status": "queued",
  "attempt": 6
}
Manual retry is not subject to the 5-attempt limit: a failed notification
is re-enqueued regardless of how many attempts it has already used. The
attempt counter continues incrementing for audit purposes.
SDK equivalent:
await kernel.notify().retry('01j9panot700000000000000');
// { notificationId: '...', status: 'queued', attempt: 6 }
No Automatic Channel Fallback
The Notify Service does not automatically switch channels when one is unavailable:
Email channel is down.
Module called: send({ channel: 'email', userId: '...' })
Notify retries email 5 times → failed.
Notify does NOT automatically switch to SMS.
Tenant chose email as the delivery channel — that choice is respected.
Why no fallback: Automatic fallback would deliver notifications via channels the user did not configure or expect (SMS when email is preferred). This creates compliance issues (unsubscribe preferences, GDPR consent) and poor UX (unexpected SMS from an app you use for email).
If a module needs multi-channel resilience, it should send to multiple channels explicitly:
// Send to both email and SMS for critical OTP codes
await Promise.all([
  kernel.notify().send({ userId, channel: 'email', body, priority: 'critical' }),
  kernel.notify().send({ userId, channel: 'sms', body, priority: 'critical' }),
]);
Each send tracks retries independently.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| NOTIFY_MAX_RETRIES | 5 | Max delivery attempts per notification |
| NOTIFY_ADAPTER_TIMEOUT_SEC | 10 | Timeout per delivery attempt |
| NOTIFY_BACKOFF_BASE_SEC | 30 | Base delay for exponential backoff |
| NOTIFY_BACKOFF_JITTER_PERCENT | 10 | Jitter ±% applied to each delay |
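As an illustration of how these variables and their defaults fit together, the sketch below reads them from an environment-like object. `loadRetryConfig` and `NotifyRetryConfig` are hypothetical names, and the validation rule (non-negative numbers only) is an assumption rather than documented behavior.

```typescript
interface NotifyRetryConfig {
  maxRetries: number;
  adapterTimeoutSec: number;
  backoffBaseSec: number;
  backoffJitterPercent: number;
}

// Reads retry settings from an env-like map, falling back to the
// documented defaults when a variable is unset or empty.
function loadRetryConfig(env: Record<string, string | undefined>): NotifyRetryConfig {
  const num = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") {
      return fallback;
    }
    const parsed = Number(raw);
    if (!isFinite(parsed) || parsed < 0) {
      // Assumed validation: reject anything that is not a non-negative number.
      throw new Error(`${name} must be a non-negative number, got "${raw}"`);
    }
    return parsed;
  };
  return {
    maxRetries: num("NOTIFY_MAX_RETRIES", 5),
    adapterTimeoutSec: num("NOTIFY_ADAPTER_TIMEOUT_SEC", 10),
    backoffBaseSec: num("NOTIFY_BACKOFF_BASE_SEC", 30),
    backoffJitterPercent: num("NOTIFY_BACKOFF_JITTER_PERCENT", 10),
  };
}

console.log(loadRetryConfig({ NOTIFY_BACKOFF_BASE_SEC: "15" }));
```

In a Node.js service the argument would typically be `process.env`.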
Error Reference
| Scenario | HTTP | Code |
|---|---|---|
| Notification already delivered | 409 | NOTIFICATION_ALREADY_DELIVERED |
| Notification not found | 404 | not-found |
| Notification in queued state (retry not needed) | 409 | NOTIFICATION_NOT_FAILED |
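A caller of the manual retry endpoint can branch on these codes. The sketch below maps an HTTP status and error code to an outcome; `interpretRetryResponse` and the `RetryOutcome` labels are hypothetical, and where the code appears in the error body is not specified here, so the function takes it as a plain argument.

```typescript
// Hypothetical outcome labels for a manual retry request.
type RetryOutcome =
  | "requeued"
  | "already-delivered"
  | "not-failed"
  | "not-found"
  | "unexpected";

function interpretRetryResponse(httpStatus: number, code?: string): RetryOutcome {
  if (httpStatus === 202) {
    return "requeued"; // accepted and back in the queue
  }
  if (httpStatus === 404 && code === "not-found") {
    return "not-found";
  }
  if (httpStatus === 409 && code === "NOTIFICATION_ALREADY_DELIVERED") {
    return "already-delivered"; // nothing to do
  }
  if (httpStatus === 409 && code === "NOTIFICATION_NOT_FAILED") {
    return "not-failed"; // still queued; automatic retries are in progress
  }
  return "unexpected"; // anything not listed in the error reference
}

console.log(interpretRetryResponse(409, "NOTIFICATION_NOT_FAILED")); // "not-failed"
```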