Retry & Failed Notifications

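The Notify Service makes up to 5 delivery attempts per notification, with exponential backoff from 30 s to 480 s and ±10% jitter. Each attempt times out after 10 s. Once the attempts are exhausted, the notification enters the failed state, where it is visible in the Admin UI and can be retried manually via the REST API. By design, there is no automatic channel fallback.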

When a notification channel adapter fails to deliver (provider API timeout, SendGrid down, Telegram rate limit), the Notify Service retries automatically using exponential backoff via RabbitMQ. The calling module is never blocked — send() returns 202 Accepted immediately and retries happen asynchronously.

Retry Schedule

| Attempt | Delay before attempt | Delay with ±10% jitter |
| --- | --- | --- |
| 1 (first try) | none | none (immediate from queue) |
| 2 | 30 s | 27–33 s |
| 3 | 60 s | 54–66 s |
| 4 | 120 s | 108–132 s |
| 5 | 240 s | 216–264 s |
| 6 (retry limit reached) | 480 s | Not executed: the notification is marked `failed` |

Nominal time from the first attempt to the failed status, ignoring jitter and the duration of each attempt: 30 + 60 + 120 + 240 + 480 = 930 seconds (~15.5 minutes).

Jitter (±10%) is applied to each delay to prevent multiple failing notifications from retrying simultaneously and amplifying provider load.
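The schedule above can be reproduced with a small calculation. This is a minimal sketch, assuming the delay doubles from the base on each retry and that jitter is drawn uniformly from the ±10% band; `backoffDelaySec` and its parameters are hypothetical names, not part of the Notify Service API.

```typescript
// Hypothetical helper for illustration; not part of the Notify Service API.
// `attempt` is 1-based: attempt 1 runs immediately, attempt 2 waits the base delay.
function backoffDelaySec(
  attempt: number,
  baseSec = 30, // mirrors NOTIFY_BACKOFF_BASE_SEC
  jitterPercent = 10, // mirrors NOTIFY_BACKOFF_JITTER_PERCENT
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  if (attempt === 1) return 0; // first try is taken straight from the queue

  // Nominal delay doubles with each retry: 30, 60, 120, 240, 480, ...
  const nominal = baseSec * 2 ** (attempt - 2);

  // Map random() in [0, 1) onto a factor in [1 - j, 1 + j).
  // Uniform jitter is an assumption; the source only states the ±10% range.
  const jitter = jitterPercent / 100;
  const factor = 1 - jitter + 2 * jitter * random();
  return nominal * factor;
}

// Nominal delays (jitter disabled) for the attempts listed in the schedule.
const nominalDelays = [2, 3, 4, 5, 6].map((n) => backoffDelaySec(n, 30, 0));
```

Passing a fixed `random` function makes the result deterministic, which is convenient when checking the bounds in the table.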

Per-Attempt Timeout

Each delivery attempt has a 10-second timeout (NOTIFY_ADAPTER_TIMEOUT_SEC). If the adapter's send() method does not return within 10 seconds, the attempt is counted as failed and the next retry is scheduled.

This prevents a slow external API from holding a RabbitMQ worker indefinitely and blocking other notifications in the queue.

Attempt 1:  send() called → timeout after 10s → retry 2 in ~30s
Attempt 2:  send() called → timeout after 10s → retry 3 in ~60s
…
Attempt 5:  send() called → success ✅ → status: delivered
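A per-attempt timeout of this kind is commonly implemented by racing the adapter call against a timer. The sketch below shows that general technique only; `withTimeout` and `AdapterTimeoutError` are hypothetical names, and how the Notify Service enforces the limit internally is not described here.

```typescript
// Hypothetical error type used to tell a timeout apart from an adapter failure.
class AdapterTimeoutError extends Error {
  constructor(timeoutSec: number) {
    super(`adapter did not respond within ${timeoutSec}s`);
    this.name = "AdapterTimeoutError";
  }
}

// Rejects with AdapterTimeoutError if `work` has not settled within `timeoutSec`.
// The default of 10 mirrors NOTIFY_ADAPTER_TIMEOUT_SEC.
function withTimeout<T>(work: Promise<T>, timeoutSec = 10): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new AdapterTimeoutError(timeoutSec)),
      timeoutSec * 1000,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the worker process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that racing does not cancel the underlying request; it only stops the caller from waiting. An adapter that needs to abort the HTTP call itself would have to pass a cancellation signal to its HTTP client.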

Failed Status

After 5 failed attempts, the notification enters the failed state:

After attempt 5 fails:
  → PostgreSQL: notifications_failed table
  → status: "failed"
  → failedAt: <timestamp>
  → reason: "max retries exceeded" | "adapter_error: <message>"
  → visible in Admin → Notifications → Failed (retained 180 days)

The failed status is terminal — the Notify Service does not retry automatically beyond 5 attempts. Only a manual retry via the API restarts delivery.

Monitoring Failed Notifications

GET https://api.septemcore.com/v1/notifications/failed
Authorization: Bearer <access_token>

{
  "data": [
    {
      "notificationId": "01j9panot700000000000000",
      "channel":        "email",
      "userId":         "01j9pa5mz700000000000000",
      "status":         "failed",
      "reason":         "SendGrid API returned 503",
      "attempts":       5,
      "failedAt":       "2026-04-15T10:45:30.000Z"
    }
  ]
}

Pagination is keyset-based (same as all list endpoints). Failed notifications are retained for 180 days.
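For monitoring, the `data` array can be aggregated on the client side, for example to see which channel is failing most. This is a minimal sketch: the `FailedNotification` interface is inferred from the sample response above (other fields may exist), and `countFailuresByChannel` is a hypothetical helper.

```typescript
// Shape inferred from the sample response; treat it as an assumption.
interface FailedNotification {
  notificationId: string;
  channel: string;
  userId: string;
  status: "failed";
  reason: string;
  attempts: number;
  failedAt: string; // ISO 8601 timestamp
}

// Hypothetical helper: count failures per channel at or after a cutoff time.
function countFailuresByChannel(
  items: FailedNotification[],
  since: Date,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    // Skip entries that failed before the cutoff.
    if (new Date(item.failedAt).getTime() < since.getTime()) continue;
    counts.set(item.channel, (counts.get(item.channel) ?? 0) + 1);
  }
  return counts;
}
```

A result such as `email → 1` for the sample response could feed an alert threshold or a dashboard panel.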

Manual Retry

Any failed notification can be re-queued manually:

POST https://api.septemcore.com/v1/notifications/01j9panot700000000000000/retry
Authorization: Bearer <access_token>

Response 202 Accepted:

{
  "notificationId": "01j9panot700000000000000",
  "status":         "queued",
  "attempt":        6
}

Manual retry is not subject to the 5-attempt limit: a notification in the failed state is re-enqueued regardless of how many attempts were already made. The attempt counter continues incrementing for audit purposes.

SDK equivalent:

await kernel.notify().retry('01j9panot700000000000000');
// { notificationId: '...', status: 'queued', attempt: 6 }

No Automatic Channel Fallback

The Notify Service does not automatically switch channels when one is unavailable:

Email channel is down.
Module called: send({ channel: 'email', userId: '...' })
Notify retries email 5 times → failed.

Notify does NOT automatically switch to SMS.
Tenant chose email as the delivery channel — that choice is respected.

Why no fallback: Automatic fallback would deliver notifications via channels the user did not configure or expect (SMS when email is preferred). This creates compliance issues (unsubscribe preferences, GDPR consent) and poor UX (unexpected SMS from an app you use for email).

If a module needs multi-channel resilience, it should send to multiple channels explicitly:

// Send to both email and SMS for critical OTP codes
await Promise.all([
  kernel.notify().send({ userId, channel: 'email', body, priority: 'critical' }),
  kernel.notify().send({ userId, channel: 'sms',   body, priority: 'critical' }),
]);

Each send tracks retries independently.
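With `Promise.all`, a rejection from one `send()` call rejects the combined promise even if the other channel was accepted. When the caller needs to know which channels were accepted, each call can be caught individually. The sketch below assumes `send()` rejects when a notification cannot be enqueued; the `Notifier` and `SendRequest` interfaces are hypothetical stand-ins for the object returned by `kernel.notify()`.

```typescript
// Hypothetical minimal shapes; the real SDK types may differ.
interface SendRequest {
  userId: string;
  channel: string;
  body: string;
  priority?: string;
}
interface Notifier {
  send(request: SendRequest): Promise<unknown>;
}

// Sends the same notification to several channels and reports, per channel,
// whether the request was accepted. One failing channel does not hide the others.
async function sendToChannels(
  notifier: Notifier,
  base: Omit<SendRequest, "channel">,
  channels: string[],
): Promise<{ accepted: string[]; rejected: string[] }> {
  const outcomes = await Promise.all(
    channels.map(async (channel) => {
      try {
        await notifier.send({ ...base, channel });
        return { channel, ok: true };
      } catch {
        return { channel, ok: false };
      }
    }),
  );
  return {
    accepted: outcomes.filter((o) => o.ok).map((o) => o.channel),
    rejected: outcomes.filter((o) => !o.ok).map((o) => o.channel),
  };
}
```

Here "accepted" only means the request was queued. Delivery and retries still happen asynchronously per channel.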

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `NOTIFY_MAX_RETRIES` | `5` | Max delivery attempts per notification |
| `NOTIFY_ADAPTER_TIMEOUT_SEC` | `10` | Timeout per delivery attempt, in seconds |
| `NOTIFY_BACKOFF_BASE_SEC` | `30` | Base delay for exponential backoff, in seconds |
| `NOTIFY_BACKOFF_JITTER_PERCENT` | `10` | Jitter ±% applied to each delay |
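As an illustration of how these variables and their defaults fit together, the sketch below parses them from an environment map. `loadRetryConfig`, `RetryConfig`, and the integer-only validation are assumptions made for the example; the service's own parsing rules are not documented here.

```typescript
type Env = Record<string, string | undefined>;

interface RetryConfig {
  maxRetries: number;
  adapterTimeoutSec: number;
  backoffBaseSec: number;
  backoffJitterPercent: number;
}

// Reads an integer variable, falling back to the documented default when unset.
// Rejecting non-integers and values below `min` is an assumption of this sketch.
function readInt(env: Env, name: string, fallback: number, min: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

function loadRetryConfig(env: Env): RetryConfig {
  return {
    maxRetries: readInt(env, "NOTIFY_MAX_RETRIES", 5, 1),
    adapterTimeoutSec: readInt(env, "NOTIFY_ADAPTER_TIMEOUT_SEC", 10, 1),
    backoffBaseSec: readInt(env, "NOTIFY_BACKOFF_BASE_SEC", 30, 1),
    backoffJitterPercent: readInt(env, "NOTIFY_BACKOFF_JITTER_PERCENT", 10, 0),
  };
}
```

In a Node.js process this could be called as `loadRetryConfig(process.env)`.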

Error Reference

| Scenario | HTTP | Code |
| --- | --- | --- |
| Notification already delivered | 409 | `NOTIFICATION_ALREADY_DELIVERED` |
| Notification not found | 404 | `not-found` |
| Notification in `queued` state (retry not needed) | 409 | `NOTIFICATION_NOT_FAILED` |
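A caller of the manual retry endpoint might map these responses to an action as sketched below. `classifyRetryResponse` and the `RetryAction` labels are hypothetical. Because the layout of the error response body is not shown on this page, the function takes the HTTP status and error code as plain arguments.

```typescript
// Hypothetical action labels for a client of the manual retry endpoint.
type RetryAction = "queued" | "skip" | "not_found" | "unexpected";

function classifyRetryResponse(status: number, code?: string): RetryAction {
  // 202 Accepted: the notification was re-enqueued.
  if (status === 202) return "queued";

  // 409 with either documented code: nothing to retry (delivered or already queued).
  if (
    status === 409 &&
    (code === "NOTIFICATION_ALREADY_DELIVERED" || code === "NOTIFICATION_NOT_FAILED")
  ) {
    return "skip";
  }

  // 404: the notification ID does not exist.
  if (status === 404) return "not_found";

  // Anything else is outside the documented cases and should be surfaced.
  return "unexpected";
}
```

Treating both 409 codes as "skip" reflects that in either case no further action is needed from the caller.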
