Webhook Event Delivery for Real-Time Banking Apps: Design Considerations

How to design webhook delivery for a financial data API that banking apps depend on.

Financial data webhooks are not like other webhooks

Most webhook design discussions start from the assumption that the primary failure mode to protect against is delivery failure — what happens when the receiving endpoint is down, or when the network drops the event. These are real problems, and at-least-once delivery with exponential backoff retry is the standard answer. For general SaaS webhooks, this design is usually sufficient.

Financial data webhooks carry additional constraints that change the design requirements in important ways. When you're delivering transaction classification results, cash-flow forecast updates, or balance event notifications to a banking app, you have to contend with: duplicate delivery producing duplicate financial records; out-of-order delivery producing incorrect account state; partial delivery causing reconciliation gaps; and the downstream consequence that a webhook delivery error can put a wrong number in front of a small-business owner making a financial decision. The stakes are higher, and the failure modes are qualitatively different from those of a SaaS notification webhook.

This article covers the design choices that matter specifically for financial data webhook delivery — not a generic webhook primer.

At-least-once vs. exactly-once: the real trade-off

At-least-once delivery guarantees that every event will be delivered, possibly more than once. Exactly-once delivery guarantees that every event is delivered exactly once, never duplicated. In theory, exactly-once is what you want for financial data. In practice, exactly-once delivery at the infrastructure layer is extremely difficult to guarantee end-to-end across a network boundary, because it requires distributed coordination between sender and receiver that is prone to its own failure modes.

The standard production approach for financial data webhooks is: at-least-once delivery from the sender, with idempotency key enforcement at the receiver. The sender includes a unique event_id with every delivery. The receiver checks whether that event_id has been processed before. If it has, the receiver discards the duplicate without reprocessing. This achieves effective exactly-once semantics without requiring infrastructure-level exactly-once guarantees.

// Webhook payload structure
{
  "event_id": "evt_a4f2b8c1-9d3e-4f7a-b2c5-1e8f3a9d2b7c",
  "event_type": "classification.batch_completed",
  "timestamp": "2025-07-14T14:22:31.441Z",
  "account_id": "acct_8a3f",
  "data": {
    "batch_id": "batch_c7e9f2",
    "transaction_count": 47,
    "classifications_ready": true
  }
}

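// Receiver idempotency check (pseudocode, assuming an async Redis client)
# Claim the event ID atomically: SET with NX returns None if the key already exists.
# The TTL is 48 hours, i.e. the 24-hour retry window plus a buffer.
claimed = await redis.set(f"event:{event_id}", "seen", nx=True, ex=172800)
if not claimed:
    return 200  # Already processed, discard silently
try:
    await process_event(event)
except Exception:
    await redis.delete(f"event:{event_id}")  # Release the claim so a retry can reprocess
    raise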

The idempotency window — how long you store processed event IDs — should be at least as long as your retry window plus a buffer. If your sender retries for up to 24 hours, store event IDs for at least 48 hours. A processed event ID that expires from your cache before the retry window closes creates a window where a duplicate can slip through.
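To make the window arithmetic concrete, here is a minimal in-memory sketch. It is only a stand-in for a TTL key store such as Redis: the class name `SeenEvents`, the constants, and the explicit `now` argument are illustrative assumptions, not part of any real API. It shows a retried event being processed twice when the TTL is shorter than the retry window, and being rejected when the TTL covers it.

```python
RETRY_WINDOW_S = 24 * 3600              # sender retries for up to 24 hours
IDEMPOTENCY_TTL_S = 2 * RETRY_WINDOW_S  # retry window plus a buffer: 48 hours


class SeenEvents:
    """In-memory stand-in for a TTL key store (illustration only)."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._expires_at = {}  # event_id -> expiry time in seconds

    def claim(self, event_id, now):
        """Return True if event_id is new or its record expired, False if duplicate."""
        expiry = self._expires_at.get(event_id)
        if expiry is not None and now < expiry:
            return False
        self._expires_at[event_id] = now + self.ttl_s
        return True


# A retry of the same event arrives 20 hours after the first delivery.
first_delivery, retry = 0, 20 * 3600

short = SeenEvents(ttl_s=12 * 3600)  # TTL shorter than the retry window
short.claim("evt_1", first_delivery)
duplicate_processed = short.claim("evt_1", retry)   # the duplicate slips through

safe = SeenEvents(ttl_s=IDEMPOTENCY_TTL_S)
safe.claim("evt_1", first_delivery)
duplicate_blocked = not safe.claim("evt_1", retry)  # the duplicate is rejected
```

Passing the clock in explicitly keeps the sketch deterministic; a real store would rely on its own expiry mechanism.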

Ordering guarantees and the sequence problem

For transaction classification webhooks, strict ordering is less critical than for payment event webhooks. A classification result for transaction A arriving before the result for transaction B doesn't usually affect correctness — each classification is independent. However, for forecast update events and account state events, ordering matters more: a forecast update based on state at T1 arriving after a forecast update based on state at T2 could produce a stale forecast overwriting a fresh one.

The practical solution: include a sequence_number or timestamp with monotonically increasing values in every event for a given account. Receivers that need ordering guarantees check whether the incoming event's sequence is newer than the last processed sequence for that account, and discard or queue out-of-order arrivals accordingly.

We're not saying you need to implement full event sourcing with total ordering for transaction enrichment webhooks; that's significant engineering overhead relative to the actual reliability gain. We are saying that for any event that updates a materialized state (like a forecast signal or an account health score), the receiver should at minimum check whether the incoming event is newer than the current state before applying it.
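The minimum check described above can be sketched as follows. This assumes each event carries a non-negative, per-account `sequence_number` field (not shown in the payload example earlier); the `ForecastState` class and the `balance_30d` field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ForecastState:
    """Latest materialized forecast per account (in-memory, illustration only)."""
    last_sequence: dict = field(default_factory=dict)  # account_id -> sequence_number
    forecast: dict = field(default_factory=dict)       # account_id -> forecast payload

    def apply(self, event):
        """Apply the event only if it is newer than the current state.

        Returns True if applied, False if discarded as stale or duplicate.
        """
        account_id = event["account_id"]
        seq = event["sequence_number"]
        if seq <= self.last_sequence.get(account_id, -1):
            return False
        self.last_sequence[account_id] = seq
        self.forecast[account_id] = event["data"]
        return True


state = ForecastState()
fresh = {"account_id": "acct_8a3f", "sequence_number": 2, "data": {"balance_30d": 1200}}
stale = {"account_id": "acct_8a3f", "sequence_number": 1, "data": {"balance_30d": 900}}

state.apply(fresh)  # applied
state.apply(stale)  # discarded: it would overwrite a fresher forecast
```

This sketch discards stale arrivals. A receiver that must process every event in order would queue them instead, which is a larger design.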

Retry strategy: exponential backoff with a dead-letter path

The standard retry pattern for webhook delivery: exponential backoff starting at 30 seconds, doubling on each failure, with jitter to prevent thundering herd effects on recovery. Maximum retry window of 24 hours for financial events — beyond that, the event age may make the data stale enough that automated recovery is no longer appropriate.
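As a rough sketch of what that schedule looks like, the snippet below generates the un-jittered delays that fit in the window and applies jitter separately. The function names are illustrative, and "full jitter" (a uniform draw between zero and the nominal delay) is just one common variant; the pattern above does not prescribe a specific one.

```python
import random

BASE_DELAY_S = 30           # first retry after 30 seconds
RETRY_WINDOW_S = 24 * 3600  # stop retrying after 24 hours


def retry_delays(base_s=BASE_DELAY_S, window_s=RETRY_WINDOW_S):
    """Yield nominal delays (30, 60, 120, ...) whose running total fits in the window."""
    elapsed, delay = 0, base_s
    while elapsed + delay <= window_s:
        yield delay
        elapsed += delay
        delay *= 2


def with_jitter(delay_s, rng=random):
    """Full jitter: a uniform delay in [0, delay_s] so recovering senders spread out."""
    return rng.uniform(0, delay_s)


schedule = list(retry_delays())
# 11 retries fit: 30 s, 60 s, ..., 30720 s, about 17 hours in total.
# The next doubling (61440 s) would overshoot the 24-hour window.
```

With pure doubling the last few gaps are many hours long, so some senders cap the maximum delay. Whether to do so is a design choice outside this sketch.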

What to do with events that exhaust the retry window: dead-letter queue. Every financial data webhook sender should have a dead-letter mechanism that preserves undelivered events for manual inspection or manual replay. The dead-letter queue is not an optional feature: it's the safety net for any network partition, infrastructure incident, or misconfigured endpoint that causes a sustained delivery failure. Without it, events that are still undelivered when the retry window closes are silently lost.
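A minimal sketch of the dead-letter path, under simplifying assumptions: `send` is a hypothetical transport callable that returns True on a 2xx response, a plain list stands in for a durable queue, and retries run in a blocking loop, whereas a production sender would schedule them asynchronously.

```python
def deliver_with_dead_letter(event, send, delays, dead_letters, sleep=lambda s: None):
    """Try to deliver; once all retries are exhausted, preserve the event for replay.

    send(event) -> bool  hypothetical transport call, True on a 2xx response
    delays               iterable of retry delays in seconds
    dead_letters         list standing in for a durable dead-letter queue
    sleep                injected so the sketch runs instantly; time.sleep in practice
    """
    if send(event):
        return True
    for delay in delays:
        sleep(delay)
        if send(event):
            return True
    dead_letters.append(event)  # never drop an undelivered event silently
    return False


def always_down(event):
    """Simulates an endpoint that fails for the whole retry window."""
    return False


dead_letters = []
delivered = deliver_with_dead_letter(
    {"event_id": "evt_1"}, always_down, delays=[30, 60, 120], dead_letters=dead_letters
)
```

After the call, `delivered` is False and the event sits in `dead_letters`, where it can be inspected or replayed manually.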

To support recovery on the receiving side, the sender should also expose a manual event replay endpoint or a "reconciliation pull" endpoint that allows the receiver to request re-delivery of events for a given time window. This makes recovery from incidents an operational task the receiver can run itself, rather than one that requires support tickets to the sender.

Signature verification: the security requirement that's often skipped

Webhook endpoints that accept financial data without verifying the request signature are vulnerable to event injection — an attacker can POST fabricated events to the endpoint and trigger incorrect state changes in the receiving app. For a transaction classification webhook, a fabricated event could inject incorrect category corrections into a banking app's database.

HMAC-SHA256 request signing is the standard approach. The sender includes a header like X-Spendaq-Signature: sha256=<hex_signature> computed as an HMAC of the raw request body using a shared secret. The receiver recomputes the HMAC over the exact bytes it received (not a re-serialized copy of the parsed JSON) and compares. If they don't match, the request is rejected with 401.

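// Signature verification (Node.js)
const crypto = require('crypto');

// req.rawBody is assumed to hold the unparsed request bytes
const expectedSig = crypto
  .createHmac('sha256', process.env.WEBHOOK_SECRET)
  .update(req.rawBody)
  .digest();

// A missing header becomes an empty buffer and fails the length check below
const header = req.headers['x-spendaq-signature'] || '';
const receivedSig = Buffer.from(header.replace(/^sha256=/, ''), 'hex');

// timingSafeEqual throws if the buffers differ in length, so compare lengths first
if (
  receivedSig.length !== expectedSig.length ||
  !crypto.timingSafeEqual(expectedSig, receivedSig)
) {
  return res.status(401).json({ error: 'Invalid signature' });
}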

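Use timingSafeEqual rather than string comparison: a constant-time comparison does not reveal, through response timing, how much of a guessed signature was correct, which is what a timing attack exploits to recover a valid signature piece by piece. Note that timingSafeEqual throws if the two buffers differ in length, so check the lengths first. Signature verification is a few lines of code, and it closes off injection of fabricated events by anyone who does not hold the shared secret. It's surprising how often production webhook endpoints skip this step.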
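For receivers written in Python, the same scheme can be sketched with the standard library. The `sign` and `verify` helper names and the example secret are illustrative assumptions; the header format follows the sha256=<hex_signature> convention described above.

```python
import hashlib
import hmac


def sign(raw_body: bytes, secret: bytes) -> str:
    """Return the header value the sender would attach: 'sha256=<hex digest>'."""
    return "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(raw_body: bytes, header_value: str, secret: bytes) -> bool:
    """Recompute the HMAC over the raw bytes and compare in constant time."""
    expected = sign(raw_body, secret)
    # compare_digest returns False for inputs of different lengths instead of raising.
    return hmac.compare_digest(expected.encode(), (header_value or "").encode())


secret = b"example-shared-secret"  # placeholder; load from secure config in practice
body = b'{"event_id": "evt_1", "event_type": "classification.batch_completed"}'

header = sign(body, secret)
ok = verify(body, header, secret)                                     # genuine request
tampered = verify(body.replace(b"evt_1", b"evt_2"), header, secret)  # modified body
```

Here `ok` is True and `tampered` is False, since any change to the body changes the expected digest.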

Monitoring webhook delivery health

Delivery failure can be silent from the sender's perspective: once the receiver returns a 2xx status, the sender considers the event delivered. If the receiver acknowledged the webhook and then failed internally, the event is orphaned and no retry will arrive. This is why the idempotency and retry patterns above need to be complemented with active monitoring.

Key metrics to track on the receiver side: incoming webhook rate per event type, processing success rate, depth of the receiver's own dead-letter (failed-processing) queue, and average lag between event timestamp and processing timestamp. Any sudden drop in incoming rate is a signal that either the sender stopped delivering or your endpoint became unreachable. Any spike in dead-letter depth signals a processing failure that needs investigation.
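The lag metric is simple to compute from the payload's timestamp field. The sketch below assumes ISO 8601 UTC timestamps with a trailing Z, as in the payload example earlier; the function name is illustrative.

```python
from datetime import datetime, timezone


def processing_lag_seconds(event_timestamp: str, processed_at: datetime) -> float:
    """Seconds between the event's own timestamp and the moment it was processed."""
    # Swap the trailing 'Z' so fromisoformat also works on Python versions before 3.11.
    sent_at = datetime.fromisoformat(event_timestamp.replace("Z", "+00:00"))
    return (processed_at - sent_at).total_seconds()


processed_at = datetime(2025, 7, 14, 14, 22, 33, 441000, tzinfo=timezone.utc)
lag = processing_lag_seconds("2025-07-14T14:22:31.441Z", processed_at)  # 2.0 seconds
```

In a live handler, `processed_at` would be `datetime.now(timezone.utc)`, and the resulting value would be fed to whatever metrics system the receiver already uses.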

On the sender side, track delivery acknowledgment rate and retry rate per customer endpoint. A customer endpoint with a persistently elevated retry rate is a signal that their infrastructure has a problem that will eventually produce data gaps if unaddressed — worth a proactive outreach rather than waiting for the events to exhaust the retry window and hit the dead-letter queue.