Saga, Outbox & CDC for Payments

A distributed transaction can coordinate multiple compatible resource managers, but independently deployed services rarely share that contract and pay a high availability and latency cost when they do. A payment therefore often becomes a saga: local transactions connected by durable messages, with explicit recovery or compensation when a later step fails.

Payments · Microservices · Senior IC · ~35 min · 10 sections

Prerequisites: Local ACID transactions, message brokers, and retry semantics.

After this: Choose between 2PC and a saga, build a transactional outbox, and design compensation or forward recovery.

Suggested first pass: Read sections 1–5, answer each section in your own words, then use the remaining failure modes and exercises as the advanced pass.

Technically reviewed 21 June 2026 · Primary reference: Debezium: Outbox Event Router

The saga pattern in three lines

1. A multi-service payment can be modeled as a saga: a chain of local ACID transactions connected by durable messages. Reversible steps define compensations; irreversible steps require forward recovery and careful ordering.

2. To update a database and publish an event without a distributed transaction, write the event to an outbox table in the same local transaction, then ship it out separately.

3. A relay (CDC / Debezium tailing the DB log, or a poller) reads the outbox and publishes to the broker. Delivery is at-least-once, so every consumer must be idempotent.
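The three lines above can be sketched end to end. This is a minimal illustration using an in-memory SQLite database, not a production design. The table names (payments, outbox, processed_events), the function names, and the `publish` callback standing in for a broker client are all assumptions made for the example. The relay shown is the poller variant; a CDC relay would read the same outbox rows from the database log instead. For brevity the consumer's dedup table lives in the same database, whereas a real consumer would keep it in its own store.

```python
import json
import sqlite3
import uuid


def init_db(conn):
    # Hypothetical schema: a business table, an outbox, and a consumer dedup table.
    conn.executescript("""
        CREATE TABLE payments (
            id TEXT PRIMARY KEY,
            amount_cents INTEGER NOT NULL,
            status TEXT NOT NULL
        );
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id TEXT NOT NULL UNIQUE,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    """)


def authorize_payment(conn, payment_id, amount_cents):
    """Write the state change and its event in ONE local transaction."""
    event_id = str(uuid.uuid4())
    payload = json.dumps({"payment_id": payment_id, "amount_cents": amount_cents})
    with conn:  # commits on success, rolls back both inserts on any exception
        conn.execute(
            "INSERT INTO payments (id, amount_cents, status) VALUES (?, ?, 'AUTHORIZED')",
            (payment_id, amount_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (event_id, "PaymentAuthorized", payload),
        )
    return event_id


def relay_once(conn, publish):
    """Polling relay: publish unpublished rows in commit order, then mark them.

    A crash between publish() and the UPDATE means the row is published again
    on the next pass, which is why delivery is at-least-once.
    """
    rows = conn.execute(
        "SELECT seq, event_id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_id, event_type, payload in rows:
        publish(event_id, event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


def handle_once(conn, event_id, apply):
    """Idempotent consumer: record the event id and apply the effect together."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery, effect already applied
        apply()
    return True
```

The design choice worth noting is that the dedup insert and the consumer's side effect share a transaction, so a redelivered event is either fully applied once or skipped.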

Illustrative saga shape: measure step count, lag, and retention in production

3–7: local transactions in a typical checkout saga (auth → reserve → charge → fulfil → notify)
recovery: each step needs an explicit retry, compensation, or forward-recovery policy
<1s: CDC end-to-end lag (commit → event on broker) when healthy
at-least-once: outbox + CDC delivery guarantee, so consumers MUST dedup
eventual: consistency model; no global isolation, intermediate states visible
N rows/txn: outbox growth; needs TTL/cleanup or it becomes the biggest table

How to think about sagas

Two-phase commit (2PC) can atomically coordinate participants that implement its prepare and recovery contract, but it holds resources while the decision is resolved and can block when the coordinator is unavailable. A saga is a different trade-off: steps commit independently and at different times, intermediate states are visible, and business invariants are restored through retries, compensation, or forward recovery rather than database rollback.

That reframes everything as two sub-problems. (a) Reliable event emission: "update the DB and tell the world" must be atomic. This is the dual-write problem, solved by the outbox. (b) Reliable orchestration of undo: if step 4 fails, run the compensations for steps 3, 2, and 1 in that reverse order, idempotently, even though a charge cannot truly be "rolled back" and can only be offset by a new compensating transaction such as a refund. (This is the pattern Idempotency & Exactly-Once Payment Processing pivots toward once one service becomes many.)
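Sub-problem (b) can be sketched as a small in-process orchestrator. This is an illustrative assumption, not a reference implementation: the `Step` type and `run_saga` function are hypothetical names, and the saga state lives only in memory. A real orchestrator would persist progress durably and retry each compensation until it succeeds, which is why compensations must be idempotent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    # None marks a step with nothing to undo, or an irreversible step that
    # needs forward recovery instead of compensation.
    compensate: Optional[Callable[[], None]] = None


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            log.append(f"failed:{step.name}")
            # Undo the most recent completed step first, the earliest last.
            for prev in reversed(completed):
                if prev.compensate is not None:
                    prev.compensate()
                    log.append(f"compensated:{prev.name}")
            return log
        completed.append(step)
        log.append(f"done:{step.name}")
    return log
```

The failed step itself is not compensated here, on the assumption that its local transaction rolled back. If a step can fail after partially taking effect (for example, a timeout on a charge request), it needs its own reconciliation policy.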
