Async Processing, Queues & Sagas

WHY ASYNCHRONY IS UNAVOIDABLE

Real systems cannot be fully synchronous because:

users are slow
networks are unreliable
external services fail
workloads spike unpredictably
some work is expensive

Therefore:

Any serious backend system is partially asynchronous.

Trying to keep everything synchronous:

increases latency
reduces availability
couples systems tightly
causes cascading failures

SYNCHRONOUS VS ASYNCHRONOUS BOUNDARIES

Elite engineers are intentional about where async boundaries exist.

Synchronous (User-Facing)

validation
authorization
lightweight reads
state checks

Asynchronous (Background)

emails / notifications
payment confirmations
analytics
document processing
integrations
retries

Rule:

If the user doesn’t need the result immediately, it should be async.
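Here is a minimal sketch of where that boundary falls. It assumes a hypothetical signup handler, and an in-process queue.Queue stands in for a durable broker. Validation runs inline because the user needs the answer. The welcome email is only enqueued.

```python
import itertools
import queue
from dataclasses import dataclass

# Hypothetical in-process stand-in for a durable job queue.
background_jobs: "queue.Queue[dict]" = queue.Queue()
_user_ids = itertools.count(1)  # stand-in for database-generated IDs


@dataclass
class SignupRequest:
    email: str
    password: str


def handle_signup(req: SignupRequest) -> dict:
    # Synchronous: the user needs validation results before we respond.
    if "@" not in req.email:
        return {"status": 400, "error": "invalid email"}
    if len(req.password) < 8:
        return {"status": 400, "error": "password too short"}
    user_id = f"user-{next(_user_ids)}"  # stand-in for a DB insert

    # Asynchronous: the welcome email can be sent after we respond.
    background_jobs.put({"type": "send_welcome_email", "user_id": user_id})
    return {"status": 201, "user_id": user_id}
```

A rejected request never reaches the queue. An accepted one returns as soon as the job is enqueued.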

QUEUES AS LOAD-LEVELERS

Queues exist to:

absorb spikes
decouple producers from consumers
smooth load
enable retries
provide durability

They turn:

“Do this now”

into

“Do this reliably.”

Queue Mental Model

Key properties:

producers don’t know or care which worker processes a message
workers don’t know or care which producer sent it
failure is isolated

This separation is critical for scale.
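The mental model can be sketched with Python's standard queue and threading modules. This is an in-process illustration only, and a real deployment would use a durable broker. run() and worker() are hypothetical names, and doubling a number stands in for real work.

```python
import queue
import threading


def worker(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    # Workers only see messages; they never learn which producer sent them.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this worker down
            jobs.task_done()
            return
        with lock:
            results.append(job * 2)  # stand-in for real work
        jobs.task_done()


def run(num_jobs: int, num_workers: int) -> list:
    # A bounded buffer absorbs bursts; put() blocks only when it is full (backpressure).
    jobs: queue.Queue = queue.Queue(maxsize=100)
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Producer: enqueue a burst without waiting for any item to be processed.
    for i in range(num_jobs):
        jobs.put(i)
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)
```

The number of workers can change without touching the producer. That is the decoupling the properties above describe.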

DELIVERY SEMANTICS (THE HARD PART)

Every queue system offers a delivery guarantee, typically one of:

At-Most-Once

fast
can lose messages
no retries

At-Least-Once

messages may repeat
must handle duplicates
most common in production

Exactly-Once

extremely difficult to achieve end to end
usually approximated with at-least-once delivery plus idempotent processing

Elite Rule

Assume at-least-once delivery and design idempotently.

Never assume “exactly once” unless you deeply understand the system.
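Under at-least-once delivery, designing idempotently usually means recording which message IDs have already been applied. Below is a minimal sketch built around a hypothetical IdempotentConsumer that credits a balance and keeps processed IDs in memory. A real system would persist those IDs, ideally in the same transaction as the side effect.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    # Assumption: in production this set lives in durable storage,
    # updated atomically with the balance change.
    processed_ids: set = field(default_factory=set)
    balance: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply a credit at most once per message ID; return False for duplicates."""
        if message_id in self.processed_ids:
            return False  # redelivery: acknowledge it, but change nothing
        self.balance += amount
        self.processed_ids.add(message_id)
        return True
```

Delivering the same message twice leaves the balance unchanged, which is the "effectively once" behavior the rule relies on.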

IDENTITY IN ASYNC SYSTEMS

Every message must have:

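unique ID: identifies this message, so duplicates can be detected
correlation ID: shared by every message in the same request or workflow
causation ID: the ID of the message that directly caused this one (optional but powerful)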

Why?

tracing
deduplication
debugging
audits

If you can’t trace a message across services, you don’t control the system.
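One way to carry these IDs is to wrap every payload in an envelope. In this sketch the field and function names are illustrative assumptions. The first message in a flow correlates with itself. Each follow-up inherits the correlation ID and records its parent as the causation ID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    name: str
    payload: dict
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    correlation_id: Optional[str] = None
    causation_id: Optional[str] = None

    def __post_init__(self) -> None:
        # A message that starts a flow correlates with itself.
        if self.correlation_id is None:
            object.__setattr__(self, "correlation_id", self.message_id)


def caused_by(parent: Envelope, name: str, payload: dict) -> Envelope:
    """Create a follow-up message that keeps the trace intact."""
    return Envelope(
        name=name,
        payload=payload,
        correlation_id=parent.correlation_id,  # same workflow
        causation_id=parent.message_id,        # direct parent
    )
```

Querying logs by correlation ID returns the whole flow. Following causation IDs reconstructs the exact chain.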

EVENT-DRIVEN ARCHITECTURE

Event-driven systems communicate by facts, not commands.

Example:

❌ “Charge this payment”
✅ “PaymentRequested”
✅ “PaymentCompleted”
✅ “PaymentFailed”

Events represent things that happened, not things you want to happen.

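This distinction matters: a command is addressed to one handler that may reject it, while an event records something already done that any number of consumers may react to.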
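Here is a minimal in-process illustration of publishing a fact. A hypothetical EventBus stands in for a broker topic. Whoever publishes PaymentCompleted does not know who is listening, and adding a third consumer would require no change on the publishing side. A command such as “charge this payment” would instead target one handler that is free to refuse it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PaymentCompleted:
    # Past tense: a fact that has already happened.
    payment_id: str
    amount_cents: int


class EventBus:
    """In-process stand-in for a broker topic (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # The publisher neither knows nor cares who reacts.
        for handler in self._subscribers[type(event)]:
            handler(event)


bus = EventBus()
receipts: list = []
ledger: list = []
bus.subscribe(PaymentCompleted, lambda e: receipts.append(f"receipt for {e.payment_id}"))
bus.subscribe(PaymentCompleted, lambda e: ledger.append(e.amount_cents))
bus.publish(PaymentCompleted("pay-1", 2500))
```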

Benefits

loose coupling
scalability
extensibility
auditability

Costs

complexity
eventual consistency
harder debugging

Elite engineers accept these tradeoffs intentionally.

WORKFLOWS VS EVENTS

Two styles exist:

Choreography (Event-Driven)

services react to events
no central coordinator
highly decoupled
harder to reason about

Orchestration (Workflow-Driven)

central orchestrator
explicit state machine
easier to reason about correctness
more coupling

Elite Rule

Use orchestration for business-critical workflows.

Use events for side effects and extensions.

Payments, bookings, onboarding → orchestration

Analytics, notifications → events
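An orchestrator's “explicit state machine” can be as simple as a transition table. The sketch below models a hypothetical onboarding workflow, and the state and signal names are assumptions. Every legal move is listed in one place, so an out-of-order signal fails loudly instead of silently corrupting state.

```python
from enum import Enum


class Onboarding(Enum):
    STARTED = "started"
    EMAIL_VERIFIED = "email_verified"
    KYC_PASSED = "kyc_passed"
    ACTIVE = "active"
    REJECTED = "rejected"


# Every legal transition, in one place; anything else is a bug.
TRANSITIONS = {
    (Onboarding.STARTED, "email_verified"): Onboarding.EMAIL_VERIFIED,
    (Onboarding.EMAIL_VERIFIED, "kyc_passed"): Onboarding.KYC_PASSED,
    (Onboarding.EMAIL_VERIFIED, "kyc_failed"): Onboarding.REJECTED,
    (Onboarding.KYC_PASSED, "account_created"): Onboarding.ACTIVE,
}


def advance(state: Onboarding, signal: str) -> Onboarding:
    """Return the next state, or raise if the signal is illegal here."""
    try:
        return TRANSITIONS[(state, signal)]
    except KeyError:
        raise ValueError(f"illegal transition {state.name} on {signal!r}") from None
```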

SAGAS (DISTRIBUTED TRANSACTIONS)

In distributed systems:

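You generally cannot rely on global ACID transactions across services. Two-phase commit exists, but it holds locks across services, blocks if the coordinator fails, and third-party APIs rarely support it.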

Sagas replace them.

Saga Definition

A saga is:

a sequence of local transactions, each committed by one service
each step has a compensating action that semantically undoes it
the system is eventually consistent, not atomic

Example: Booking Saga

Reserve seat
Charge payment
Confirm booking

If step 2 fails:

compensate step 1 → release seat
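The booking saga can be sketched as an orchestrator. It runs steps in order, and on failure it runs the compensations of the completed steps in reverse. The step functions are hypothetical stand-ins. This version keeps progress in memory, whereas a production orchestrator would persist each step's outcome so that compensation survives a crash.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]
    compensate: Callable[[dict], None]


def run_saga(steps: list, ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            ctx.setdefault("log", []).append(f"{step.name} failed: {exc}")
            for prev in reversed(done):
                prev.compensate(ctx)
                ctx["log"].append(f"compensated {prev.name}")
            return False
        done.append(step)
    return True


# Hypothetical step implementations for the booking example.
def reserve_seat(ctx: dict) -> None:
    ctx["seat_held"] = True


def release_seat(ctx: dict) -> None:
    ctx["seat_held"] = False


def charge_payment(ctx: dict) -> None:
    if ctx["card_declined"]:
        raise RuntimeError("card declined")
    ctx["charged"] = True


def refund_payment(ctx: dict) -> None:
    ctx["charged"] = False


def confirm_booking(ctx: dict) -> None:
    ctx["confirmed"] = True


def cancel_booking(ctx: dict) -> None:
    ctx["confirmed"] = False


booking_saga = [
    Step("reserve seat", reserve_seat, release_seat),
    Step("charge payment", charge_payment, refund_payment),
    Step("confirm booking", confirm_booking, cancel_booking),
]
```

With a declined card, step 2 fails and only the seat reservation is undone, matching the compensation rule above.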

Key Properties

each step is idempotent
compensations are explicit
failures are expected
retries are normal

FAILURE IS THE DEFAULT STATE

Elite backend engineers assume:

workers crash
messages duplicate
services time out
databases slow down
networks partition

Therefore systems must:

retry safely
back off exponentially
isolate failures
recover automatically

RETRIES DONE RIGHT

Retries are dangerous if misused.

Rules:

retries must be bounded
retried operations must be idempotent
retries must have backoff
retries must use jitter

Never retry blindly.
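Here is a sketch of a retry helper that applies these rules: bounded attempts, exponential backoff with a cap on the delay, and “full jitter” (a random sleep between zero and the current backoff). TransientError is a hypothetical marker for failures worth retrying. The sleep function can be injected, so the logic can be tested without waiting. Only wrap operations that are safe to repeat.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 503s)."""


def retry(op: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.1,
          max_delay: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # bounded: give up and surface the error
            # Exponential backoff, capped, then full jitter in [0, backoff].
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, backoff))
    raise AssertionError("unreachable")
```

Jitter spreads retries from many clients across the window instead of having them all fire at the same instant.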

Retry Storms

If many clients retry at the same moment, the retry traffic can overwhelm a recovering service and collapse the system.

Elite engineers:

cap retries
use circuit breakers
shed load intentionally
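A circuit breaker stops callers from hammering a dependency that is already failing. This is a minimal single-threaded sketch with assumed defaults: three consecutive failures open the circuit, and the cooldown is 30 seconds. Once the cooldown has passed, one call goes through as a probe. Success closes the circuit, and failure re-opens it.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """Closed -> open after N consecutive failures; after a cooldown,
    one probe call is allowed (half-open). Not thread-safe."""

    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, op: Callable):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                # Fail fast instead of adding load to a struggling dependency.
                raise CircuitOpenError("circuit open")
            # Cooldown elapsed: half-open, let this call through as a probe.
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```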

DEAD LETTER QUEUES (DLQ)

Any queue-based system without a DLQ is incomplete.

DLQs are for:

poison messages
repeated failures
manual inspection
remediation

Elite engineers:

monitor DLQ volume
alert on growth
replay messages safely
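Here is a minimal sketch of DLQ handling. Two in-memory deques stand in for broker queues, and the limit of three attempts is an assumption. A message that keeps failing is parked along with its error for inspection, rather than blocking the queue or looping forever. replay() moves parked messages back once the cause is fixed, which is only safe if the handler is idempotent.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical limit


@dataclass
class Message:
    msg_id: str
    body: dict
    attempts: int = 0


def drain(main: deque, dlq: deque, handler: Callable[[dict], None]) -> None:
    """Process messages; park ones that keep failing in the DLQ."""
    while main:
        msg = main.popleft()
        try:
            handler(msg.body)
        except Exception as exc:
            msg.attempts += 1
            if msg.attempts >= MAX_ATTEMPTS:
                # Keep the failure reason for manual inspection.
                dlq.append((msg, repr(exc)))
            else:
                main.append(msg)  # requeue (a real broker would add a delay)


def replay(dlq: deque, main: deque) -> int:
    """Move parked messages back after a fix; returns how many were moved."""
    count = 0
    while dlq:
        msg, _reason = dlq.popleft()
        msg.attempts = 0
        main.append(msg)
        count += 1
    return count
```

The DLQ length is the number worth alerting on. A growing DLQ means a bug or a bad producer, not bad luck.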

TIME AS A FIRST-CLASS CONCERN

Workflows span time:

minutes
hours
days

Systems must handle:

delayed jobs
scheduled retries
timeouts
expiration

Time introduces complexity; ignoring it leads to stuck workflows and corrupted state.
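Delayed jobs, scheduled retries, timeouts, and expiration all come down to “run this at time T, unless it is too late by then”. Here is a sketch that uses a heap ordered by due time. The clock is passed in explicitly, so the behavior is deterministic. DelayedQueue and its methods are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Scheduled:
    run_at: float
    seq: int  # tie-breaker so equal run_at values keep insertion order
    name: str = field(compare=False)
    expires_at: float = field(compare=False, default=float("inf"))


class DelayedQueue:
    """In-memory sketch; a real scheduler would persist the heap."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()

    def schedule(self, name: str, run_at: float,
                 expires_at: float = float("inf")) -> None:
        heapq.heappush(self._heap, Scheduled(run_at, next(self._seq), name, expires_at))

    def due(self, now: float) -> tuple:
        """Pop every job whose time has come; separate out the expired ones."""
        ready, expired = [], []
        while self._heap and self._heap[0].run_at <= now:
            job = heapq.heappop(self._heap)
            (expired if job.expires_at <= now else ready).append(job.name)
        return ready, expired
```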

OBSERVABILITY IN ASYNC SYSTEMS

You must observe:

queue depth
processing latency
retry counts
DLQ size
success vs failure rate

Without this:

Your system is operating blind.
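Here is a sketch of the minimum counters worth tracking. It assumes a hypothetical QueueMetrics object that a worker updates. In practice these values would be exported to a metrics system with alerts attached. Latency here runs from enqueue to completion, so it includes the time spent waiting in the queue.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class QueueMetrics:
    """Hypothetical in-process counters for one queue."""
    depth: int = 0
    dlq_size: int = 0
    retries: int = 0
    outcomes: Counter = field(default_factory=Counter)
    latencies: list = field(default_factory=list)

    def on_enqueue(self) -> None:
        self.depth += 1

    def on_done(self, enqueued_at: float, finished_at: float,
                ok: bool, attempts: int = 1) -> None:
        self.depth -= 1
        self.outcomes["success" if ok else "failure"] += 1
        self.retries += attempts - 1
        self.latencies.append(finished_at - enqueued_at)  # includes queue wait

    def on_dead_letter(self) -> None:
        self.dlq_size += 1

    def snapshot(self) -> dict:
        total = sum(self.outcomes.values())
        return {
            "queue_depth": self.depth,
            "dlq_size": self.dlq_size,
            "retry_count": self.retries,
            "failure_rate": self.outcomes["failure"] / total if total else 0.0,
            "max_latency_s": max(self.latencies, default=0.0),
        }
```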

COMMON ASYNC TRAPS

❌ Fire-and-forget messages

❌ No idempotency

❌ No DLQ

❌ No retries

❌ Long-running workers with no checkpoints

❌ Hidden coupling through shared DBs

These traps cause catastrophic outages.
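The checkpoint trap has a simple remedy: persist progress periodically, so a restarted worker resumes where it stopped instead of starting over. This sketch assumes a dict-like store standing in for durable storage and uses a hypothetical process_with_checkpoints helper. Items after the last checkpoint are processed again on restart, so handle must be idempotent.

```python
from typing import Callable, MutableMapping, Sequence


def process_with_checkpoints(job_id: str, items: Sequence,
                             handle: Callable[[object], None],
                             store: MutableMapping, every: int = 100) -> int:
    """Resume from the saved offset, checkpoint every `every` items,
    and return the offset this run started from."""
    start = store.get(job_id, 0)
    for offset in range(start, len(items)):
        handle(items[offset])
        if (offset + 1) % every == 0:
            store[job_id] = offset + 1  # next offset to process
    store[job_id] = len(items)  # mark the job complete
    return start
```

Suppose a worker crashes partway through. Only the work since the last checkpoint is lost and redone, not the whole job.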

SIGNALS YOU’VE MASTERED ASYNC BACKENDS

You know you’re there when:

you naturally split sync vs async paths
you design workflows explicitly
you assume retries & duplicates
you can reason about partial failure
you can explain eventual consistency calmly
