SECTION 1 — CACHING (THE REAL ART OF PERFORMANCE ENGINEERING)

Caching is the #1 tool used in industry to achieve massive performance gains.

It is also the #1 source of complexity and bugs if poorly designed.

We will make it easy.


Why Cache?

Because almost every system shares one fundamental pattern:

Reads happen WAY more often than writes.

Caching turns:

  • slow work → fast work

  • expensive work → free work

  • repeated work → zero work

Examples:

  • database queries

  • computed results

  • HTML pages

  • API responses

  • configuration data


TYPES OF CACHES


1. CDN Cache (Edge Cache)

Used for:

  • static assets

  • images

  • scripts

  • CSS

  • HTML

  • videos

Benefits:

  • Removes load from servers

  • Reduces latency (close to user)

CDNs: Cloudflare, Akamai, Fastly, Vercel Edge.


2. Application-Level Cache

Examples:

  • in-memory cache

  • memoization

  • React Query cache

Used for:

  • computed values

  • short-lived results
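The simplest application-level cache is memoization. A minimal sketch using Python's built-in `functools.lru_cache` (the slow DB call is faked here — any pure, repeatable function works):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_lookup(user_id: int) -> str:
    # Imagine a slow DB query here; the decorator caches the
    # result in process memory, keyed by the arguments.
    return f"profile-{user_id}"

expensive_lookup(123)   # computed once
expensive_lookup(123)   # served from the in-memory cache
```

This only helps within one process — for shared state across servers you need a distributed cache (next).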


3. Distributed Cache (Redis / Memcached)

This is where real system design happens.

Redis is used for:

  • database query caching

  • session caching

  • rate limiting

  • leaderboards

  • ephemeral state

  • pub/sub

Memcached:

  • pure key-value lookup

  • extremely fast

Redis:

  • richer data types

  • transactions

  • persistence options


4. Database Cache

Examples:

  • MySQL query cache (removed in MySQL 8.0)

  • Postgres buffer cache

These are internal DB optimizations.


SECTION 2 — CACHE INVALIDATION (THE HARD PART)

There are only two hard problems in computer science:

  1. Naming things

  2. Cache invalidation

  3. Off-by-one errors

😄

Cache invalidation decides how the cache updates when data changes.

There are 4 major strategies:


Strategy 1 — TTL (Time-Based Expiry)

Simplest.

Example:

  • cache for 60 seconds

Pros:

  • easy

  • predictable

  • safe

Cons:

  • stale data appears occasionally

Used for:

  • dashboards

  • read-heavy workloads
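TTL is simple enough to sketch in a few lines. A tiny in-memory version (a plain dict stands in for Redis; real Redis gives you this for free via `SET key value EX 60`):

```python
import time

class TTLCache:
    """Tiny in-memory cache where every entry expires after ttl seconds."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}          # key -> (value, expires_at)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazy eviction on read
            return None
        return value

cache = TTLCache(ttl=60)
cache.set("dashboard:stats", {"users": 42})
```

Staleness is bounded by the TTL: at worst, readers see data 60 seconds old.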


Strategy 2 — Write-Through Cache

When DB updates → cache updates immediately.

Pros:

  • reliability

  • consistency

Cons:

  • slower writes

  • more system complexity
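The write path can be sketched in a few lines. Here plain dicts stand in for the real database and cache — the point is only the ordering (durable write first, cache second, in the same call):

```python
class WriteThroughStore:
    """Sketch of write-through: every write hits the DB and the
    cache together, so reads can trust the cache."""
    def __init__(self):
        self.db = {}             # stand-in for the real database
        self.cache = {}          # stand-in for Redis/Memcached

    def write(self, key, value):
        self.db[key] = value     # slower, durable write first
        self.cache[key] = value  # then update the cache immediately

    def read(self, key):
        return self.cache.get(key, self.db.get(key))
```

The cost is visible in `write()`: every write pays for both stores, which is why writes get slower.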


Strategy 3 — Write-Behind Cache

Write to cache → asynchronously write to DB later.

Pros:

  • extremely fast writes

Cons:

  • risky

  • possible data loss

  • requires careful design

Used for:

  • analytics

  • counters

  • logs
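A minimal write-behind sketch for counters (a dict stands in for the DB; real systems flush on a timer or background worker, not only on a threshold):

```python
class WriteBehindCounter:
    """Sketch of write-behind: increments land in memory instantly
    and are flushed to the 'DB' in batches. A crash before flush()
    loses the buffered counts -- the data-loss risk noted above."""
    def __init__(self, flush_every: int = 100):
        self.buffer = {}         # in-memory deltas (the "cache")
        self.db = {}             # stand-in for the real database
        self.flush_every = flush_every
        self.pending = 0

    def incr(self, key):
        self.buffer[key] = self.buffer.get(key, 0) + 1
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        for key, delta in self.buffer.items():
            self.db[key] = self.db.get(key, 0) + delta
        self.buffer.clear()
        self.pending = 0
```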


Strategy 4 — Cache-Aside (Lazy Loading)

Application logic:

  • first checks cache

  • if missing, fetch from DB

  • populate cache

Used by:

  • Netflix

  • Uber

  • every major distributed system

This is the safest, simplest, and most scalable strategy for most workloads.
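The whole cache-aside loop fits in one function. A sketch where `cache` is any dict-like store and `load_from_db` stands in for the real query:

```python
def get_user(user_id, cache, load_from_db):
    """Cache-aside: check the cache first, fall back to the DB on a
    miss, then populate the cache for the next reader."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:              # cache miss
        value = load_from_db(user_id)
        cache[key] = value         # populate for future reads
    return value
```

Note the cache is only ever written on a miss — the DB stays the source of truth, which is why this pattern is so hard to get wrong.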


SECTION 3 — DESIGNING CACHES THAT NEVER BREAK

Rule 1: Cache only stable, rarely-changing data

Do NOT cache:

  • volatile counters

  • frequently changing entities

Cache:

  • user profiles

  • configuration

  • product catalogs


Rule 2: Use versioned cache keys

Example:

user:123:v2

When schema changes → bump version.
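Versioning is usually just a constant baked into the key builder. A sketch matching the `user:123:v2` example above:

```python
USER_SCHEMA_VERSION = 2   # bump this when the cached shape changes

def user_key(user_id: int) -> str:
    # Old v1 entries simply stop being read and age out via TTL/eviction.
    return f"user:{user_id}:v{USER_SCHEMA_VERSION}"
```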


Rule 3: Treat caches as hints, not truth

Never assume the cached data is correct.


Rule 4: Use compression for large data

Redis does not compress values itself — compress large values client-side (gzip / zstd) before storing them.


Rule 5: Evict keys intentionally

Use:

  • LRU

  • LFU

  • FIFO


SECTION 4 — CDN DESIGN

CDNs are basically super-fast global caches.

They:

  • reduce latency

  • reduce server load

  • handle DDoS

  • serve assets close to user

CDNs cache:

  • static files

  • prerendered HTML

  • API GET responses

  • image variants

CDNs like Cloudflare, Fastly, and Vercel Edge primarily cache at L7 (HTTP); some, like Cloudflare, also absorb attacks at L3/L4.


SECTION 5 — QUEUES (THE HEART OF SCALABILITY)

Queues solve:

  • write spikes

  • load leveling

  • retries

  • async processing

  • decoupling

Queues are essential for:

  • notifications

  • email sending

  • video processing

  • heavy compute

  • billing workflows


Queue Technologies

  • SQS

  • RabbitMQ

  • Kafka

  • Redis Streams


SECTION 6 — WHEN TO USE A QUEUE

Use queues when:

  • work doesn’t need to happen immediately

  • work is expensive

  • workloads spike unpredictably

  • you want reliability

  • you want retries

Examples:

  • SMS sending

  • sending invoices

  • processing uploads

  • caching refresh jobs


SECTION 7 — QUEUE DESIGN PATTERNS

Pattern 1: Worker Pool

Workers pull messages from queue → process → ack.

Used in:

  • background jobs

  • pipelines
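The pull → process → ack loop can be sketched with the stdlib. Here `queue.Queue` stands in for SQS/RabbitMQ and `task_done()` plays the role of the ack:

```python
import queue
import threading

def run_worker_pool(jobs, handler, workers: int = 4):
    """Sketch of a worker pool: each thread pulls a message off the
    shared queue, processes it, and acks it."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return             # no work left, worker exits
            handler(job)
            q.task_done()          # the "ack"

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    q.join()                       # block until every job is acked
    for t in threads:
        t.join()
```

Scaling is just the `workers` parameter — the queue itself needs no changes.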


Pattern 2: Fan-Out (Pub/Sub)

Publish event → multiple subscribers handle it.

Used for:

  • notifications

  • analytics

  • syncing services


Pattern 3: Delayed Jobs

Execute a task later (like reminders).


Pattern 4: DLQ (Dead Letter Queue)

Failed messages go here for manual remediation.
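A sketch of the retry-then-park logic (real brokers like SQS do the redelivery counting for you; here it's an explicit loop):

```python
def process_with_dlq(messages, handler, max_attempts: int = 3):
    """Sketch: retry each message up to max_attempts; anything that
    still fails is parked in the dead-letter list for a human."""
    dead_letter = []
    for msg in messages:
        for attempt in range(max_attempts):
            try:
                handler(msg)
                break                      # success, move on
            except Exception:
                if attempt == max_attempts - 1:
                    dead_letter.append(msg)  # exhausted retries
    return dead_letter
```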


SECTION 8 — STREAMS (THE REAL-TIME DATA PIPELINE)

Streams are different from queues.

Queues:

  • point-to-point

  • message consumed once

Streams:

  • logs

  • append-only

  • multiple consumers

  • replayable

Examples:

  • Kafka

  • Kinesis

  • Pulsar

Used for:

  • analytics

  • real-time dashboards

  • fraud detection

  • audit logging

  • chat

  • click-stream data


SECTION 9 — RATE LIMITING

Rate limiting protects:

  • APIs

  • databases

  • internal services

  • external dependencies

Without it → a single user can take down the system.


Rate Limiting Algorithms

1. Fixed Window

Simple but bursty.

2. Sliding Window

Better distribution.

3. Token Bucket

Most widely used.

Allows bursts → refills at fixed rate.
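The token bucket is small enough to write out in full. A single-process sketch (production versions typically keep the bucket state in Redis so all servers share one limit):

```python
import time

class TokenBucket:
    """Sketch: the bucket holds up to 'capacity' tokens and refills at
    'rate' tokens/second. Each request spends one token; an empty
    bucket means the request is rate-limited."""
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)   # start full -> bursts allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

`capacity` controls burst size; `rate` controls the sustained limit — tuning them separately is why this algorithm is so popular.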

4. Leaky Bucket

Smooth, uniform output rate.


SECTION 10 — BACKPRESSURE (THE MOST UNDERRATED CONCEPT)

Backpressure happens when:

  • producers → generate faster than

  • consumers → can process

This leads to:

  • queue growth

  • memory exhaustion

  • cascading failures

High-quality systems:

  • slow down producers

  • drop messages

  • load-shed

  • auto-scale workers

  • pause ingestion

This is how Stripe, Uber, and Netflix prevent outages.
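The "slow down producers" option is the easiest to see in code: a bounded queue. A sketch using the stdlib — `q.put()` blocks when the queue is full, which throttles the producer to the consumer's pace instead of letting memory grow:

```python
import queue
import threading

def produce_with_backpressure(items, q: queue.Queue):
    """Producer side: put() blocks while the queue is at maxsize,
    which is backpressure in its simplest form."""
    for item in items:
        q.put(item)

def consume(q: queue.Queue, out: list, n: int):
    """Consumer side: drains n items at its own pace."""
    for _ in range(n):
        out.append(q.get())
        q.task_done()
```

Usage: create the queue with `queue.Queue(maxsize=2)` and run the consumer in its own thread — the producer simply cannot outrun it by more than two items.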


SECTION 11 — HIGH THROUGHPUT SYSTEM DESIGN

To design a high-throughput system:


Rule 1 — Make everything asynchronous

HTTP requests → queue → workers → DB → events.


Rule 2 — Scale horizontally

Add:

  • workers

  • partitions

  • shards

Not vertically (bigger machines).


Rule 3 — Use caches everywhere

Reduce load on:

  • DB

  • services

  • external APIs


Rule 4 — Keep hot data in memory

Redis/Memcached → often orders of magnitude faster than a DB query.


Rule 5 — Avoid joins in hot paths

Pre-compute → denormalize.


Rule 6 — Break workflows into stages

Pipelines = scalable.


Rule 7 — Design for failure

Retries

Backoff

DLQ

Circuit breakers