SECTION 1 — CACHING (THE REAL ART OF PERFORMANCE ENGINEERING)
Caching is the #1 tool used in industry to achieve massive performance gains.
It is also the #1 source of complexity and bugs if poorly designed.
We will make it easy.
Why Cache?
Because every system has one fundamental problem:
Reads happen WAY more often than writes.
Caching turns:
- slow work → fast work
- expensive work → free work
- repeated work → zero work
Examples:
- database queries
- computed results
- HTML pages
- API responses
- configuration data
TYPES OF CACHES
1. CDN Cache (Edge Cache)
Used for:
- static assets
- images
- scripts
- CSS
- HTML
- videos
Benefits:
- Removes load from origin servers
- Reduces latency (content is served close to the user)
CDNs: Cloudflare, Akamai, Fastly, Vercel Edge.
2. Application-Level Cache
Examples:
- in-memory cache
- memoization
- React Query cache
Used for:
- computed values
- short-lived results
3. Distributed Cache (Redis / Memcached)
This is where real system design happens.
Redis is used for:
- database query caching
- session caching
- rate limiting
- leaderboards
- ephemeral state
- pub/sub
Memcached:
- pure key-value lookup
- extremely fast
Redis:
- richer data types
- transactions
- persistence options
4. Database Cache
Examples:
- MySQL query cache (deprecated in 5.7, removed in MySQL 8.0)
- Postgres shared buffer cache
These are internal DB optimizations.
SECTION 2 — CACHE INVALIDATION (THE HARD PART)
The two hardest problems in computer science:
- Naming things
- Cache invalidation
- Off-by-one errors
😄
Cache invalidation decides how the cache updates when data changes.
There are 4 major strategies:
Strategy 1 — TTL (Time-Based Expiry)
Simplest.
Example:
- cache for 60 seconds
Pros:
- easy
- predictable
- safe
Cons:
- stale data appears occasionally
Used for:
- dashboards
- read-heavy workloads
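A TTL cache fits in a few lines. This sketch uses a plain dict with lazy eviction on read; the injectable `clock` parameter is an assumption added so the example is deterministic (a real cache would just use `time.monotonic`):

```python
import time

class TTLCache:
    """Minimal time-based cache: entries expire ttl seconds after being set."""
    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable for testing
        self._store = {}        # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]   # lazy eviction: stale entries drop on read
            return default
        return value

# fake clock so the behavior is deterministic
now = [0.0]
cache = TTLCache(ttl=60, clock=lambda: now[0])
cache.set("dashboard", {"users": 42})
now[0] = 30    # 30s later: still fresh
assert cache.get("dashboard") == {"users": 42}
now[0] = 61    # past the TTL: entry is considered stale and dropped
assert cache.get("dashboard") is None
```

The "stale data appears occasionally" con is visible here: between a DB write and expiry, readers see the old value for up to `ttl` seconds.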
Strategy 2 — Write-Through Cache
When DB updates → cache updates immediately.
Pros:
- reliability
- consistency
Cons:
- slower writes
- more system complexity
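The write-through flow can be sketched with two dicts standing in for the database and the cache (the function names are illustrative). The key property: every write touches both stores in the request path, so readers never see the cache lag behind the DB:

```python
db = {}     # stands in for the database
cache = {}  # stands in for Redis

def write_through(key, value):
    """Update the DB and the cache in the same request path.
    Writes pay for both operations, but reads never see a stale cache."""
    db[key] = value      # durable write first
    cache[key] = value   # cache updated immediately

def read(key):
    # cache hit is the common case; fall back to the DB on a miss
    return cache.get(key, db.get(key))

write_through("user:1", {"name": "Ada"})
```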
Strategy 3 — Write-Behind Cache
Write to cache → asynchronously write to DB later.
Pros:
- extremely fast writes
Cons:
- risky
- possible data loss
- requires careful design
Used for:
- analytics
- counters
- logs
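A write-behind sketch, again with dicts as stand-ins. The write is acknowledged as soon as the cache is updated; the DB write is queued for later. The data-loss risk is explicit: anything still in `pending` when the process crashes is gone (a real system would run `flush` in a background worker with batching and retries):

```python
from collections import deque

cache = {}
db = {}
pending = deque()   # writes waiting to be flushed to the DB

def write_behind(key, value):
    """Acknowledge immediately after the cache update; defer the DB write."""
    cache[key] = value
    pending.append((key, value))

def flush():
    """Drain pending writes to the DB (in production: async, batched, retried)."""
    while pending:
        key, value = pending.popleft()
        db[key] = value

write_behind("counter:page_views", 1001)
assert "counter:page_views" not in db   # DB is behind until the flush
flush()
```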
Strategy 4 — Cache Aside (Most Popular)
Application logic:
- first checks the cache
- if missing, fetches from the DB
- populates the cache
Used by:
- Netflix
- Uber
- every major distributed system
Cache-aside is the safest, simplest, and most scalable of the four strategies.
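The three-step cache-aside read path translates almost directly into code. Plain dicts stand in for Redis and the database here, and the `get_user` name and sample data are illustrative:

```python
cache = {}
db = {"user:123": {"name": "Grace", "plan": "pro"}}

def get_user(user_id: str):
    """Cache-aside: check cache, fall back to the DB on a miss, populate cache."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value            # cache hit: no DB work
    value = db.get(key)         # cache miss: read the source of truth
    if value is not None:
        cache[key] = value      # populate so the next reader hits the cache
    return value

get_user("123")   # miss: loads from db and fills the cache
get_user("123")   # hit: served from the cache
```

In practice the populated entry also gets a TTL, so stale values age out even if invalidation is missed.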
SECTION 3 — DESIGNING CACHES THAT NEVER BREAK
Rule 1: Cache only stable, rarely-changing data
Do NOT cache:
- volatile counters
- frequently changing entities
Cache:
- user profiles
- configuration
- product catalogs
Rule 2: Use versioned cache keys
Example:
user:123:v2
When schema changes → bump version.
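A versioned key is just a constant baked into the key-building function (the names here are illustrative). The payoff of bumping the version instead of deleting keys: old entries are simply never read again and age out on their own, with no mass invalidation:

```python
SCHEMA_VERSION = 2  # bump whenever the cached shape changes

def cache_key(user_id: int, version: int = SCHEMA_VERSION) -> str:
    """Build a versioned cache key like user:123:v2."""
    return f"user:{user_id}:v{version}"

assert cache_key(123) == "user:123:v2"
# After a schema change, set SCHEMA_VERSION = 3: all readers and writers
# move to v3 keys atomically, and stale v2 entries expire untouched.
```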
Rule 3: Treat caches as hints, not truth
Never assume the cached data is correct.
Rule 4: Use compression for large data
Compress large values (e.g., with gzip or zstd) in the application before storing them in Redis; this cuts memory use and network transfer at the cost of some CPU.
Rule 5: Evict keys intentionally
Use:
- LRU (least recently used)
- LFU (least frequently used)
- FIFO
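LRU is the most common of the three and is easy to sketch on top of Python's `OrderedDict`, which keeps insertion order and lets us move a key to the "recently used" end in O(1):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction: when full, drop the coldest key."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key, default=None):
        if key not in self._store:
            return default
        self._store.move_to_end(key)       # mark as most recently used
        return self._store[key]

    def set(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now the most recently used
cache.set("c", 3)     # over capacity: "b" (the coldest key) is evicted
assert cache.get("b") is None
```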
SECTION 4 — CDN DESIGN
CDNs are basically super-fast global caches.
They:
- reduce latency
- reduce server load
- absorb DDoS traffic
- serve assets close to the user
CDNs cache:
- static files
- prerendered HTML
- API GET responses
- image variants
Providers like Cloudflare and Fastly work across layers: L3/L4 for routing and DDoS mitigation, L7 for HTTP caching; Vercel Edge operates at L7.
SECTION 5 — QUEUES (THE HEART OF SCALABILITY)
Queues solve:
- write spikes
- load leveling
- retries
- async processing
- decoupling
Queues are essential for:
- notifications
- email sending
- video processing
- heavy compute
- billing workflows
Queue Technologies
- SQS
- RabbitMQ
- Kafka
- Redis Streams
SECTION 6 — WHEN TO USE A QUEUE
Use queues when:
- work doesn’t need to happen immediately
- work is expensive
- workloads spike unpredictably
- you want reliability
- you want retries
Examples:
- SMS sending
- sending invoices
- processing uploads
- cache refresh jobs
SECTION 7 — QUEUE DESIGN PATTERNS
Pattern 1: Worker Pool
Workers pull messages from queue → process → ack.
Used in:
- background jobs
- pipelines
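The pull → process → ack loop can be sketched with the standard library's thread-safe `queue.Queue`; `task_done()` plays the role of the ack, and a `None` "poison pill" per worker is a common shutdown convention (the doubling "work" is a stand-in):

```python
import queue
import threading

jobs = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    """Pull a message, process it, then ack (task_done)."""
    while True:
        msg = jobs.get()
        if msg is None:          # poison pill: shut this worker down
            jobs.task_done()
            return
        with lock:
            results.append(msg * 2)   # stand-in for real work
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for i in range(10):
    jobs.put(i)                  # producer enqueues work
for _ in threads:
    jobs.put(None)               # one shutdown signal per worker
jobs.join()                      # blocks until every message is acked
for t in threads:
    t.join()
```

Scaling this pattern is just raising the worker count; the queue absorbs the spike either way.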
Pattern 2: Fan-Out (Pub/Sub)
Publish event → multiple subscribers handle it.
Used for:
- notifications
- analytics
- syncing services
Pattern 3: Delayed Jobs
Execute a task later (like reminders).
Pattern 4: DLQ (Dead Letter Queue)
Failed messages go here for manual remediation.
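The retry-then-park flow behind a DLQ can be sketched with plain lists (the handler, message shape, and `MAX_ATTEMPTS` value are illustrative): a message retries up to a limit, then moves to the dead-letter queue instead of poisoning the main queue forever:

```python
main_queue = [{"id": 1, "attempts": 0}, {"id": 2, "attempts": 0}]
dead_letters = []
MAX_ATTEMPTS = 3

def handle(msg):
    """Stand-in handler: message 2 always fails."""
    if msg["id"] == 2:
        raise RuntimeError("permanent failure")

def process(q):
    while q:
        msg = q.pop(0)
        try:
            handle(msg)
        except RuntimeError:
            msg["attempts"] += 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(msg)   # park for manual remediation
            else:
                q.append(msg)              # re-enqueue for a later retry

process(main_queue)
assert [m["id"] for m in dead_letters] == [2]
```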
SECTION 8 — STREAMS (THE REAL-TIME DATA PIPELINE)
Streams are different from queues.
Queues:
- point-to-point
- each message consumed once
Streams:
- append-only logs
- multiple independent consumers
- replayable
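The three stream properties fall out of one design decision: the broker keeps an append-only log, and each consumer tracks its own offset into it. A toy sketch (the `Stream` class and event shapes are illustrative, not any real Kafka API):

```python
class Stream:
    """Append-only log; consumers own their offsets, so the same events
    can be read independently by many consumers and replayed at will."""
    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def read(self, offset: int):
        """Return (events from offset onward, new offset)."""
        return self._log[offset:], len(self._log)

stream = Stream()
stream.append({"type": "click", "page": "/home"})
stream.append({"type": "click", "page": "/pricing"})

# two independent consumers, each with its own offset
analytics_events, analytics_offset = stream.read(0)
fraud_events, fraud_offset = stream.read(0)
assert len(analytics_events) == 2 and len(fraud_events) == 2

# replay: just read from offset 0 again; nothing was "consumed away"
replayed, _ = stream.read(0)
```

Contrast with a queue: popping a message removes it, so a second consumer never sees it.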
Examples:
- Kafka
- Kinesis
- Pulsar
Used for:
- analytics
- real-time dashboards
- fraud detection
- audit logging
- chat
- click-stream data
SECTION 9 — RATE LIMITING
Rate limiting protects:
- APIs
- databases
- internal services
- external dependencies
Without it → a single user can take down the system.
Rate Limiting Algorithms
1. Fixed Window
Simple but bursty.
2. Sliding Window
Better distribution.
3. Token Bucket
Most widely used.
Allows bursts → refills at fixed rate.
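A token bucket in code: the bucket refills continuously at `rate` tokens per second up to `capacity`, and each request spends one token. The injectable `clock` is an assumption added to keep the example deterministic:

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`; refills at `rate` tokens per second."""
    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity     # start full, so bursts are allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1       # spend a token for this request
            return True
        return False               # throttled

# deterministic fake clock for the example
now = [0.0]
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(4)]
assert burst == [True, True, True, False]   # burst of 3, then throttled
now[0] = 2.0                                # 2 seconds later: 2 tokens back
assert bucket.allow() is True
```

In a distributed setup the token count typically lives in Redis so all API servers share one bucket per client.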
4. Leaky Bucket
Smooth, uniform output rate.
SECTION 10 — BACKPRESSURE (THE MOST UNDERRATED CONCEPT)
Backpressure happens when producers generate work faster than consumers can process it.
This leads to:
- queue growth
- memory exhaustion
- cascading failures
High-quality systems:
- slow down producers
- drop messages
- load-shed
- auto-scale workers
- pause ingestion
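The simplest backpressure mechanism is a bounded buffer: instead of letting the queue grow without limit, the producer is forced to block, or to shed load, when it is full. A sketch using the standard library's `queue.Queue` with a `maxsize` (the sizes are illustrative):

```python
import queue

buffer = queue.Queue(maxsize=5)   # the bound is the backpressure
dropped = 0

# a producer running faster than the (absent) consumer
for i in range(8):
    try:
        buffer.put_nowait(i)      # fails fast instead of growing forever
    except queue.Full:
        dropped += 1              # load-shedding: drop rather than OOM

assert buffer.qsize() == 5 and dropped == 3
```

Swapping `put_nowait` for a blocking `put` turns load-shedding into producer slow-down; both are deliberate choices, unlike unbounded queue growth.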
This is how Stripe, Uber, and Netflix prevent outages.
SECTION 11 — HIGH THROUGHPUT SYSTEM DESIGN
To design a high-throughput system:
Rule 1 — Make everything asynchronous
HTTP requests → queue → workers → DB → events.
Rule 2 — Scale horizontally
Add:
- workers
- partitions
- shards
Scale out by adding machines, not by buying bigger CPUs.
Rule 3 — Use caches everywhere
Reduce load on:
- the DB
- services
- external APIs
Rule 4 — Keep hot data in memory
An in-memory read from Redis/Memcached is typically orders of magnitude faster than a disk-backed DB query.
Rule 5 — Avoid joins in hot paths
Pre-compute → denormalize.
Rule 6 — Break workflows into stages
Pipelines = scalable.
Rule 7 — Design for failure
- Retries
- Backoff
- DLQ
- Circuit breakers