SECTION 1 — CACHING (THE REAL ART OF PERFORMANCE ENGINEERING)

Caching is the #1 tool used in industry to achieve massive performance gains.

It is also the #1 source of complexity and bugs if poorly designed.

We will make it easy.


Why Cache?

Because almost every system shares one fundamental pattern:

Reads happen WAY more often than writes.

Caching turns:

  • slow work → fast work

  • expensive work → free work

  • repeated work → zero work

Examples:

  • database queries

  • computed results

  • HTML pages

  • API responses

  • configuration data


TYPES OF CACHES


1. CDN Cache (Edge Cache)

Used for:

  • static assets

  • images

  • scripts

  • CSS

  • HTML

  • videos

Benefits:

  • Removes load from servers

  • Reduces latency (close to user)

CDNs: Cloudflare, Akamai, Fastly, Vercel Edge.


2. Application-Level Cache

Examples:

  • in-memory cache

  • memoization

  • React Query cache

Used for:

  • computed values

  • short-lived results
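The simplest application-level cache is memoization. A minimal sketch using Python's built-in `functools.lru_cache` (the slow DB call is faked here — any pure, repeatable function works):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_lookup(user_id: int) -> str:
    # Imagine a slow DB query here; the decorator caches the
    # result in process memory, keyed by the arguments.
    return f"profile-{user_id}"

expensive_lookup(123)   # computed once
expensive_lookup(123)   # served from the in-memory cache
```

This only helps within one process — for shared state across servers you need a distributed cache (next).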


3. Distributed Cache (Redis / Memcached)

This is where real system design happens.

Redis is used for:

  • database query caching

  • session caching

  • rate limiting

  • leaderboards

  • ephemeral state

  • pub/sub

Memcached:

  • pure key-value lookup

  • extremely fast

Redis:

  • richer data types

  • transactions

  • persistence options


4. Database Cache

Examples:

  • MySQL query cache (removed in MySQL 8.0)

  • Postgres buffer cache

These are internal DB optimizations.


SECTION 2 — CACHE INVALIDATION (THE HARD PART)

There are only two hard problems in computer science:

  1. Naming things

  2. Cache invalidation

  3. Off-by-one errors

😄

Cache invalidation decides how the cache updates when data changes.

There are 4 major strategies:


Strategy 1 — TTL (Time-Based Expiry)

Simplest.

Example:

  • cache for 60 seconds

Pros:

  • easy

  • predictable

  • safe

Cons:

  • stale data appears occasionally

Used for:

  • dashboards

  • read-heavy workloads
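TTL is simple enough to sketch in a few lines. A tiny in-memory version (a plain dict stands in for Redis; real Redis gives you this for free via `SET key value EX 60`):

```python
import time

class TTLCache:
    """Tiny in-memory cache where every entry expires after ttl seconds."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}          # key -> (value, expires_at)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazy eviction on read
            return None
        return value

cache = TTLCache(ttl=60)
cache.set("dashboard:stats", {"users": 42})
```

Staleness is bounded by the TTL: at worst, readers see data 60 seconds old.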


Strategy 2 — Write-Through Cache

When DB updates → cache updates immediately.

Pros:

  • reliability

  • consistency

Cons:

  • slower writes

  • more system complexity
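The write path can be sketched in a few lines. Here plain dicts stand in for the real database and cache — the point is only the ordering (durable write first, cache second, in the same call):

```python
class WriteThroughStore:
    """Sketch of write-through: every write hits the DB and the
    cache together, so reads can trust the cache."""
    def __init__(self):
        self.db = {}             # stand-in for the real database
        self.cache = {}          # stand-in for Redis/Memcached

    def write(self, key, value):
        self.db[key] = value     # slower, durable write first
        self.cache[key] = value  # then update the cache immediately

    def read(self, key):
        return self.cache.get(key, self.db.get(key))
```

The cost is visible in `write()`: every write pays for both stores, which is why writes get slower.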


Strategy 3 — Write-Behind Cache

Write to cache → asynchronously write to DB later.

Pros:

  • extremely fast writes

Cons:

  • risky

  • possible data loss

  • requires careful design

Used for:

  • analytics

  • counters

  • logs
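A minimal write-behind sketch for counters (a dict stands in for the DB; real systems flush on a timer or background worker, not only on a threshold):

```python
class WriteBehindCounter:
    """Sketch of write-behind: increments land in memory instantly
    and are flushed to the 'DB' in batches. A crash before flush()
    loses the buffered counts -- the data-loss risk noted above."""
    def __init__(self, flush_every: int = 100):
        self.buffer = {}         # in-memory deltas (the "cache")
        self.db = {}             # stand-in for the real database
        self.flush_every = flush_every
        self.pending = 0

    def incr(self, key):
        self.buffer[key] = self.buffer.get(key, 0) + 1
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        for key, delta in self.buffer.items():
            self.db[key] = self.db.get(key, 0) + delta
        self.buffer.clear()
        self.pending = 0
```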


Strategy 4 — Cache-Aside (Lazy Loading)

Application logic:

  • first checks cache

  • if missing, fetch from DB

  • populate cache

Used by:

  • Netflix

  • Uber

  • every major distributed system

This is the safest, simplest, and most scalable strategy for most workloads.
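The whole cache-aside loop fits in one function. A sketch where `cache` is any dict-like store and `load_from_db` stands in for the real query:

```python
def get_user(user_id, cache, load_from_db):
    """Cache-aside: check the cache first, fall back to the DB on a
    miss, then populate the cache for the next reader."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:              # cache miss
        value = load_from_db(user_id)
        cache[key] = value         # populate for future reads
    return value
```

Note the cache is only ever written on a miss — the DB stays the source of truth, which is why this pattern is so hard to get wrong.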


SECTION 3 — DESIGNING CACHES THAT NEVER BREAK

Rule 1: Cache only stable, rarely-changing data

Do NOT cache:

  • volatile counters

  • frequently changing entities

Cache:

  • user profiles

  • configuration

  • product catalogs


Rule 2: Use versioned cache keys

Example:

user:123:v2

When schema changes → bump version.
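Versioning is usually just a constant baked into the key builder. A sketch matching the `user:123:v2` example above:

```python
USER_SCHEMA_VERSION = 2   # bump this when the cached shape changes

def user_key(user_id: int) -> str:
    # Old v1 entries simply stop being read and age out via TTL/eviction.
    return f"user:{user_id}:v{USER_SCHEMA_VERSION}"
```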


Rule 3: Treat caches as hints, not truth

Never assume the cached data is correct.


Rule 4: Use compression for large data

Redis does not compress values itself — compress large values client-side (gzip / zstd) before storing them.


Rule 5: Evict keys intentionally

Use:

  • LRU

  • LFU

  • FIFO


SECTION 4 — CDN DESIGN

CDNs are basically super-fast global caches.

They:

  • reduce latency

  • reduce server load

  • handle DDoS

  • serve assets close to user

CDNs cache:

  • static files

  • prerendered HTML

  • API GET responses

  • image variants

CDNs like Cloudflare, Fastly, and Vercel Edge primarily cache at L7 (HTTP); some, like Cloudflare, also absorb attacks at L3/L4.


SECTION 5 — QUEUES (THE HEART OF SCALABILITY)

Queues solve:

  • write spikes

  • load leveling

  • retries

  • async processing

  • decoupling

Queues are essential for:

  • notifications

  • email sending

  • video processing

  • heavy compute

  • billing workflows


Queue Technologies

  • SQS

  • RabbitMQ

  • Kafka

  • Redis Streams


SECTION 6 — WHEN TO USE A QUEUE

Use queues when:

  • work doesn’t need to happen immediately

  • work is expensive

  • workloads spike unpredictably

  • you want reliability

  • you want retries

Examples:

  • SMS sending

  • sending invoices

  • processing uploads

  • caching refresh jobs


SECTION 7 — QUEUE DESIGN PATTERNS

Pattern 1: Worker Pool

Workers pull messages from queue → process → ack.

Used in:

  • background jobs

  • pipelines
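The pull → process → ack loop can be sketched with the stdlib. Here `queue.Queue` stands in for SQS/RabbitMQ and `task_done()` plays the role of the ack:

```python
import queue
import threading

def run_worker_pool(jobs, handler, workers: int = 4):
    """Sketch of a worker pool: each thread pulls a message off the
    shared queue, processes it, and acks it."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return             # no work left, worker exits
            handler(job)
            q.task_done()          # the "ack"

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    q.join()                       # block until every job is acked
    for t in threads:
        t.join()
```

Scaling is just the `workers` parameter — the queue itself needs no changes.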


Pattern 2: Fan-Out (Pub/Sub)

Publish event → multiple subscribers handle it.

Used for:

  • notifications

  • analytics

  • syncing services


Pattern 3: Delayed Jobs

Execute a task later (like reminders).


Pattern 4: DLQ (Dead Letter Queue)

Failed messages go here for manual remediation.
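A sketch of the retry-then-park logic (real brokers like SQS do the redelivery counting for you; here it's an explicit loop):

```python
def process_with_dlq(messages, handler, max_attempts: int = 3):
    """Sketch: retry each message up to max_attempts; anything that
    still fails is parked in the dead-letter list for a human."""
    dead_letter = []
    for msg in messages:
        for attempt in range(max_attempts):
            try:
                handler(msg)
                break                      # success, move on
            except Exception:
                if attempt == max_attempts - 1:
                    dead_letter.append(msg)  # exhausted retries
    return dead_letter
```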


SECTION 8 — STREAMS (THE REAL-TIME DATA PIPELINE)

Streams are different from queues.

Queues:

  • point-to-point

  • message consumed once

Streams:

  • logs

  • append-only

  • multiple consumers

  • replayable

Examples:

  • Kafka

  • Kinesis

  • Pulsar

Used for:

  • analytics

  • real-time dashboards

  • fraud detection

  • audit logging

  • chat

  • click-stream data


SECTION 9 — RATE LIMITING

Rate limiting protects:

  • APIs

  • databases

  • internal services

  • external dependencies

Without it → a single user can take down the system.


Rate Limiting Algorithms

1. Fixed Window

Simple but bursty.

2. Sliding Window

Better distribution.

3. Token Bucket

Most widely used.

Allows bursts → refills at fixed rate.
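The token bucket is small enough to write out in full. A single-process sketch (production versions typically keep the bucket state in Redis so all servers share one limit):

```python
import time

class TokenBucket:
    """Sketch: the bucket holds up to 'capacity' tokens and refills at
    'rate' tokens/second. Each request spends one token; an empty
    bucket means the request is rate-limited."""
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)   # start full -> bursts allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

`capacity` controls burst size; `rate` controls the sustained limit — tuning them separately is why this algorithm is so popular.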

4. Leaky Bucket

Smooth, uniform output rate.


SECTION 10 — BACKPRESSURE (THE MOST UNDERRATED CONCEPT)

Backpressure happens when:

  • producers → generate faster than

  • consumers → can process

This leads to:

  • queue growth

  • memory exhaustion

  • cascading failures

High-quality systems:

  • slow down producers

  • drop messages

  • load-shed

  • auto-scale workers

  • pause ingestion

This is how Stripe, Uber, and Netflix prevent outages.
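The "slow down producers" option is the easiest to see in code: a bounded queue. A sketch using the stdlib — `q.put()` blocks when the queue is full, which throttles the producer to the consumer's pace instead of letting memory grow:

```python
import queue
import threading

def produce_with_backpressure(items, q: queue.Queue):
    """Producer side: put() blocks while the queue is at maxsize,
    which is backpressure in its simplest form."""
    for item in items:
        q.put(item)

def consume(q: queue.Queue, out: list, n: int):
    """Consumer side: drains n items at its own pace."""
    for _ in range(n):
        out.append(q.get())
        q.task_done()
```

Usage: create the queue with `queue.Queue(maxsize=2)` and run the consumer in its own thread — the producer simply cannot outrun it by more than two items.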


SECTION 11 — HIGH THROUGHPUT SYSTEM DESIGN

To design a high-throughput system:


Rule 1 — Make everything asynchronous

HTTP requests → queue → workers → DB → events.


Rule 2 — Scale horizontally

Add:

  • workers

  • partitions

  • shards

Not vertically (bigger machines).


Rule 3 — Use caches everywhere

Reduce load on:

  • DB

  • services

  • external APIs


Rule 4 — Keep hot data in memory

Redis/Memcached → often orders of magnitude faster than a DB query.


Rule 5 — Avoid joins in hot paths

Pre-compute → denormalize.


Rule 6 — Break workflows into stages

Pipelines = scalable.


Rule 7 — Design for failure

Retries

Backoff

DLQ

Circuit breakers