SECTION 1 — THE PRIME DIRECTIVE OF DATA DESIGN
“Data outlives code.”
This is one of the first rules you learn at companies like Google and Stripe.
-
Code can be rewritten.
-
Services can be replaced.
-
Frameworks go out of style.
-
APIs change.
But data structures are permanent.
A bad schema haunts a company for years.
This is why Staff Engineers focus on data modeling FIRST, not API design or microservices.
SECTION 2 — SQL vs NoSQL: THINK IN TERMS OF DATA SHAPE
Most engineers choose databases based on trends or "what the company uses."
Top engineers choose based on data shape.
Use SQL when data is:
-
relational
-
structured
-
transactional
-
highly consistent
-
joins are natural
-
strong constraints exist (foreign keys, unique keys, checks)
Examples:
-
fintech
-
orders/payments
-
healthcare
-
authentication
-
booking flows
Use NoSQL when data is:
-
semi-structured
-
nested
-
document-oriented
-
variable in shape
-
primarily key-value access
-
high-volume reads
-
low-latency
Examples:
-
session storage
-
activity logs
-
user preferences
-
chat messages
-
product catalog
-
caching layer
RULE OF STAFF ENGINEERS:
If relationships drive your logic → SQL is almost always better.
If documents drive your logic → NoSQL is often better.
SECTION 3 — DATA MODELING MENTAL MODELS
Here are the models used by architects at Stripe, Shopify, Airbnb.
Model 1 — Entities & Relationships
Every system has:
-
Entities (User, Order, Message, Pet, CallSession)
-
Relationships (1–1, 1–many, many–many)
Understanding these defines your tables/collections.
Model 2 — Access Patterns First
This is critical:
Design for how the data will be READ and WRITTEN.
Not how you think the data “should look.”
Examples:
-
If you often need
user + profile + settings→ denormalize into one table or add a view. -
If you frequently need
order + items→ model it together and index foreign keys.
Access patterns determine:
-
indexing
-
denormalization
-
table layout
-
caching strategy
Model 3 — State Machines → Data Models
Modeling flows as state machines gives you:
state: ENUM
updated_at
transition_reason
previous_state
This makes debugging and analytics easy.
Examples:
-
Payment: pending → authorized → captured → refunded
-
Enrollment: draft → submitted → review → approved → completed
-
Video Call: idle → ringing → connected → ended
State-first modeling removes ambiguity.
Model 4 — Invariants
The most important part of data modeling.
Examples:
-
A user cannot have two active carts.
-
A payment cannot be modified after capture.
-
A pet must belong to exactly one owner.
-
A call session must always have a unique channel ID.
Invariants → constraints → schema design.
Model 5 — Data Ownership
Identify which service owns which data.
Example:
-
Auth owns User Credentials
-
Profile service owns display info
-
Billing owns balance + payments
This prevents:
-
duplication
-
inconsistency
-
cross-service entanglement
SECTION 4 — INDEXING MASTERY
This is where Senior → Staff transitions happen.
Indexing can turn a 5s query → 5ms.
Understanding indexing is Staff Engineer territory.
Index Type 1 — Single Column Index
Example:
INDEX (email)
Use for:
-
unique lookup
-
high cardinality fields
-
login queries
Index Type 2 — Composite Index
Order matters.
INDEX (user_id, created_at)
Query must match prefix ordering.
Perfect for:
-
filtering + sorting
-
pagination
-
dashboard queries
Index Type 3 — Covering Index
Index contains all needed fields → query never touches the table.
Huge performance boost.
Index Type 4 — Partial Index
Example:
WHERE status = 'active'
Used for:
-
hot-path queries
-
sparse data
-
filtering heavy workloads
Index Type 5 — Unique Index
Used to enforce invariants.
Example:
UNIQUE (user_id, date)
Meaning:
- a user can have only one record per date.
Schema enforces business logic.
This is Staff-level thinking.
SECTION 5 — SHARDING (THE REAL VERSION)
Most people misunderstand sharding.
Sharding solves write scaling, NOT read scaling.
Reads scale via:
-
caching
-
replicas
-
materialized views
Sharding is for:
-
massive write throughput
-
extremely large datasets
-
distributing ownership
-
reducing hot rows
Shard Key Rules
A good shard key:
-
evenly distributes load
-
minimizes cross-shard queries
-
never requires rebalancing
Examples:
-
user_id(great for social apps) -
order_id(perfect for e-commerce) -
hashed values (even distribution)
-
geo-based keys (location apps)
A BAD shard key:
-
timestamp
-
random UUID
-
boolean flags
Staff Engineers ALWAYS choose shard keys based on access patterns.
SECTION 6 — CONSISTENCY MODELS (PRACTICAL VERSION)
Forget abstract theory — here’s how consistency works in real systems.
1. Strong Consistency
Guarantees:
-
reads reflect latest writes
-
no stale data
Used for:
-
payments
-
authentication
-
state transitions
Examples:
-
PostgreSQL
-
MySQL
-
Spanner
2. Eventual Consistency
Guarantees:
-
system converges to correctness
-
temporary stale reads allowed
Used for:
-
feeds
-
analytics
-
notifications
-
search indexing
Examples:
-
DynamoDB
-
Cassandra
-
ElasticSearch
3. Causal Consistency
Reads follow the order of writes as seen by the client.
Used for:
-
chat apps
-
collaborative tools
-
messaging
SECTION 7 — DENORMALIZATION (THE STAFF ENGINEER APPROACH)
Average engineers say:
“Normalize everything.”
Staff engineers say:
“Normalize for correctness, denormalize for performance.”
Rules of safe denormalization:
-
Only denormalize immutable facts
-
Store computed fields (e.g., total_amount)
-
Use triggers/jobs to keep data consistent
-
Denormalize when read patterns demand it
Examples:
-
storing user_name inside order table
-
storing monthly summary inside analytics tables
-
storing last_message inside conversation table
SECTION 8 — EVENT SOURCING (DEEP EXPLANATION)
Event sourcing stores changes, not current state.
Example event stream:
Created
ItemAdded
ItemAdded
ItemRemoved
CheckedOut
Paid
Replaying events → reconstruct current state.
Benefits:
-
perfect audit log
-
time-travel debugging
-
strong consistency
-
ideal for financial systems
Costs:
-
complexity
-
replay overhead
-
strong schema discipline required
Not every system needs event sourcing — but knowing it is Staff-level knowledge.
SECTION 9 — CAP THEOREM (REAL WORLD VERSION)
CAP says you must choose 2 out of 3 during a network partition:
-
Consistency
-
Availability
-
Partition Tolerance
BUT in the real world:
-
Distributed systems must always tolerate partitions
-
So the real choice is:
Consistency vs Availability
CP systems:
-
payments
-
banking
-
auth
-
state machines
AP systems:
-
messaging
-
notifications
-
search
-
feeds
Knowing the correct category is a Staff-level skill.