THE PRIME DIRECTIVE OF DATA DESIGN

“Data outlives code.”

This is one of the first rules you learn at companies like Google and Stripe.

Code can be rewritten.
Services can be replaced.
Frameworks go out of style.
APIs change.

But data structures are permanent.

A bad schema haunts a company for years.

This is why Staff Engineers focus on data modeling FIRST, not API design or microservices.

SQL vs NoSQL: THINK IN TERMS OF DATA SHAPE

Most engineers choose databases based on trends or "what the company uses."

Top engineers choose based on data shape.

Use SQL when data is:

relational
structured
transactional
highly consistent
joins are natural
strong constraints exist (foreign keys, unique keys, checks)

Examples:

fintech
orders/payments
healthcare
authentication
booking flows

Use NoSQL when data is:

semi-structured
nested
document-oriented
variable in shape
primarily key-value access
high-volume reads
low-latency

Examples:

session storage
activity logs
user preferences
chat messages
product catalog
caching layer

RULE OF STAFF ENGINEERS:

If relationships drive your logic → SQL is almost always better.
If documents drive your logic → NoSQL is often better.

DATA MODELING MENTAL MODELS

Here are the models used by architects at Stripe, Shopify, Airbnb.

Model 1 — Entities & Relationships

Every system has:

Entities (User, Order, Message, Pet, CallSession)
Relationships (1–1, 1–many, many–many)

Understanding these defines your tables/collections.

Model 2 — Access Patterns First

This is critical:

Design for how the data will be READ and WRITTEN.
Not how you think the data “should look.”

Examples:

If you often need user + profile + settings → denormalize into one table or add a view.
If you frequently need order + items → model it together and index foreign keys.

Access patterns determine:

indexing
denormalization
table layout
caching strategy

Model 3 — State Machines → Data Models

Modeling flows as state machines gives you:

state: ENUM
updated_at
transition_reason
previous_state

This makes debugging and analytics easy.

Examples:

Payment: pending → authorized → captured → refunded
Enrollment: draft → submitted → review → approved → completed
Video Call: idle → ringing → connected → ended

State-first modeling removes ambiguity.

Model 4 — Invariants

The most important part of data modeling.

Examples:

A user cannot have two active carts.
A payment cannot be modified after capture.
A pet must belong to exactly one owner.
A call session must always have a unique channel ID.

Invariants → constraints → schema design.

Model 5 — Data Ownership

Identify which service owns which data.

Example:

Auth owns User Credentials
Profile service owns display info
Billing owns balance + payments

This prevents:

duplication
inconsistency
cross-service entanglement

INDEXING MASTERY

This is where Senior → Staff transitions happen.

Indexing can turn a 5s query → 5ms.

Understanding indexing is Staff Engineer territory.

Index Type 1 — Single Column Index

Example:

INDEX (email)

Use for:

unique lookup
high cardinality fields
login queries

Index Type 2 — Composite Index

Order matters.

INDEX (user_id, created_at)

Query must match prefix ordering.

Perfect for:

filtering + sorting
pagination
dashboard queries

Index Type 3 — Covering Index

Index contains all needed fields → query never touches the table.

Huge performance boost.

Index Type 4 — Partial Index

Example:

WHERE status = 'active'

Used for:

hot-path queries
sparse data
filtering heavy workloads

Index Type 5 — Unique Index

Used to enforce invariants.

Example:

UNIQUE (user_id, date)

Meaning:

a user can have only one record per date.

Schema enforces business logic.

This is Staff-level thinking.

SHARDING (THE REAL VERSION)

Most people misunderstand sharding.

Sharding solves write scaling, NOT read scaling.

Reads scale via:

caching
replicas
materialized views

Sharding is for:

massive write throughput
extremely large datasets
distributing ownership
reducing hot rows

Shard Key Rules

A good shard key:

evenly distributes load
minimizes cross-shard queries
never requires rebalancing

Examples:

user_id (great for social apps)
order_id (perfect for e-commerce)
hashed values (even distribution)
geo-based keys (location apps)

A BAD shard key:

timestamp
random UUID
boolean flags

Staff Engineers ALWAYS choose shard keys based on access patterns.

CONSISTENCY MODELS (PRACTICAL VERSION)

Forget abstract theory — here’s how consistency works in real systems.

1. Strong Consistency

Guarantees:

reads reflect latest writes
no stale data

Used for:

payments
authentication
state transitions

Examples:

PostgreSQL
MySQL
Spanner

2. Eventual Consistency

Guarantees:

system converges to correctness
temporary stale reads allowed

Used for:

feeds
analytics
notifications
search indexing

Examples:

DynamoDB
Cassandra
ElasticSearch

3. Causal Consistency

Reads follow the order of writes as seen by the client.

Used for:

chat apps
collaborative tools
messaging

DENORMALIZATION (THE STAFF ENGINEER APPROACH)

Average engineers say:

“Normalize everything.”

Staff engineers say:

“Normalize for correctness, denormalize for performance.”

Rules of safe denormalization:

Only denormalize immutable facts
Store computed fields (e.g., total_amount)
Use triggers/jobs to keep data consistent
Denormalize when read patterns demand it

Examples:

storing user_name inside order table
storing monthly summary inside analytics tables
storing last_message inside conversation table

EVENT SOURCING (DEEP EXPLANATION)

Event sourcing stores changes, not current state.

Example event stream:

Created
ItemAdded
ItemAdded
ItemRemoved
CheckedOut
Paid

Replaying events → reconstruct current state.

Benefits:

perfect audit log
time-travel debugging
strong consistency
ideal for financial systems

Costs:

complexity
replay overhead
strong schema discipline required

Not every system needs event sourcing — but knowing it is Staff-level knowledge.

CAP THEOREM (REAL WORLD VERSION)

CAP says you must choose 2 out of 3 during a network partition:

Consistency
Availability
Partition Tolerance

BUT in the real world:

Distributed systems must always tolerate partitions
So the real choice is:

Consistency vs Availability

CP systems:

payments
banking
auth
state machines

AP systems:

messaging
notifications
search
feeds

Knowing the correct category is a Staff-level skill.

SQL vs NoSQL: THINK IN TERMS OF DATA SHAPE

Use SQL when data is:​

Use NoSQL when data is:​

RULE OF STAFF ENGINEERS:​

DATA MODELING MENTAL MODELS

Model 1 — Entities & Relationships​

Model 2 — Access Patterns First​

Model 3 — State Machines → Data Models​

Model 4 — Invariants​

Model 5 — Data Ownership​

INDEXING MASTERY

Index Type 1 — Single Column Index​

Index Type 2 — Composite Index​

Index Type 3 — Covering Index​

Index Type 4 — Partial Index​

Index Type 5 — Unique Index​

SHARDING (THE REAL VERSION)

Shard Key Rules​

CONSISTENCY MODELS (PRACTICAL VERSION)

1. Strong Consistency​

2. Eventual Consistency​

3. Causal Consistency​

DENORMALIZATION (THE STAFF ENGINEER APPROACH)

EVENT SOURCING (DEEP EXPLANATION)

CAP THEOREM (REAL WORLD VERSION)

Use SQL when data is:

Use NoSQL when data is:

RULE OF STAFF ENGINEERS:

Model 1 — Entities & Relationships

Model 2 — Access Patterns First

Model 3 — State Machines → Data Models

Model 4 — Invariants

Model 5 — Data Ownership

Index Type 1 — Single Column Index

Index Type 2 — Composite Index

Index Type 3 — Covering Index

Index Type 4 — Partial Index

Index Type 5 — Unique Index

Shard Key Rules

1. Strong Consistency

2. Eventual Consistency

3. Causal Consistency