BUILDING TOOLS FOR ENGINEERS
The best engineers don't just write features.
They build platforms that make entire teams faster.
Platform Engineering is the art of:
- Creating internal tools
- Reducing cognitive load
- Eliminating repetitive work
- Building "golden paths"
- Scaling engineering effectiveness
This is how you become a force multiplier and reach Staff+ levels.
SECTION 1 — WHAT IS DEVELOPER EXPERIENCE?
Developer Experience (DX) Definition
DX = How easy and pleasant it is for developers to:
- Onboard to codebase
- Build features
- Deploy code
- Debug issues
- Collaborate with team
Great DX = 10x productivity
The DX Stack
┌──────────────────────────────────┐
│ Layer 5: Documentation │
├──────────────────────────────────┤
│ Layer 4: Tools & CLI │
├──────────────────────────────────┤
│ Layer 3: CI/CD Pipeline │
├──────────────────────────────────┤
│ Layer 2: Local Dev Environment │
├──────────────────────────────────┤
│ Layer 1: Code Architecture │
└──────────────────────────────────┘
Each layer impacts developer velocity.
SECTION 2 — THE GOLDEN PATH PRINCIPLE
What is a Golden Path?
A Golden Path is the easiest, most supported way to accomplish common tasks.
Example: Deploying a New Service
Without Golden Path:
# Engineer must figure out:
1. How to create service repo?
2. What dependencies to use?
3. How to set up CI/CD?
4. How to configure monitoring?
5. How to deploy to production?
6. How to handle secrets?
7. How to set up alerting?
Time to first deploy: 2-4 weeks
With Golden Path:
# One command:
$ platform service create --name=my-api --template=nodejs-api
✓ Created repo from template
✓ Set up CI/CD pipeline
✓ Configured monitoring
✓ Added to service mesh
✓ Generated API documentation
✓ Ready to deploy
Time to first deploy: 1 hour
Result: first deploy in about an hour instead of weeks
Golden Paths for Common Tasks
1. Creating a New Service
# CLI tool
$ platform new service
? Service name: user-api
? Language: TypeScript
? Database: PostgreSQL
? Queue: Yes (RabbitMQ)
Creating service...
✓ Generated from template
✓ Dependencies installed
✓ Database migrations set up
✓ Tests scaffolded
✓ CI/CD configured
✓ Monitoring added
✓ Documentation created
Next steps:
cd user-api
npm run dev
npm test
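The checklist above can be thought of as a plan derived from the prompt answers. The following is a minimal sketch of that idea, not the actual tool: `ServiceAnswers` and `planScaffold` are hypothetical names, and the step wording is assumed for illustration.

```typescript
// Hypothetical answer shape mirroring the prompts above.
interface ServiceAnswers {
  serviceName: string;
  language: "TypeScript" | "Python";
  database?: "PostgreSQL";
  queue?: "RabbitMQ";
}

// Turn answers into an ordered list of scaffolding steps.
// Optional steps only appear when the matching answer is present.
function planScaffold(answers: ServiceAnswers): string[] {
  const steps = [
    `Generate ${answers.language} service "${answers.serviceName}" from template`,
    "Install dependencies",
  ];
  if (answers.database) {
    steps.push(`Set up ${answers.database} migrations`);
  }
  if (answers.queue) {
    steps.push(`Wire up ${answers.queue} connection`);
  }
  steps.push("Scaffold tests", "Configure CI/CD", "Add monitoring", "Create documentation");
  return steps;
}

const plan = planScaffold({
  serviceName: "user-api",
  language: "TypeScript",
  database: "PostgreSQL",
  queue: "RabbitMQ",
});
plan.forEach((step) => console.log(`✓ ${step}`));
```

Deriving the plan from the answers keeps optional pieces (database, queue) out of services that do not need them.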
2. Deploying to Production
# Single command deployment
$ platform deploy
? Environment: production
? Confirm deployment to prod? Yes
Running pre-deployment checks...
✓ Tests passing
✓ Linting passed
✓ Security scan clean
✓ Database migrations ready
Deploying...
✓ Building Docker image
✓ Running canary deployment (5%)
✓ Health checks passed
✓ Scaling to 50%
✓ Scaling to 100%
✓ Deployment complete
Deployed to: https://user-api.prod.company.com
Monitoring: https://grafana.company.com/d/user-api
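The canary stages in the output above (5%, then 50%, then 100%) follow a simple rule: only widen traffic after a health check passes, and stop otherwise. A minimal sketch of that gating logic, assuming a caller-supplied health check; `progressiveRollout` and `HealthCheck` are hypothetical names rather than part of any real deploy tool.

```typescript
type HealthCheck = (trafficPercent: number) => Promise<boolean>;

interface RolloutResult {
  status: "complete" | "rolled_back";
  reachedPercent: number; // last traffic share that passed its health check
}

// Shift traffic through each stage; stop and report a rollback on the first failed check.
async function progressiveRollout(
  stages: number[],
  isHealthy: HealthCheck,
): Promise<RolloutResult> {
  let reachedPercent = 0;
  for (const percent of stages) {
    if (!(await isHealthy(percent))) {
      return { status: "rolled_back", reachedPercent };
    }
    reachedPercent = percent;
  }
  return { status: "complete", reachedPercent };
}

// Usage with a stubbed health check that always passes.
progressiveRollout([5, 50, 100], async () => true).then((result) => {
  console.log(result);
});
```

A real implementation would also revert traffic when it reports `rolled_back`; the sketch only shows the decision logic.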
3. Creating a New Database
$ platform db create
? Database name: analytics_db
? Type: PostgreSQL
? Size: Small (10GB)
? Backups: Daily
Creating database...
✓ Provisioned in us-east-1
✓ Configured replication
✓ Set up automated backups
✓ Added monitoring
✓ Generated connection strings
Connection string (saved to secrets):
postgresql://analytics_db:***@db.internal:5432/analytics
Add to your .env:
DATABASE_URL=secrets://analytics_db
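The `secrets://` reference keeps the real connection string out of the `.env` file. How the platform resolves it is not specified here; the sketch below shows one plausible approach, with an in-memory `Map` standing in for a real secrets manager. `resolveSecrets` and `SecretStore` are hypothetical names.

```typescript
// Hypothetical in-memory secret store standing in for a real secrets manager.
type SecretStore = Map<string, string>;

const SECRET_PREFIX = "secrets://";

// Replace every value of the form secrets://<key> with the stored secret.
// Plain values pass through unchanged; unknown keys fail loudly.
function resolveSecrets(
  env: Record<string, string>,
  store: SecretStore,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [variable, value] of Object.entries(env)) {
    if (!value.startsWith(SECRET_PREFIX)) {
      resolved[variable] = value;
      continue;
    }
    const key = value.slice(SECRET_PREFIX.length);
    const secret = store.get(key);
    if (secret === undefined) {
      throw new Error(`No secret found for ${variable} (${value})`);
    }
    resolved[variable] = secret;
  }
  return resolved;
}

// Example values only; the password is a placeholder.
const store: SecretStore = new Map([
  ["analytics_db", "postgresql://analytics_db:example-password@db.internal:5432/analytics"],
]);
const env = resolveSecrets(
  { DATABASE_URL: "secrets://analytics_db", LOG_LEVEL: "info" },
  store,
);
console.log(env.LOG_LEVEL); // prints "info"
```

Failing on an unknown key at startup is a deliberate choice: a missing secret is easier to debug there than as a connection error later.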
SECTION 3 — BUILDING INTERNAL TOOLS
The Internal Tool Hierarchy
┌────────────────────────────────┐
│ Level 4: Platform CLI │ → Orchestrates everything
├────────────────────────────────┤
│ Level 3: Developer Portal │ → Self-service UI
├────────────────────────────────┤
│ Level 2: Automation Scripts │ → Reduce repetition
├────────────────────────────────┤
│ Level 1: Documentation │ → Prevent questions
└────────────────────────────────┘
Level 1: Living Documentation
Problem: Docs are always outdated
Solution: Documentation as Code
# service-template/README.md
# {{SERVICE_NAME}}
## Quick Start
```bash
npm install
npm run dev
```
## Environment Variables
{{#each env_vars}}
- {{name}}: {{description}} {{#if required}}(required){{/if}}
{{/each}}
## API Endpoints
{{#each endpoints}}
### {{method}} {{path}}
{{description}}
Request:
{{request_example}}
Response:
{{response_example}}
{{/each}}
## Deployment
platform deploy --env={{environment}}
Auto-generated from code. Last updated: {{timestamp}}
Generated automatically from:
- Code annotations
- API schemas
- Config files
Never out of date.
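To make "documentation as code" concrete, here is a minimal sketch that renders the environment-variable section of a README from structured metadata. The `EnvVarDoc` shape and `renderEnvVars` function are assumptions for illustration; a real setup might extract this metadata from config schemas or code annotations.

```typescript
interface EnvVarDoc {
  name: string;
  description: string;
  required: boolean;
}

// Render the "Environment Variables" section from structured metadata,
// so the README is regenerated instead of hand-edited.
function renderEnvVars(vars: EnvVarDoc[]): string {
  const lines = vars.map(
    (v) => `- ${v.name}: ${v.description}${v.required ? " (required)" : ""}`,
  );
  return ["## Environment Variables", ...lines].join("\n");
}

console.log(
  renderEnvVars([
    { name: "DATABASE_URL", description: "PostgreSQL connection string", required: true },
    { name: "LOG_LEVEL", description: "Logging verbosity", required: false },
  ]),
);
```

Running a generator like this in CI is what keeps the rendered docs in step with the code.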
Documentation Portal
Internal docs site:
├── Getting Started
│ ├── New Engineer Onboarding
│ ├── Setting Up Local Environment
│ └── Deploying Your First Service
│
├── Architecture
│ ├── System Overview (auto-generated diagram)
│ ├── Service Catalog (auto-discovered)
│ └── Data Flow (live)
│
├── How-To Guides
│ ├── Add a New Microservice
│ ├── Set Up Database
│ ├── Configure Monitoring
│ └── Debug Production Issues
│
└── Reference
├── API Documentation (auto-generated)
├── CLI Commands
└── Best Practices
Level 2: Automation Scripts
Common Engineering Tasks → Scripts
Script 1: Environment Setup
```bash
#!/bin/bash
# setup-dev.sh
set -euo pipefail
echo "🚀 Setting up development environment..."
# Install Docker if it is missing
if ! command -v docker &> /dev/null; then
  echo "Installing Docker..."
  curl -fsSL https://get.docker.com | sh
fi
# Clone all services (skip any that are already cloned)
echo "Cloning repositories..."
for service in api frontend worker; do
  if [ ! -d "$service" ]; then
    git clone "git@github.com:company/$service.git"
  fi
done
# Set up databases
echo "Starting databases..."
docker-compose up -d postgres redis
# Install dependencies
echo "Installing dependencies..."
for service in */; do
  if [ -f "$service/package.json" ]; then
    (cd "$service" && npm install)  # subshell keeps the working directory unchanged
  fi
done
# Create .env files (never overwrite an existing one)
echo "Creating .env files from templates..."
for service in */; do
  if [ -f "$service/.env.example" ] && [ ! -f "$service/.env" ]; then
    cp "$service/.env.example" "$service/.env"
  fi
done
echo "✅ Development environment ready!"
echo "Run 'npm run dev' in any service to start."
```
Time saved: 2 hours → 5 minutes
Script 2: Database Migration Helper
#!/bin/bash
# db-migrate.sh
set -euo pipefail
SERVICE=${1:-}
DIRECTION=${2:-up}
if [ -z "$SERVICE" ] || { [ "$DIRECTION" != "up" ] && [ "$DIRECTION" != "down" ]; }; then
  echo "Usage: db-migrate.sh <service> [up|down]"
  exit 1
fi
echo "Running migrations for $SERVICE ($DIRECTION)..."
# Back up the database first (DATABASE_URL must be set)
echo "Creating backup..."
pg_dump "${DATABASE_URL:?DATABASE_URL is not set}" > "backup-$(date +%Y%m%d-%H%M%S).sql"
# Run migrations
cd "services/$SERVICE"
npm run "db:migrate:$DIRECTION"
# Verify
echo "Verifying schema..."
npm run db:validate
echo "✅ Migration complete"
Script 3: Test Data Generator
#!/bin/bash
# seed-test-data.sh
echo "Generating test data..."
# Create test users
curl -X POST localhost:3000/api/users \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@example.com",
    "name": "Test User"
  }'
# Create sample products
for i in {1..100}; do
  curl -X POST localhost:3000/api/products \
    -H "Content-Type: application/json" \
    -d "{
      \"name\": \"Product $i\",
      \"price\": $(( RANDOM % 100 + 1 ))
    }"
done
echo "✅ Test data created"
Level 3: Developer Portal (Self-Service)
What is a Developer Portal?
A web UI where engineers can:
- Provision resources
- Deploy services
- View system status
- Access documentation
- Manage secrets
- View logs & metrics
Example: Backstage (Spotify)
Home Dashboard:
├── My Services
│ ├── user-api (healthy)
│ ├── payment-service (deploying...)
│ └── worker-queue (degraded)
│
├── Quick Actions
│ ├── [Create New Service]
│ ├── [Deploy to Production]
│ └── [View Recent Incidents]
│
├── System Health
│ └── 99.8% uptime (last 7 days)
│
└── Recent Deployments
├── user-api v1.2.3 (2h ago) ✓
└── frontend v2.0.1 (5h ago) ✓
Service Creation Flow (Self-Service)
Developer Portal UI:
┌────────────────────────────────────┐
│ Create New Service │
├────────────────────────────────────┤
│ Service Name: [payment-api] │
│ Team: [Platform] │
│ Language: ○ Node.js ● Python │
│ Database: ☑ PostgreSQL ☐ MongoDB │
│ Cache: ☑ Redis ☐ Memcached │
│ Queue: ☑ RabbitMQ ☐ None │
│ │
│ [Cancel] [Create Service] │
└────────────────────────────────────┘
(Clicks Create)
Progress:
✓ Creating repository
✓ Generating code from template
✓ Setting up CI/CD
✓ Provisioning database
✓ Configuring monitoring
✓ Adding to service catalog
✓ Notifying team
Done! View your service: [payment-api dashboard]
Level 4: Platform CLI
The Ultimate Developer Tool
# platform CLI - unified interface for everything
$ platform help
Platform CLI v2.0
Usage: platform <command> [options]
Commands:
service Manage services
deploy Deploy applications
db Database operations
logs View logs
secrets Manage secrets
config Configuration management
status Check system status
Run 'platform <command> --help' for more info
CLI Command Examples
Service Management
# Create service
$ platform service create --name=api --template=nodejs-api
# List services
$ platform service list
# View service details
$ platform service info api
# Delete service
$ platform service delete api
Deployment
# Deploy to staging
$ platform deploy --env=staging
# Deploy with canary
$ platform deploy --env=prod --canary=10
# Rollback
$ platform deploy rollback --env=prod
# View deployment history
$ platform deploy history
Database
# Create database
$ platform db create --name=analytics --type=postgres
# Run migration
$ platform db migrate --service=api
# Create backup
$ platform db backup --name=analytics
# Restore backup
$ platform db restore --backup=analytics-20240115
Logs
# Tail logs
$ platform logs --service=api --follow
# Search logs
$ platform logs --service=api --search="error" --since=1h
# Download logs
$ platform logs --service=api --since=1d > logs.txt
Secrets
# Set secret
$ platform secrets set API_KEY=abc123 --service=api
# List secrets
$ platform secrets list --service=api
# Rotate secret
$ platform secrets rotate DATABASE_PASSWORD --service=api
SECTION 4 — REDUCING COGNITIVE LOAD
The Cognitive Load Formula
Cognitive Load = (Decisions × Complexity) ÷ Automation
Goal: Minimize decisions, simplify complexity, maximize automation
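The formula is a heuristic rather than a measured quantity, but plugging in numbers shows how the three levers interact. A small worked sketch, with unitless scores chosen purely for illustration:

```typescript
// The heuristic from the text; inputs are unitless, illustrative scores.
function cognitiveLoad(decisions: number, complexity: number, automation: number): number {
  if (automation <= 0) {
    throw new Error("automation must be positive");
  }
  return (decisions * complexity) / automation;
}

// Illustrative numbers only: 10 decisions at complexity 3, little automation.
const before = cognitiveLoad(10, 3, 1); // 30
// Defaults cut decisions to 1 and automation is 5x higher.
const after = cognitiveLoad(1, 3, 5); // 0.6
console.log({ before, after });
```

Because decisions sit in the numerator and automation in the denominator, removing decisions and adding automation compound rather than add.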
Strategy 1: Sensible Defaults
Bad: Too many decisions
# service-config.yml
database:
  host: ?
  port: ?
  pool_size: ?
  timeout: ?
  retry_attempts: ?
  ssl: ?
  logging: ?
cache:
  host: ?
  port: ?
  ttl: ?
The engineer must figure out 10 config values.
Good: Smart defaults
# service-config.yml (minimal)
database:
  name: user_db # That's it!
# Platform provides defaults:
# - host: Discovered from service mesh
# - port: 5432 (standard)
# - pool_size: 10 (sensible default)
# - timeout: 30s (battle-tested)
# - retry_attempts: 3
# - ssl: true (always)
# - logging: error (in prod)
# Override only if needed:
database:
  name: user_db
  pool_size: 20 # Only override if you need to
Decisions reduced from 10 to 1.
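On the platform side, sensible defaults usually come down to merging a small service config over a platform-owned default set. A minimal sketch using the default values listed above; `DatabaseConfig`, `DATABASE_DEFAULTS`, and `withDefaults` are hypothetical names, and host discovery is left out.

```typescript
interface DatabaseConfig {
  name: string;
  port: number;
  pool_size: number;
  timeout_seconds: number;
  retry_attempts: number;
  ssl: boolean;
}

// Defaults taken from the comments above; a real platform would own this list.
const DATABASE_DEFAULTS: Omit<DatabaseConfig, "name"> = {
  port: 5432,
  pool_size: 10,
  timeout_seconds: 30,
  retry_attempts: 3,
  ssl: true,
};

// Only `name` is mandatory; any other field the service sets overrides its default.
function withDefaults(
  service: Pick<DatabaseConfig, "name"> & Partial<DatabaseConfig>,
): DatabaseConfig {
  return { ...DATABASE_DEFAULTS, ...service };
}

const minimal = withDefaults({ name: "user_db" });
const tuned = withDefaults({ name: "user_db", pool_size: 20 });
console.log(minimal.pool_size, tuned.pool_size); // prints "10 20"
```

Keeping the defaults in one place means the platform team can tune them for every service at once.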
Strategy 2: Convention Over Configuration
Example: Project Structure
# Enforced project structure
service/
├── src/
│ ├── api/ # API endpoints (auto-discovered)
│ ├── models/ # Data models (auto-migrated)
│ ├── services/ # Business logic
│ └── utils/ # Utilities
├── tests/ # Tests (auto-run in CI)
├── migrations/ # DB migrations (auto-applied)
└── config/
├── dev.yml # Dev config
└── prod.yml # Prod config
# Engineers don't decide structure
# They follow convention
# Tools work automatically
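Auto-discovery works because tooling can infer intent from file location alone. The sketch below assumes one possible convention, where each `.ts` file under `src/api/` maps to a route and `index.ts` maps to its directory; `routeForFile` is a hypothetical helper, not part of any framework.

```typescript
// Map a file path under src/api/ to the route it serves.
// Assumed convention: src/api/users/index.ts -> /users, src/api/health.ts -> /health
function routeForFile(filePath: string): string | null {
  const match = /^src\/api\/(.+)\.ts$/.exec(filePath);
  if (!match) {
    return null; // not an API file, so nothing to register
  }
  const route = match[1].replace(/(^|\/)index$/, "");
  return `/${route}`;
}

const files = [
  "src/api/health.ts",
  "src/api/users/index.ts",
  "src/api/users/profile.ts",
  "src/services/billing.ts",
];
const routes = files
  .map(routeForFile)
  .filter((route): route is string => route !== null);
console.log(routes); // routes: /health, /users, /users/profile
```

With a rule like this, adding an endpoint means adding a file; no routing table needs editing.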
Strategy 3: Hide Complexity
Example: Deployment Complexity
Behind the scenes:
1. Build Docker image
2. Push to registry
3. Update Kubernetes manifests
4. Apply rolling update
5. Run health checks
6. Monitor rollout
7. Alert on errors
8. Rollback if needed
Engineer sees:
$ platform deploy
✓ Deployed successfully
All complexity hidden.
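Hiding complexity means the tool owns the step ordering and the failure handling, and the engineer sees a single result line. A minimal sketch of such a facade with stubbed steps; `DeployStep` and `deploy` are hypothetical names, and real steps would call Docker, a registry, and Kubernetes.

```typescript
interface DeployStep {
  name: string;
  run: () => Promise<void>;
  undo?: () => Promise<void>; // optional compensation if a later step fails
}

// Run steps in order. If one throws, undo the completed steps in reverse
// order and report a single failure line to the engineer.
async function deploy(steps: DeployStep[]): Promise<string> {
  const completed: DeployStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        await done.undo?.();
      }
      return `✗ Deployment failed at "${step.name}", rolled back`;
    }
  }
  return "✓ Deployed successfully";
}

// Usage with stubbed steps that all succeed.
const noop = async () => {};
deploy([
  { name: "Build Docker image", run: noop },
  { name: "Push to registry", run: noop },
  { name: "Apply rolling update", run: noop, undo: noop },
  { name: "Run health checks", run: noop },
]).then(console.log);
```

Undoing in reverse order matters because later steps usually depend on earlier ones.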
SECTION 5 — BUILDING A PLATFORM CLI
Architecture
CLI (Commander.js)
↓
API Client (axios)
↓
Platform API (internal)
↓
Orchestration Layer
↓
├── Kubernetes API
├── Database Provisioning
├── Secret Management
├── Monitoring Setup
└── CI/CD Integration
Implementation Example
1. CLI Entry Point
#!/usr/bin/env node
// src/cli.ts
import { Command } from 'commander';
import { serviceCommands } from './commands/service';
import { deployCommands } from './commands/deploy';
import { dbCommands } from './commands/db';
const program = new Command();
program
  .name('platform')
  .description('Platform Engineering CLI')
  .version('2.0.0');
// Register command groups
serviceCommands(program);
deployCommands(program);
dbCommands(program);
// Actions are async, so use parseAsync() and surface any rejection
program.parseAsync(process.argv).catch((error) => {
  console.error(error);
  process.exit(1);
});
2. Service Commands
// src/commands/service.ts
import { Command } from 'commander';
import { PlatformClient } from '../client';
export function serviceCommands(program: Command) {
  const service = program.command('service');
  service
    .command('create')
    .description('Create a new service')
    .requiredOption('--name <name>', 'Service name')
    .option('--template <template>', 'Template to use', 'nodejs-api')
    .action(async (options) => {
      console.log('🚀 Creating service...');
      const client = new PlatformClient();
      const result = await client.createService({
        name: options.name,
        template: options.template
      });
      console.log('✓ Service created');
      console.log(`Repository: ${result.repoUrl}`);
      console.log(`Dashboard: ${result.dashboardUrl}`);
    });
  service
    .command('list')
    .description('List all services')
    .action(async () => {
      const client = new PlatformClient();
      const services = await client.listServices();
      console.table(services.map(s => ({
        Name: s.name,
        Status: s.status,
        Team: s.team,
        'Last Deploy': s.lastDeploy
      })));
    });
}
3. Platform Client
// src/client.ts
import axios from 'axios';
export class PlatformClient {
  private api = axios.create({
    baseURL: process.env.PLATFORM_API || 'https://platform.internal',
    headers: {
      'Authorization': `Bearer ${this.getToken()}`
    }
  });
  private getToken(): string {
    // Reads the token from the environment; loading it from
    // ~/.platform/credentials is not implemented in this example
    return process.env.PLATFORM_TOKEN || '';
  }
  async createService(options: {
    name: string;
    template: string;
  }) {
    const { data } = await this.api.post('/services', options);
    return data;
  }
  async listServices() {
    const { data } = await this.api.get('/services');
    return data;
  }
  async deployService(name: string, env: string) {
    const { data } = await this.api.post(`/services/${name}/deploy`, {
      environment: env
    });
    return data;
  }
}
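The client's token lookup mentions `~/.platform/credentials` but only reads `PLATFORM_TOKEN`. One way to support both is sketched below. The file format is an assumption (JSON with a `token` field), since the text names only the path, and `loadToken` is a hypothetical helper.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed file format: JSON such as {"token": "abc"}. Only the path comes from the text.
const DEFAULT_CREDENTIALS = path.join(os.homedir(), ".platform", "credentials");

// Prefer the environment variable, then fall back to the credentials file.
function loadToken(
  credentialsPath: string = DEFAULT_CREDENTIALS,
  env: Record<string, string | undefined> = process.env,
): string {
  if (env.PLATFORM_TOKEN) {
    return env.PLATFORM_TOKEN;
  }
  try {
    const parsed = JSON.parse(fs.readFileSync(credentialsPath, "utf8"));
    return typeof parsed.token === "string" ? parsed.token : "";
  } catch {
    return ""; // missing or unreadable file: the caller should prompt for login
  }
}

console.log(loadToken() === "" ? "Run: platform login" : "Token found");
```

Checking the environment first lets CI jobs inject a token without writing a file to disk.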
4. Interactive Prompts
// src/commands/deploy.ts
import { Command } from 'commander';
import { select, confirm } from '@inquirer/prompts';
import { PlatformClient } from '../client';
export function deployCommands(program: Command) {
  program
    .command('deploy')
    .description('Deploy service')
    .action(async () => {
      // Interactive environment selection
      const env = await select({
        message: 'Select environment:',
        choices: [
          { name: 'Development', value: 'dev' },
          { name: 'Staging', value: 'staging' },
          { name: 'Production', value: 'prod' }
        ]
      });
      // Confirmation for production
      if (env === 'prod') {
        const confirmed = await confirm({
          message: 'Deploy to PRODUCTION?',
          default: false
        });
        if (!confirmed) {
          console.log('Deployment cancelled');
          return;
        }
      }
      // Deploy ('current-service' is a placeholder; a real CLI would detect the service name)
      console.log('🚀 Deploying...');
      const client = new PlatformClient();
      await client.deployService('current-service', env);
      console.log('✓ Deployed successfully');
    });
}
Advanced Features
1. Progress Spinners
import ora from 'ora';
// buildImage, pushImage, and deployToK8s are placeholders for your own build and deploy steps
async function deployService() {
  const spinner = ora('Building Docker image...').start();
  await buildImage();
  spinner.succeed('Docker image built');
  spinner.start('Pushing to registry...');
  await pushImage();
  spinner.succeed('Pushed to registry');
  spinner.start('Deploying to Kubernetes...');
  await deployToK8s();
  spinner.succeed('Deployed successfully');
}
2. Rich Output
import chalk from 'chalk';
import Table from 'cli-table3';
// Shape of a service record (assumed for this example)
interface Service {
  name: string;
  status: string;
  team: string;
  lastDeploy: string;
}
function displayServices(services: Service[]) {
  const table = new Table({
    head: ['Name', 'Status', 'Team', 'Last Deploy'],
    style: { head: ['cyan'] }
  });
  services.forEach(s => {
    const status = s.status === 'healthy'
      ? chalk.green('● Healthy')
      : chalk.red('● Degraded');
    table.push([
      chalk.bold(s.name),
      status,
      s.team,
      s.lastDeploy
    ]);
  });
  console.log(table.toString());
}
3. Error Handling
import axios from 'axios';
import chalk from 'chalk';
async function handleCommand<T>(
  fn: () => Promise<T>
): Promise<T | void> {
  try {
    return await fn();
  } catch (error) {
    // Only Axios errors carry an HTTP response
    const status = axios.isAxiosError(error) ? error.response?.status : undefined;
    if (status === 401) {
      console.error(chalk.red('Authentication failed'));
      console.log('Run: platform login');
    } else if (status === 403) {
      console.error(chalk.red('Permission denied'));
    } else {
      const message = error instanceof Error ? error.message : String(error);
      console.error(chalk.red('Error:'), message);
      console.log('Run with --debug for more details');
    }
    process.exit(1);
  }
}
SECTION 6 — METRICS & MEASURING DX
DX Metrics That Matter
1. Time to First Deploy
Measure: Time from "git clone" to "deployed to prod"
Bad: 2-4 weeks
Good: 1-2 days
Great: < 4 hours
Track: For every new engineer
2. Build Time
Measure: Time from "git push" to "deployed"
Bad: 30-60 minutes
Good: 10-15 minutes
Great: < 5 minutes
Optimize:
- Caching
- Parallel builds
- Incremental builds
3. Deploy Frequency
Measure: Deployments per day
Bad: Weekly
Good: Daily
Great: 10+ per day
Enable:
- Automated testing
- Fast CI/CD
- Confidence in rollbacks
4. Mean Time to Recovery (MTTR)
Measure: Time from "incident detected" to "resolved"
Bad: Hours
Good: < 1 hour
Great: < 15 minutes
Improve:
- Fast rollbacks
- Good monitoring
- Clear runbooks
5. Developer Satisfaction
Measure: Quarterly survey
Questions:
1. How easy is it to build features? (1-10)
2. How confident are you in deploys? (1-10)
3. How easy is debugging? (1-10)
4. How good are internal tools? (1-10)
Target: 8+ average
Tracking Dashboard
Platform Health Dashboard:
┌─────────────────────────────────────┐
│ Developer Experience Metrics │
├─────────────────────────────────────┤
│ Time to First Deploy: 3.2 hours │
│ Build Time (p50): 8 minutes │
│ Deploy Frequency: 15/day │
│ MTTR: 12 minutes │
│ Developer Satisfaction: 8.4/10 │
├─────────────────────────────────────┤
│ Recent Improvements: │
│ ✓ Reduced build time by 40% │
│ ✓ Increased deploy frequency 2x │
│ ✓ New CLI tool adoption: 85% │
└─────────────────────────────────────┘
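The dashboard numbers can be derived from raw build, deploy, and incident records. A minimal sketch of three of those calculations follows; the record shapes and function names are assumptions for illustration, and the sample data is made up.

```typescript
// Median (p50) of a list of durations in minutes.
function p50(values: number[]): number {
  if (values.length === 0) {
    throw new Error("p50 needs at least one value");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface Incident {
  detectedAt: Date;
  resolvedAt: Date;
}

// MTTR in minutes: mean of (resolved - detected) across incidents.
function mttrMinutes(incidents: Incident[]): number {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (i.resolvedAt.getTime() - i.detectedAt.getTime()),
    0,
  );
  return totalMs / incidents.length / 60000;
}

// Deploys per day over an observation window.
function deployFrequency(deployCount: number, windowDays: number): number {
  return deployCount / windowDays;
}

const buildMinutes = [6, 8, 7, 12, 9];
console.log(p50(buildMinutes)); // 8
console.log(deployFrequency(105, 7)); // 15
console.log(
  mttrMinutes([
    { detectedAt: new Date("2024-01-15T10:00:00Z"), resolvedAt: new Date("2024-01-15T10:10:00Z") },
    { detectedAt: new Date("2024-01-16T09:00:00Z"), resolvedAt: new Date("2024-01-16T09:14:00Z") },
  ]),
); // 12
```

Reporting the median for build time keeps one slow outlier build from distorting the picture.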
SECTION 7 — PLATFORM ENGINEERING CAREER PATH
The Platform Engineer Role
Platform Engineers build tools for other engineers.
Responsibilities:
- Build internal platforms
- Improve developer experience
- Create self-service tools
- Reduce cognitive load
- Scale engineering org
Impact:
- 100 engineers × 10% faster = 10 engineers' worth of value
- 1000 engineers × 5% faster = 50 engineers' worth of value
This is why Platform Engineers are highly paid.
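The leverage arithmetic above is a first-order estimate: team size times the fractional speedup. As a small sketch, with `engineerEquivalents` as a hypothetical helper name:

```typescript
// Engineer-equivalents gained when `teamSize` engineers each get
// `speedupPercent` percent faster. A first-order estimate only.
function engineerEquivalents(teamSize: number, speedupPercent: number): number {
  return (teamSize * speedupPercent) / 100;
}

console.log(engineerEquivalents(100, 10)); // 10
console.log(engineerEquivalents(1000, 5)); // 50
```

The estimate ignores the cost of the platform team itself, so the net gain is this figure minus the engineers who build and run the tooling.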
Career Progression
Junior → Mid → Senior → Staff Platform Engineer
Junior:
- Fix bugs in internal tools
- Write scripts
- Improve documentation
Mid:
- Build new internal tools
- Own small platforms
- Gather requirements from engineers
Senior:
- Design platform architecture
- Lead platform initiatives
- Set platform strategy
- Influence engineering culture
Staff:
- Company-wide platform vision
- Multi-year roadmap
- Cross-org impact
- Engineering effectiveness strategy
Skills to Develop
Technical:
- Backend engineering (APIs)
- Infrastructure (Docker, K8s)
- CI/CD systems
- Databases
- Monitoring & observability
- CLI development
- Web development (portals)
Soft Skills:
- Empathy for developers
- Product thinking
- Communication
- Stakeholder management
- Teaching & documentation
Conclusion
Platform Engineering is about leverage.
One great internal tool can:
- Save thousands of engineering hours
- Accelerate the entire organization
- Reduce cognitive load
- Improve developer happiness
Top 1% engineers understand:
- Build tools, not just features
- Automate repetition
- Create golden paths
- Reduce decisions
- Measure developer experience
This is how you become indispensable.
This completes PART XI (b) — Developer Experience & Platform Engineering.
Build the tools that make engineers 10x faster.