System Design Interview: Complete Preparation Guide for Software Engineers
Master system design interviews with this comprehensive guide. Learn scalability patterns, load balancing, caching, database sharding, microservices, and how to approach any system design problem methodically.
Introduction
System design interviews test your ability to design large-scale distributed systems. Having sat on both sides of dozens of these interviews, I've compiled this guide covering everything you need to know.
The Framework: How to Approach Any Problem
Step 1: Clarify Requirements (5 minutes)
Questions to Ask:
├── Functional Requirements
│ ├── What are the core features?
│ ├── Who are the users?
│ └── What are the use cases?
├── Non-Functional Requirements
│ ├── Scale: How many users? How much data?
│ ├── Performance: Latency requirements?
│ ├── Availability: 99.9%? 99.99%?
│ └── Consistency: Strong or eventual?
└── Constraints
├── Budget/cost considerations
├── Timeline constraints
└── Existing infrastructure
Step 2: Estimate Scale (5 minutes)
Back-of-envelope calculation, using a Twitter-like service as the example:

# Users
total_users = 500_000_000         # 500M users
daily_active_users = 200_000_000  # 200M DAU

# Tweets per day
tweets_per_user_per_day = 2
total_tweets_per_day = daily_active_users * tweets_per_user_per_day
# = 400M tweets/day

# Read:write ratio
reads_per_user_per_day = 100  # timeline views
total_reads_per_day = daily_active_users * reads_per_user_per_day
# = 20B reads/day

# QPS (queries per second)
read_qps = 20_000_000_000 / 86_400  # ≈ 230K QPS
write_qps = 400_000_000 / 86_400    # ≈ 4.6K QPS

# Storage
avg_tweet_size = 500  # bytes (text + metadata)
daily_storage = total_tweets_per_day * avg_tweet_size
# = 200GB/day, ~73TB/year

# Bandwidth
incoming = write_qps * avg_tweet_size  # ≈ 2.3 MB/s
outgoing = read_qps * avg_tweet_size   # ≈ 115 MB/s
Step 3: High-Level Design (10 minutes)
Start with a simple design, then evolve it:

[Users] → [Load Balancer] → [API Servers] → [Database]
Then add components as needed:
- Caching layer
- Message queues
- Search service
- CDN
- etc.
Step 4: Deep Dive (15-20 minutes)
Focus on 2-3 components in detail based on interviewer interest.
Step 5: Address Bottlenecks (5 minutes)
Identify single points of failure and scalability limits.
Core Concepts
Scalability Patterns
Vertical Scaling (Scale Up)
├── Add more CPU, RAM, storage
├── Simpler but has limits
└── Single point of failure

Horizontal Scaling (Scale Out)
├── Add more machines
├── Requires distributed architecture
└── Better for large scale
Load Balancing
┌─────────────┐
│ Client │
└──────┬──────┘
│
┌──────▼──────┐
│Load Balancer│
└──────┬──────┘
┌───────────────┼───────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Server 1 │ │ Server 2 │ │ Server 3 │
└─────────────┘ └─────────────┘ └─────────────┘

Load Balancing Algorithms:
1. Round Robin - Simple rotation
2. Weighted Round Robin - Based on server capacity
3. Least Connections - Route to least busy server
4. IP Hash - Consistent routing for same client
5. Least Response Time - Route to fastest server
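To make one of these concrete, here is a minimal weighted round robin sketch (algorithm 2). The server names and weights are hypothetical; production balancers such as nginx use a smoother interleaving, but the proportions come out the same:

```python
import itertools

def weighted_round_robin(servers: dict[str, int]):
    """Yield servers in proportion to their weights (naive expansion)."""
    pool = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(pool)

# Hypothetical fleet: server1 has 3x the capacity of server2.
balancer = weighted_round_robin({"server1": 3, "server2": 1})
for _ in range(8):
    print(next(balancer))  # server1 is served 3x as often as server2
```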
Caching Strategies
Cache Locations:
┌─────────────────────────────────────────────────┐
│ CDN Cache │
│ (Static assets, edge) │
└─────────────────────┬───────────────────────────┘
│
┌─────────────────────▼───────────────────────────┐
│ Application Cache │
│ (Redis/Memcached cluster) │
└─────────────────────┬───────────────────────────┘
│
┌─────────────────────▼───────────────────────────┐
│ Database Cache │
│ (Query cache, buffer pool) │
└─────────────────────────────────────────────────┘

Caching Patterns:
1. Cache-Aside: App manages cache
2. Write-Through: Write to cache and DB
3. Write-Behind: Async DB writes
4. Read-Through: Cache fetches from DB
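A minimal sketch of pattern 1 (cache-aside), assuming a local Redis instance; the `fetch_user_from_db` helper is a hypothetical stand-in for a real query:

```python
import json
import redis

cache = redis.Redis()
CACHE_TTL = 300  # seconds; illustrative, tune per workload

def fetch_user_from_db(user_id: str) -> dict:
    return {"id": user_id, "name": "example"}  # stand-in for a real DB query

def get_user(user_id: str) -> dict:
    """Cache-aside: the app checks the cache, falls back to the DB, then populates."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    user = fetch_user_from_db(user_id)  # cache miss path
    cache.set(key, json.dumps(user), ex=CACHE_TTL)
    return user
```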
Database Scaling
Read Replicas:
┌──────────────┐
Writes │ Master │
──────────► │ Database │
└──────┬───────┘
│ Replication
┌────────────┼────────────┐
│ │ │
┌──────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ Replica 1 │ │ Replica 2│ │ Replica 3│
└──────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
Reads ◄─────────────┴────────────┘

Sharding (Horizontal Partitioning):
┌─────────────────────────────────────────────────┐
│ Shard Router │
└───────────────────────┬─────────────────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ Shard 0 │ │ Shard 1 │ │ Shard 2 │
│ A-H │ │ I-P │ │ Q-Z │
└─────────┘ └─────────┘ └─────────┘
Sharding Strategies:
1. Range-based: user_id 1-1M in shard 1
2. Hash-based: hash(user_id) % num_shards
3. Directory-based: Lookup table for shard mapping
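A sketch of strategy 2 (hash-based); the shard count is illustrative. Note that plain modulo re-maps most keys whenever the shard count changes, which is why production systems often layer consistent hashing on top:

```python
import hashlib

NUM_SHARDS = 3  # illustrative

def shard_for(user_id: str) -> int:
    """hash(user_id) % num_shards, using a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user_12345"))  # the same user always routes to the same shard
```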
Message Queues
Producer → [Queue] → Consumer

Benefits:
1. Decoupling: Services don't need to know each other
2. Async Processing: Handle spikes gracefully
3. Reliability: Messages persist until processed
4. Scalability: Add more consumers as needed
Use Cases:
- Email sending
- Image processing
- Analytics events
- Order processing
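A toy in-process version of this pipeline, with Python's `queue` module standing in for a real broker like Kafka or RabbitMQ:

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()

def producer() -> None:
    for order_id in range(5):
        tasks.put({"order_id": order_id})  # producer never talks to the consumer directly

def consumer() -> None:
    while True:
        order = tasks.get()
        print(f"processing order {order['order_id']}")  # e.g. send email, charge card
        tasks.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
tasks.join()  # returns once every queued message has been processed
```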
CAP Theorem
              Consistency
                  /\
                 /  \
                /    \
               / Pick \
              /  Two   \
             /──────────\
Availability ──── Partition Tolerance

In a distributed system, during a network partition you must choose:
- CP: Sacrifice availability (return error)
- AP: Sacrifice consistency (return stale data)
Real-world choices:
- Banking: CP (consistency critical)
- Social media: AP (availability > perfect consistency)
Common System Design Problems
1. URL Shortener (Easy)
Requirements:
- Shorten long URLs
- Redirect short URLs to original
- Analytics (optional)
Scale:
- 100M URLs created/month
- 10:1 read/write ratio
Design:
┌──────────┐ ┌─────────────┐ ┌──────────┐
│ Client │────►│ API │────►│ Cache │
└──────────┘ │ Server │ │ (Redis) │
└──────┬──────┘ └────┬─────┘
│ │
└────────┬────────┘
│
┌──────▼──────┐
│ Database │
│ (NoSQL) │
└─────────────┘

Key Decisions:
- Short URL generation: Base62 encoding or hash
- Storage: NoSQL for simplicity
- Caching: Heavy read traffic
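A minimal Base62 sketch for the short-code decision; the numeric ID source (e.g. an auto-increment counter) is an assumption:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Turn a numeric ID into a short code; 62^7 ≈ 3.5 trillion 7-char codes."""
    if n == 0:
        return ALPHABET[0]
    code = []
    while n > 0:
        n, rem = divmod(n, 62)
        code.append(ALPHABET[rem])
    return "".join(reversed(code))

print(encode_base62(125_000_000))  # "8sud2"
```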
2. Twitter/Social Feed (Medium)
Requirements:
- Post tweets
- Follow users
- View home timeline
- Search tweets
Scale:
- 500M users, 200M DAU
- 400M tweets/day
- Timeline must load < 200ms
Design:
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌────────────────┐ ┌─────────────────┐ │
│ │ Users │───►│ Load Balancer │───►│ API Gateway │ │
│ └─────────┘ └────────────────┘ └────────┬────────┘ │
│ │ │
│ ┌────────────────────────────────────────┼──────┐ │
│ │ │ │ │ │
│ ┌─────▼─────┐ ┌────────▼───────┐ ┌──────▼────┐ │ │
│ │ Tweet │ │ Timeline │ │ User │ │ │
│ │ Service │ │ Service │ │ Service │ │ │
│ └─────┬─────┘ └────────┬───────┘ └───────────┘ │ │
│ │ │ │ │
│ ┌─────▼─────┐ ┌────────▼───────┐ │ │
│ │ Tweet │ │ Timeline │ │ │
│ │ DB │ │ Cache (Fan- │ │ │
│ └───────────┘ │ out on write)│ │ │
│ └────────────────┘ │ │
└───────────────────────────────────────────────────────────┘

Fan-out Strategies:
- Fan-out on Write: Pre-compute timelines (for normal users)
- Fan-out on Read: Compute at read time (for celebrities)
- Hybrid: Combination based on follower count
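A sketch of the hybrid decision at post time; the threshold and the in-memory stores are hypothetical stand-ins for real services:

```python
CELEBRITY_THRESHOLD = 10_000  # illustrative cutoff

# Hypothetical stand-ins for the follower graph and timeline cache:
FOLLOWERS = {"alice": ["bob", "carol"]}
TIMELINE_CACHE: dict[str, list[str]] = {}

def publish_tweet(user_id: str, tweet_id: str) -> None:
    """Hybrid fan-out: push to follower timelines unless the author is a celebrity."""
    followers = FOLLOWERS.get(user_id, [])
    if len(followers) < CELEBRITY_THRESHOLD:
        # Fan-out on write: precompute each follower's cached timeline.
        for follower_id in followers:
            TIMELINE_CACHE.setdefault(follower_id, []).insert(0, tweet_id)
    # Otherwise skip the push: followers merge this author's tweets at read
    # time (fan-out on read), avoiding millions of cache writes per post.

publish_tweet("alice", "tweet_1")
print(TIMELINE_CACHE)  # {'bob': ['tweet_1'], 'carol': ['tweet_1']}
```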
3. Chat System (Medium)
Requirements:
- 1:1 messaging
- Group chats
- Online status
- Message history
Key Challenges:
- Real-time delivery
- Message ordering
- Offline message handling
Design:
┌──────────────────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌────────────────┐ ┌───────────────┐ │
│ │ Client │◄──►│ WebSocket │◄──►│ Connection │ │
│ │ App │ │ Gateway │ │ Manager │ │
│ └─────────┘ └───────┬────────┘ └───────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ │
│ │ Message │ │ Presence │ │ Group │ │
│ │ Service │ │ Service │ │ Service │ │
│ └──────┬──────┘ └──────┬──────┘ └─────────────┘ │
│ │ │ │
│ ┌──────▼──────┐ ┌──────▼──────┐ │
│ │ Message │ │ Redis │ │
│ │ Storage │ │(Online users│ │
│ │(Cassandra) │ │ status) │ │
│ └─────────────┘ └─────────────┘ │
└──────────────────────────────────────────────────────────┘

Message Flow:
1. Client connects via WebSocket
2. Message sent to Message Service
3. Service looks up recipient's connection
4. If online: Forward via WebSocket
5. If offline: Store for later delivery
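A sketch of steps 3–5; the connection registry and offline store are hypothetical in-memory stand-ins for the real Connection Manager and Cassandra layer:

```python
# Hypothetical stand-ins: online users mapped to sockets, plus an offline buffer.
connection_registry: dict[str, object] = {}
offline_store: dict[str, list[dict]] = {}

def deliver_message(recipient_id: str, message: dict) -> None:
    """Steps 3-5: look up the recipient's connection, then forward or store."""
    socket = connection_registry.get(recipient_id)
    if socket is not None:
        socket.send_json(message)  # online: push over the WebSocket
    else:
        offline_store.setdefault(recipient_id, []).append(message)  # deliver on next connect

deliver_message("bob", {"from": "alice", "text": "hi"})
print(offline_store)  # bob is offline, so the message is buffered
```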
4. Rate Limiter (Easy-Medium)
Algorithms:

1. Token Bucket:
- Tokens added at fixed rate
- Request consumes token
- No token = rejected
2. Sliding Window:
- Count requests in time window
- Window slides continuously
3. Fixed Window Counter:
- Simple counter per time window
- Can have edge case issues
Implementation (sliding window with Redis sorted sets):
import redis
import time

class RateLimiter:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def is_allowed(self, user_id: str, limit: int, window: int) -> bool:
        """Sliding window rate limiter using Redis sorted sets."""
        key = f"rate_limit:{user_id}"
        now = time.time()
        window_start = now - window

        # A pipeline batches the round trips; for strict atomicity under
        # concurrency, wrap the same commands in a Lua script instead.
        pipe = self.redis.pipeline()
        # Remove entries that have aged out of the window
        pipe.zremrangebyscore(key, 0, window_start)
        # Count requests still inside the window (before this one)
        pipe.zcard(key)
        # Record the current request, scored by timestamp
        pipe.zadd(key, {str(now): now})
        # Expire the key so idle users don't leak memory
        pipe.expire(key, window)
        results = pipe.execute()

        request_count = results[1]
        return request_count < limit
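For comparison, here is a minimal in-process token bucket (algorithm 1 above); the rate and capacity values are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # sustain 5 req/s, allow bursts of 10
print(bucket.allow())  # True until the bucket empties
```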
Key Technologies to Know
Databases
| Type        | Examples          | Best For                |
|-------------|-------------------|-------------------------|
| Relational  | PostgreSQL, MySQL | ACID, complex queries   |
| Document    | MongoDB, CouchDB  | Flexible schema, JSON   |
| Key-Value   | Redis, DynamoDB   | Caching, sessions       |
| Wide-Column | Cassandra, HBase  | Time-series, high write |
| Graph       | Neo4j, Neptune    | Relationships, social   |
Message Queues
| System        | Strengths                  | Use Cases             |
|---------------|----------------------------|-----------------------|
| Kafka         | High throughput, streaming | Event streaming, logs |
| RabbitMQ      | Flexible routing, AMQP     | Task queues           |
| SQS           | Managed, simple            | AWS workloads         |
| Redis Pub/Sub | Low latency                | Real-time features    |
Caching
| System    | Type      | Best For                     |
|-----------|-----------|------------------------------|
| Redis     | In-memory | Sessions, caching, queues    |
| Memcached | In-memory | Simple caching               |
| CDN       | Edge      | Static assets, API responses |
Interview Tips
Do's
1. Think out loud - Explain your reasoning
2. Start simple - Then add complexity
3. Ask questions - Clarify requirements
4. Draw diagrams - Visual communication
5. Consider trade-offs - No perfect solution
6. Discuss alternatives - Show breadth
Don'ts
1. Don't dive into details too fast
2. Don't ignore scale requirements
3. Don't forget about failure cases
4. Don't be silent - Keep communicating
5. Don't be defensive - Accept feedback
Practice Problems
Easy:
- URL Shortener
- Pastebin
- Rate Limiter
Medium:
- Twitter/Social Feed
- Web Crawler
- Notification System
Hard:
- Uber/Lyft
- YouTube
- Google Docs
- Distributed Cache
Key Takeaways
1. Follow a structured approach - Requirements → Scale → Design → Deep Dive
2. Know the building blocks - Load balancers, caches, queues, databases
3. Understand trade-offs - Every decision has pros and cons
4. Practice estimation - Back-of-envelope math is crucial
5. Think about failures - How does the system handle failures?
6. Consider operations - Monitoring, deployment, maintenance
Conclusion
System design interviews test your ability to think at scale and make sound architectural decisions. Practice with real-world problems, understand the core concepts, and always communicate your thought process. The goal isn't a perfect design—it's demonstrating systematic problem-solving.
---
Preparing for system design interviews? Connect on LinkedIn to discuss preparation strategies.
Related Articles
- Redis Caching Strategies Complete Guide - Implement caching in your systems
- PostgreSQL Performance Optimization - Database scaling techniques
- Docker & Kubernetes Deployment - Container orchestration
- Event-Driven Architecture with Kafka - Message queues and async processing
- AWS Services for Backend Developers - Cloud infrastructure essentials