Ch1 System Design Introduction: The Welcome Party - Your Journey into System Design Begins
Part 1: Welcome to the Architect's Mindset
1.1. Introduction: You're Invited to the Party!
Welcome to the world of system design. If you've spent your career focused on coding, you've been perfecting the art of cooking delicious, intricate dishes. You can take a recipe—a set of well-defined requirements—and produce a flawless result. System design, however, is a different beast altogether. It's not about cooking one dish; it's about planning the entire party.
Imagine you're tasked with organizing a massive, multi-day festival. You're not just thinking about the food. You're considering the venue's capacity, the flow of guests, the placement of stages, the reliability of the power grid, the security at the gates, and how to handle a sudden downpour or a surprise influx of thousands of extra attendees. That, in essence, is system design.
Formally, system design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy a given set of requirements. It is the blueprint that guides the construction of everything from a simple mobile app to a globe-spanning distributed network. It involves making high-level decisions that determine how the different parts of a system will interact to achieve the desired functionality, performance, and reliability.
Within this discipline, a crucial distinction exists between High-Level Design (HLD) and Low-Level Design (LLD).
High-Level Design (HLD) is the macro view, the 30,000-foot perspective of the system. It's the overall architectural plan, identifying the major components and how they fit together. In our festival analogy, the HLD is the map of the entire festival ground, showing the main stage, food court area, medical tents, and the pathways connecting them. It doesn't detail the wiring inside the speakers, but it shows where the sound systems are and how they connect to the central power source.
Low-Level Design (LLD) is the micro view, zooming in on the internal workings of each individual component. It involves detailing class diagrams, database schemas, and the specific logic of API endpoints. For our festival, the LLD would be the architectural blueprint for the main stage, the electrical wiring diagram for the food stalls, or the specific staff schedule for the security team.
This course will equip you to think like an architect—to move from building individual houses (coding) to planning entire, thriving cities (system design).
Day 1 vs. Year 1 Architecture
To illustrate this, let's visualize the difference between a "Day 1" proof-of-concept and a "Year 1" web-scale system.
Day 1: Proof of Concept
This is the simplest possible setup, often running on a single machine. It's great for validating an idea but cannot handle significant traffic.
Year 1: Web-Scale System
As the user base grows, the architecture must evolve to handle the load. This involves distributing components across multiple servers.
1.2. Why This Party Matters for Your Career
Mastering system design is more than just learning a new technical skill; it is the primary catalyst for advancing into senior engineering roles and beyond. Companies like Google, Meta, Amazon, and Netflix use system design interviews as the principal method for evaluating candidates for senior and staff-level positions. Why? Because these interviews assess a fundamentally different and more critical skill set than standard coding challenges.
A coding interview asks, "Can you solve this well-defined problem?" A system design interview asks, "Can you solve this undefined problem, and can you explain your thought process while doing so?" The questions are intentionally open-ended and ambiguous—"Design a service like Twitter," "Design a URL shortener"—because they are designed to simulate the real-world challenges that senior engineers face daily.
The interviewer is not looking for a single "correct" answer, because one rarely exists. Instead, they are evaluating a set of core competencies:
Systemic Thinking: Can you see the big picture and understand how different components interact?
Handling Ambiguity: Can you take a vague prompt, ask intelligent clarifying questions, and define a clear scope for the problem?
Trade-off Analysis: Do you understand that every design choice has pros and cons? Can you articulate why you chose one database over another, or one scaling strategy over its alternative?
Communication and Collaboration: Can you lead a technical discussion, articulate your ideas clearly on a whiteboard, and incorporate feedback from your interviewer? The interview is a dialogue, not a monologue.
The transition from being evaluated on coding to being evaluated on system design is a direct signal of a company's changing expectations. Junior engineers are typically assigned well-scoped tasks within an existing architecture. Senior engineers, however, are expected to create that architecture. They are given broad, ambiguous business problems and are trusted to make high-impact decisions that will affect multiple teams and define the technical direction of a product for years to come. Successfully navigating a system design interview demonstrates that you possess this "seniority signal"—the ability to think strategically, handle complexity, and lead the technical charge. It proves you are ready not just to build, but to architect.
Part 2: The Core Concepts - The VIPs of Our Party
Before you can design a system, you must understand the fundamental principles that govern all large-scale applications. These are the non-negotiable qualities, the VIPs of our party, that every great architect respects. These are often referred to as non-functional requirements (NFRs), and they are the true drivers behind every significant design decision.
2.1. The Three Pillars: The Foundation of Any Great System
At the heart of modern system design lie three pillars: Scalability, Availability, and Reliability. Understanding them is non-negotiable.
Scalability: Handling a Growing Guest List
Scalability is a system's ability to gracefully handle a growing amount of load—be it more users, more data, or more transactions—without a drop in performance. If your party is a hit, you need a plan for when the crowds show up. There are two primary ways to scale:
Vertical Scaling (Scaling Up): This approach involves making your existing server more powerful. Think of it as replacing your small party speaker with a massive concert-grade sound system. Technically, this means adding more CPU cores, more RAM, or faster storage (like SSDs) to a single machine.
Pros: It's conceptually simple. Managing one big machine is often easier than managing many small ones.
Cons: It gets very expensive at the high end. There is a hard physical limit to how much you can upgrade a single server. Furthermore, it creates a single point of failure (SPOF)—if that one massive server goes down, your entire system is offline.
Horizontal Scaling (Scaling Out): This approach involves adding more machines to your pool of resources. Instead of one giant speaker, you set up dozens of smaller speakers all around the venue, working in concert. Technically, this means adding more servers (nodes) to your system, distributing the load among them using a load balancer.
Pros: It can scale to handle almost limitless traffic. It is more resilient to failure; if one server goes down, the others can pick up the slack. This is the preferred approach for modern, web-scale applications.
Cons: It introduces architectural complexity. You now need to manage traffic distribution, data synchronization between machines, and inter-service communication.
Availability: The Party Never Stops
Availability is the measure of a system's uptime. It's the percentage of time that the system is operational and accessible to users when they need it. In the world of system design, availability is often expressed in "nines":
Availability % | Downtime per Year |
99% ("two nines") | 3.65 days |
99.9% ("three nines") | 8.77 hours |
99.99% ("four nines") | 52.6 minutes |
99.999% ("five nines") | 5.26 minutes |
99.9999% ("six nines") | 31.5 seconds |
High availability is achieved by designing for resilience. This means embracing concepts like fault tolerance—the ability of the system to continue operating even if some of its components fail. This is accomplished through redundancy, which is the duplication of critical components. For example, instead of one database, you have a primary and a backup (replica). If the primary fails, the system can "failover" to the replica, minimizing downtime. The ultimate goal is to eliminate any Single Point of Failure (SPOF)—a component whose failure would bring down the entire system. If the only DJ at your party decides to leave, the music stops for everyone. A highly available system has multiple DJs, or at least a pre-approved playlist ready to go.
Reliability: A Flawless Experience
Reliability is the probability that a system will perform its intended function correctly and without failure over a specified period of time. While it sounds similar to availability, the distinction is critical for building user trust.
A system can be available but not reliable. Imagine an e-commerce website during a major holiday sale. The website is available—you can load the homepage, and it responds to your clicks. However, when you try to add an item to your cart, it shows the wrong price, or when you click "checkout," the payment fails. The system is up and running, but it is not performing its function correctly. It is available, but it is utterly unreliable.
This difference is fundamental to design choices. A user might forgive a website for being down for five minutes (a temporary lapse in availability). They are far less likely to forgive a system that loses their data or incorrectly processes their financial transaction (a failure of reliability). Therefore, while both are important, the context of the application dictates which to prioritize. A banking application must be designed for maximum reliability, even if it means scheduling maintenance windows that reduce availability. A social media feed, on the other hand, might tolerate a minor, temporary glitch in displaying a "like" count (low reliability for one small feature) in order to maintain overall system availability.
The Availability vs. Reliability Matrix
| Low Reliability | High Reliability |
High Availability | The Frustrating Glitch Example: A buggy online game that's always online but has frequent errors. | The Ideal Example: Google Search. It's always on and gives correct results. |
Low Availability | The Failed System Example: A startup's beta product that is often offline and full of bugs. | The Dependable Workhorse Example: A legacy banking system that has nightly maintenance but is 100% accurate when up. |
2.2. The Great Trade-off: The CAP Theorem
In the world of distributed systems—systems that operate across multiple machines—you can't have everything. The CAP Theorem, also known as Brewer's Theorem, formalizes this reality. It states that a distributed data store can only simultaneously provide two of the following three guarantees:
Consistency: Every read request receives the most recent write or an error. In simpler terms, all nodes in the system see the same data at the same time.
Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write. The system is always ready to respond.
Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. In modern distributed systems, network partitions are a fact of life, so Partition Tolerance is generally considered a mandatory requirement.
Since we must tolerate network partitions (P), the real trade-off in system design is between Consistency and Availability. Let's use an analogy. Imagine three friends—Alice, Bob, and Carol—are co-editing a critical document on a shared cloud drive.
Consistency: Alice, Bob, and Carol always see the exact same version of the document.
Availability: Alice, Bob, and Carol can always open and make edits to the document.
Partition Tolerance: The internet connection between Alice's house and the houses of Bob and Carol goes down. This is a network partition.
Now, the system designer must make a choice:
Choose Consistency over Availability (CP System): When the partition occurs, the system locks the document for Alice. She can't make any edits (or even read, in some strict models) until her connection is restored. This prevents her from creating a version of the document that conflicts with what Bob and Carol are seeing. The system prioritizes data integrity but becomes unavailable for Alice. Traditional relational databases like MySQL, when run with synchronous replication across nodes, typically fall into this category.
Choose Availability over Consistency (AP System): When the partition occurs, the system allows Alice to continue editing her local copy of the document. The system remains available for her. However, her version is now out of sync with the version Bob and Carol are working on. When her internet connection is restored, the system will need a way to resolve these conflicting edits. This state of temporary inconsistency is known as Eventual Consistency—the system guarantees that if no new updates are made, all replicas will eventually converge to the same state. Many NoSQL databases like Cassandra and Amazon's DynamoDB are designed as AP systems, prioritizing availability for a better user experience, especially at massive scale.
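To make the trade-off concrete, here is a toy sketch of a single replica in a distributed key-value store; the class and its fields are invented purely for illustration, not a real database API. The only difference between CP and AP behaviour is what the replica does while partitioned:
class Replica:
    # A toy replica of a key-value store, used only to illustrate CP vs AP behaviour.
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.data = {}            # this replica's local copy of the data
        self.partitioned = False  # True when cut off from the other replicas

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # CP choice: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm this is the latest write")
        return self.data.get(key)  # AP choice: always answer, even if possibly stale

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot replicate this write right now")
        self.data[key] = value     # AP choice: accept locally, reconcile replicas later


alice = Replica(mode="AP")
alice.write("doc", "draft v1")
alice.partitioned = True          # the network partition happens
alice.write("doc", "draft v2")    # still available, but now out of sync with Bob and Carol
print(alice.read("doc"))          # 'draft v2'; replicas must reconcile once the partition heals
In a real AP system, the interesting work happens after the partition heals, when the conflicting edits have to be merged or otherwise reconciled.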
2.3. Performance Twins: Latency & Throughput
Performance is a cornerstone of user experience, and it's primarily measured by two metrics: latency and throughput.
Latency is the time it takes to perform a single action. It's the delay between a cause and its effect. When you click a link, latency is the time you wait until the page starts to load. It's often measured in milliseconds (ms). In our party analogy, latency is the time it takes for a single guest to walk up to the bar, place an order, and receive their drink.
Throughput is the number of actions that can be performed in a given unit of time. It's a measure of rate or capacity. For a web server, it might be measured in requests per second (RPS) or transactions per second (TPS). In our party analogy, throughput is the total number of drinks the bar can serve per hour.
These two concepts are often in opposition. Optimizing for one can negatively impact the other. Consider a system designed to process uploaded videos.
To maximize throughput, the system might be designed to process videos in large batches. It waits until it has 100 videos, then processes them all at once very efficiently. This increases the total number of videos processed per hour (high throughput), but any individual video has to wait for the batch to fill up, increasing its personal wait time (high latency).
To minimize latency for a single, high-priority video, the system might interrupt its batch processing to handle that one video immediately. The VIP user who uploaded it is thrilled with the fast turnaround. However, this context switching and disruption to the efficient batch workflow reduces the total number of videos the system can process per hour (low throughput).
The correct design choice depends entirely on the business requirements. A service for live-streaming a news event must prioritize minimal latency. A service for archiving personal home videos can prioritize high throughput to keep costs down. As a designer, you must always ask: "For this feature, what does the user value more: speed for one, or efficiency for all?"
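A tiny, self-contained simulation (with made-up timing constants, not measurements of any real system) shows how batching raises throughput at the cost of per-item latency:
# All timings are illustrative constants.
PER_ITEM_COST = 1.0      # seconds of processing per video
BATCH_OVERHEAD = 2.0     # fixed setup cost paid once per batch

def latency_one_at_a_time():
    # A single item pays the full overhead by itself.
    return BATCH_OVERHEAD + PER_ITEM_COST

def latency_batched(batch_size):
    # The last item in the batch waits for the entire batch to finish.
    return BATCH_OVERHEAD + batch_size * PER_ITEM_COST

def throughput(batch_size):
    # Items completed per second of processing time.
    return batch_size / (BATCH_OVERHEAD + batch_size * PER_ITEM_COST)

print(f"one at a time: latency {latency_one_at_a_time():.0f}s, throughput {throughput(1):.2f} items/s")
print(f"batch of 100:  latency {latency_batched(100):.0f}s, throughput {throughput(100):.2f} items/s")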
2.4. A Quick Tour of the Party Venue: Fundamental Components
Just as a party venue has a doorman, a coat check, and a kitchen, large-scale systems are built from a set of common, reusable components. Here is a high-level tour of the essential building blocks you'll use in your designs.
Client-Server Architecture: This is the fundamental structure of the web. A client (like your web browser or mobile app) requests a resource or service, and a server (a powerful computer running specialized software) provides it. This request-response cycle is the basis for most interactions on the internet.
The Doorman (Load Balancer): When you have more traffic than a single server can handle (horizontal scaling), you need a load balancer. It acts as a "traffic cop," sitting in front of your servers and distributing incoming client requests across multiple backend servers. This prevents any single server from becoming a bottleneck and improves the system's availability and reliability. If one server fails, the load balancer can detect this and stop sending traffic to it, redirecting requests to the healthy servers.
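A minimal round-robin sketch makes the "traffic cop" role concrete; the server names and the shape of the health-check logic here are illustrative assumptions, not a real load balancer's API:
import itertools

class RoundRobinBalancer:
    # Minimal round-robin load balancer that skips servers marked unhealthy.
    def __init__(self, servers):
        self.servers = servers
        self.healthy = set(servers)
        self._cycle = itertools.cycle(servers)

    def mark_down(self, server):
        # A failed health check removes the server from rotation.
        self.healthy.discard(server)

    def next_server(self):
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")                        # simulate a server failure
print([lb.next_server() for _ in range(4)])  # requests go only to app-1 and app-3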
The Coat Check (Caching): A cache is a smaller, faster memory layer that stores copies of frequently accessed data. Accessing data from a database on disk is slow. Accessing it from a cache in RAM (Random Access Memory) is orders of magnitude faster. By placing a cache (like Redis or Memcached) in front of a database, a system can serve many requests without ever hitting the slower database, dramatically reducing latency and easing the load on the backend. ⚡
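In practice this usually takes the form of the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. A minimal sketch, using a plain dict as a stand-in for Redis and a placeholder query_database function:
import time

cache = {}              # stand-in for an in-memory store like Redis or Memcached
CACHE_TTL_SECONDS = 60  # how long a cached entry is considered fresh

def query_database(user_id):
    # Placeholder for a slow database lookup.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    entry = cache.get(user_id)
    if entry and time.time() - entry["cached_at"] < CACHE_TTL_SECONDS:
        return entry["value"]                 # cache hit: served straight from memory
    value = query_database(user_id)           # cache miss: fall back to the database
    cache[user_id] = {"value": value, "cached_at": time.time()}
    return value

print(get_user(42))  # first call hits the database and fills the cache
print(get_user(42))  # second call within the TTL is served from the cache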
The Library (Databases): This is where the system's persistent data is stored. The two main categories of databases are:
SQL (Relational) Databases: These databases, like MySQL or PostgreSQL, store data in tables with rows and columns and enforce a predefined structure (a schema). They are known for their reliability and data integrity, guaranteed by what are known as ACID (Atomicity, Consistency, Isolation, Durability) properties. They are an excellent choice for applications that require strong consistency, like financial or e-commerce systems.
NoSQL (Non-relational) Databases: This broad category includes databases like MongoDB, Cassandra, and DynamoDB. They are more flexible and typically do not require a fixed schema. They are designed to scale horizontally to handle massive amounts of data and traffic, often prioritizing availability and speed over the strict consistency offered by SQL databases. 📚
The Post Office (Message Queues): In a complex system, different services often need to communicate with each other. A message queue (like RabbitMQ or Apache Kafka) is an intermediary component that allows services to communicate asynchronously. One service (the producer) writes a message to the queue, and another service (the consumer) reads it when it's ready. This decouples the services; the producer doesn't have to wait for the consumer to be available. If the consumer service is down, the messages simply wait safely in the queue until it comes back online, improving the system's overall reliability and fault tolerance. 📮
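The decoupling is easy to see with Python's standard-library queue standing in for a durable broker like RabbitMQ or Kafka; this is an in-process toy, not how you would wire it in production:
import queue
import threading

email_queue = queue.Queue()   # in-process stand-in for a durable message broker

def order_service():
    # Producer: records the order, hands off the notification, and moves on.
    email_queue.put({"to": "alice@example.com", "subject": "Order confirmed"})
    print("order service: finished without waiting for the email to be sent")

def email_worker():
    # Consumer: picks up messages whenever it is ready.
    message = email_queue.get()
    print(f"email worker: sending '{message['subject']}' to {message['to']}")
    email_queue.task_done()

order_service()
threading.Thread(target=email_worker).start()
email_queue.join()            # the message waits safely until a consumer handles it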
The Content Couriers (Content Delivery Network - CDN): A CDN is a geographically distributed network of proxy servers that cache static content (like images, videos, CSS, and JavaScript files) close to end-users. When a user in Japan requests an image from a website whose main servers are in Virginia, the request doesn't have to travel across the Pacific. Instead, it's served from a CDN edge server located in or near Tokyo. This drastically reduces latency and offloads a significant amount of traffic from the origin servers.
The Neighborhoods (Microservices): Instead of building one giant, monolithic application, the microservices architecture breaks the application down into a collection of smaller, independent services. Each service is responsible for a specific business function (e.g., a "user service," a "payment service," a "notification service") and communicates with other services through well-defined APIs. This approach allows different teams to develop, deploy, and scale their respective services independently, promoting modularity and agility.
Part 3: The Interview Framework - Your 4-Step Dance Routine
3.1 Introduction: A Structured Approach to Impress
System design interviews are intentionally open-ended to test how you navigate ambiguity. Without a plan, it's easy to get lost, ramble, or dive into unnecessary details. The single most powerful tool in your arsenal is a structured framework. Following a clear, repeatable process demonstrates maturity, methodical thinking, and respect for the problem-solving process. It transforms a potentially chaotic session into a collaborative design discussion, which is precisely what the interviewer wants to see. ✍️
This 4-step framework is a synthesis of best practices recommended by engineers and managers at top tech companies. Think of it as your dance routine for the interview—practice it until it becomes second nature.
3.2. Step 1: Understand & Explore (Clarify Requirements)
Time Allotment: 5-10 minutes
Goal: To collaboratively define the problem's scope and constraints with the interviewer. Never, ever start designing without this step.
Actions:
Ask Clarifying Questions: Treat the interviewer as your product manager or client. Your goal is to narrow the vast, ambiguous problem into a concrete set of deliverables.
"What are the most critical features for the MVP (Minimum Viable Product)?"
"Should this system support mobile clients, web clients, or both?"
"Are we designing for a specific geographical region, or is this a global service?"
"What is the expected scale of the system? How many users should we plan for?"
Define Functional Requirements: These describe what the system must do. List the core user-facing features. For a system like Instagram, this would be:
Users can upload photos and videos.
Users can view a feed of content from people they follow.
Users can search for other users or content.
Users can comment on and like posts.
It is crucial to prioritize and agree on the top 3-5 core features to focus on for the interview, as you cannot design everything in 45 minutes.
Define Non-Functional Requirements (NFRs): These describe the qualities of the system—how it should be. This is often the most critical part of the conversation.
Availability: Does this system need to be highly available? Is 99.9% okay, or do we need 99.999%?
Consistency: Is strong consistency required (e.g., for a bank transfer), or is eventual consistency acceptable (e.g., for a social media "like" count)? This directly invokes the CAP theorem.
Latency: How fast does the system need to feel to the user? Should the feed load in under 200ms? Is a 1-second delay for posting a comment acceptable?
Durability: How critical is it that we never lose data? A banking system cannot tolerate data loss, while a cache can.
Scalability: What is the expected read-to-write ratio? Is the traffic bursty or consistent?
Functional requirements determine the features you build, but non-functional requirements dictate the architecture you choose. Any developer can design a system for one user to upload a photo. The real challenge, and the essence of the interview, is designing a system that does so for 100 million daily active users (scalability) with near-instant load times (low latency) and five-nines uptime (high availability). The NFRs force the discussion of load balancers, CDNs, database sharding, and caching. Proactively asking about and defining these NFRs is a hallmark of a senior engineer's mindset.
3.3. Step 2: High-Level Design (The Whiteboard Sketch)
Time Allotment: 10-15 minutes
Goal: To create a broad, high-level architectural diagram that outlines the major components and their interactions.
Actions:
Draw the Core Components: On the whiteboard (physical or virtual), sketch out the main building blocks of your system. Start simple. For a typical web service, this might include boxes for the Client, a Load Balancer, a set of Application Servers, and a Database.
Illustrate Data Flow: Use arrows to show how requests and data flow through the system. A user request might flow from the client to the load balancer, then to an app server, which then queries the database and returns a response. This visual narrative helps clarify the system's operation.
Define System APIs: Define the specific API endpoints that the client will use to interact with the system. This establishes a clear contract and ensures you and the interviewer are on the same page about the system's functionality. For a Twitter-like service, this could be (a minimal handler sketch follows after this list of actions):
POST /api/v1/tweets (Body: { "content": "Hello world!" })
GET /api/v1/timeline
POST /api/v1/users/{id}/follow
Seek Sign-off: After sketching your initial HLD, pause and walk the interviewer through it. Ask, "Does this high-level approach look reasonable to you as a starting point?" This is a crucial collaborative step. It keeps the interviewer engaged, shows you value their input, and gives them a chance to steer you if you're heading in a direction they're less interested in exploring.
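As referenced above, the endpoints can also be sketched as plain handler functions. The in-memory lists, field names, and the naive "pull" timeline below are illustrative assumptions rather than a committed design:
import uuid

tweets = []      # in-memory stand-in for real storage
followers = {}   # maps a user id to the set of ids they follow

def post_tweet(user_id, body):
    # POST /api/v1/tweets
    tweet = {"id": str(uuid.uuid4()), "author": user_id, "content": body["content"]}
    tweets.append(tweet)
    return tweet

def get_timeline(user_id):
    # GET /api/v1/timeline (naive "pull" model: filter everything on each request)
    following = followers.get(user_id, set())
    return [t for t in tweets if t["author"] in following]

def follow_user(user_id, target_id):
    # POST /api/v1/users/{id}/follow
    followers.setdefault(user_id, set()).add(target_id)

follow_user("alice", "bob")
post_tweet("bob", {"content": "Hello world!"})
print(get_timeline("alice"))   # alice's timeline now contains bob's tweet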
3.4. Step 3: Deep Dive (Zooming In)
Time Allotment: 10-20 minutes
Goal: To demonstrate your in-depth knowledge by focusing on one or two key components of your design.
Actions:
Identify Bottlenecks and Areas of Interest: Look at your high-level diagram and proactively identify potential weak points or interesting challenges. For example:
"Given the high read traffic we estimated, the database is likely to become a bottleneck."
"Generating the user's timeline feed in real-time could be computationally expensive and slow."
"Storing petabytes of photo data will require a specialized storage solution."
Propose a Focus Area: Suggest a component to dive into, or ask the interviewer for their preference. "I think the most challenging part of this design is the timeline generation. Would you like me to start there, or is there another component you're more interested in?" This shows you can prioritize and gives the interviewer agency.
Discuss Solutions and Trade-offs: This is the core of the interview. For your chosen component, discuss different implementation strategies and, most importantly, their trade-offs.
Example Dialogue (Database Bottleneck): "To address the read bottleneck on our database, we have a few options.
Option 1: Caching. We could introduce a distributed cache like Redis. The major advantage is a massive reduction in latency for frequently accessed data. The trade-off is that we now have to manage cache consistency. We'd need to decide on an invalidation strategy, like write-through, which offers strong consistency but adds latency to writes, or setting a Time-to-Live (TTL), which is simpler but means the data can be stale for a short period.
Option 2: Read Replicas. We could set up several read-only copies of our database. The load balancer would direct all write traffic to the primary database and distribute read traffic across the replicas. This is great for scaling read-heavy workloads. The trade-off here is the replication lag; there will be a small delay before data written to the primary is available on the replicas, leading to eventual consistency.
Given our requirement for a fast-loading feed, I would lean towards a combination of both: a caching layer to handle the hottest data and read replicas to scale reads for the rest."
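A small routing sketch shows how that combination might fit together. The class shape, the invalidate-on-write choice, and the synchronous replica copy (standing in for asynchronous replication) are all simplifications for illustration:
import random

class DataLayer:
    # Toy data access layer: cache in front, primary for writes, replicas for reads.
    def __init__(self, replica_count=2):
        self.cache = {}
        self.primary = {}
        self.replicas = [dict() for _ in range(replica_count)]

    def write(self, key, value):
        self.primary[key] = value
        self.cache.pop(key, None)         # invalidate any stale cache entry on write
        for replica in self.replicas:     # copied synchronously here for simplicity;
            replica[key] = value          # real replication is asynchronous, hence replication lag

    def read(self, key):
        if key in self.cache:
            return self.cache[key]                     # hottest data: served from memory
        value = random.choice(self.replicas).get(key)  # everything else: spread across replicas
        self.cache[key] = value
        return value


db = DataLayer()
db.write("feed:42", ["post-1", "post-2"])
print(db.read("feed:42"))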
3.5. Step 4: Scale & Refine (Wrapping Up)
Time Allotment: 5-10 minutes
Goal: To address broader operational concerns and show you can think about the long-term health and evolution of the system.
Actions:
Discuss Further Scaling: Address how the design would handle a 10x or 100x increase in load. This is where you introduce more advanced concepts.
Database Sharding: "To scale our write traffic and store petabytes of data, we would need to partition our database, a technique called sharding. We could use a range-based sharding strategy or a hash-based strategy, each with its own pros and cons regarding data distribution and hotspots." A minimal hash-based sketch follows after this list.
Geo-Distribution: "To serve a global audience with low latency, we would deploy our services across multiple data centers in different regions and use a CDN to cache content closer to users."
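As referenced above, here is a minimal hash-based sharding sketch; the shard count and key format are illustrative assumptions:
import hashlib

NUM_SHARDS = 8   # illustrative; a real system would size this with future growth in mind

def shard_for(key):
    # Map a key (for example a user id) to a shard using a stable hash.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Reads and writes for the same key always land on the same shard.
print(shard_for("user:1001"), shard_for("user:1002"), shard_for("user:1001"))
The modulo step is the simplest possible placement rule, but it makes changing the number of shards painful, which is why many real systems reach for consistent hashing instead.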
Address Fault Tolerance and Monitoring: How do you ensure the system stays healthy and recovers from failures?
Redundancy: "Each component, from load balancers to databases, would have redundant instances running in active-active or active-passive configurations to prevent single points of failure."
Monitoring and Alerting: "We would need robust monitoring on key system metrics like latency, error rates, and CPU utilization. We'd set up automated alerts to notify the on-call engineer of any anomalies."
Summarize and Conclude: End the interview with a concise summary of your design. Briefly recap the problem, your proposed architecture, and the key trade-offs you made. This provides a clean closing to the conversation and leaves a strong final impression. You can also mention potential future improvements if you had more time, showing you're always thinking ahead.
Part 4: Back-of-the-Envelope Estimation - The Party Planner's Math
4.1. Introduction: From Guesswork to Educated Estimates
In a system design interview, you will be asked to design systems at a scale you've likely never worked with directly. How do you make informed decisions when dealing with millions of users and petabytes of data? The answer is Back-of-the-Envelope (BotE) estimation.
BotE is the art of using simplified assumptions and rough calculations to quickly approximate a system's resource requirements. The goal is not to achieve perfect precision; it's to get the order of magnitude right. These quick calculations are invaluable for identifying potential bottlenecks, justifying your design choices, and demonstrating to the interviewer that you can think quantitatively about scale. 🧮
The key principles of BotE are:
Round and Approximate: Precision is not the goal. During an interview, you won't have a calculator. Round numbers to make mental math easy. For example, there are 86,400 seconds in a day; for a BotE calculation, rounding this to 100,000 is perfectly acceptable and makes division much simpler.
State Your Assumptions: You will have to make assumptions about user numbers, usage patterns, and data sizes. Clearly state these assumptions and, if possible, write them on the whiteboard. This allows the interviewer to follow your logic and correct you if your assumptions are wildly off base.
Label Your Units: When you write down a number, always include the unit (e.g., KB, MB, GB, QPS). This prevents ambiguity and costly errors in your calculations.
4.2. A Worked Example: Estimating a Photo-Sharing App
Let's apply these principles to a common interview problem: "Design a photo-sharing service like Instagram." We'll focus on the estimation part of the problem.
Step 1: Clarify Requirements & Make Assumptions
First, we state our assumptions, linking back to Step 1 of our interview framework.
Users: 500 million Daily Active Users (DAU).
Usage Pattern (Writes): 20% of users upload 1 new photo per day on average.
Usage Pattern (Reads): Each user views an average of 100 photos per day.
Data Size: The average photo size is 2 MB after compression.
Data Retention: Photos are stored forever. We'll calculate storage for 5 years to start.
Step 2: Estimate Traffic (Queries Per Second - QPS)
We'll use our rounded value of 100,000 seconds per day.
Write QPS (Uploads):
Total uploads per day = 500 million DAU * 20% = 100 million uploads/day.
Write QPS = 100,000,000 uploads / 100,000 seconds/day = 1,000 QPS.
Read QPS (Views):
Total views per day = 500 million DAU * 100 views/user = 50 billion views/day.
Read QPS = 50,000,000,000 views / 100,000 seconds/day = 500,000 QPS.
Peak QPS: Traffic is rarely uniform. A common rule of thumb is to assume peak traffic is 2x the average.
Peak Read QPS = 500,000 * 2 = 1,000,000 QPS.
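The same arithmetic can be written out as a few lines of Python, using only the assumptions stated in Step 1:
DAU = 500_000_000            # daily active users
UPLOAD_RATIO = 0.20          # 20% of users upload one photo per day
VIEWS_PER_USER = 100         # photos viewed per user per day
SECONDS_PER_DAY = 100_000    # rounded from 86,400 for easy mental math

write_qps = DAU * UPLOAD_RATIO / SECONDS_PER_DAY
read_qps = DAU * VIEWS_PER_USER / SECONDS_PER_DAY
peak_read_qps = read_qps * 2               # rule of thumb: peak is roughly 2x the average

print(f"write QPS: {write_qps:,.0f}")          # 1,000
print(f"read QPS: {read_qps:,.0f}")            # 500,000
print(f"peak read QPS: {peak_read_qps:,.0f}")  # 1,000,000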
Step 3: Estimate Storage
Daily Storage Ingress:
100 million new photos/day * 2 MB/photo = 200 million MB/day = 200 TB per day.
Total Storage for 5 Years:
For easier mental math you can round 365 days up to about 400 (roughly 2,000 days over 5 years); here we'll use the exact figure.
Total Storage = 200 TB/day * 365 days/year * 5 years = 200 TB * 1,825 days ≈ 365,000 TB ≈ 365 Petabytes (PB).
Step 4: Estimate Bandwidth
Bandwidth is the rate of data transfer.
Ingress Bandwidth (Uploads): This is the data coming into our system.
200 TB of new data per day / 100,000 seconds/day = 2 GB/s.
Since 1 Byte = 8 bits, 2 GB/s = 16 Gigabits per second (Gbps). That is a substantial stream of incoming data to absorb around the clock.
Egress Bandwidth (Downloads): This is the data leaving our system to be viewed by users.
500,000 read QPS * 2 MB/photo = 1,000,000 MB/s = 1,000 GB/s = 1 TB/s.
In bits, this is 8 Tbps.
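The storage and bandwidth figures follow from the same inputs; here is the arithmetic as a small Python sketch, with the inputs restated so it runs on its own:
DAU = 500_000_000
UPLOAD_RATIO = 0.20
VIEWS_PER_USER = 100
SECONDS_PER_DAY = 100_000
PHOTO_SIZE_MB = 2

uploads_per_day = DAU * UPLOAD_RATIO                            # 100 million photos/day
read_qps = DAU * VIEWS_PER_USER / SECONDS_PER_DAY               # 500,000 reads/second

daily_storage_tb = uploads_per_day * PHOTO_SIZE_MB / 1_000_000  # MB -> TB: 200 TB/day
five_year_storage_pb = daily_storage_tb * 365 * 5 / 1_000       # TB -> PB: ~365 PB

ingress_gbps = daily_storage_tb * 1_000 / SECONDS_PER_DAY * 8   # GB/s * 8 bits: ~16 Gbps
egress_tbps = read_qps * PHOTO_SIZE_MB / 1_000_000 * 8          # TB/s * 8 bits: ~8 Tbps

print(f"daily storage: {daily_storage_tb:,.0f} TB")
print(f"5-year storage: {five_year_storage_pb:,.0f} PB")
print(f"ingress bandwidth: ~{ingress_gbps:,.0f} Gbps")
print(f"egress bandwidth: ~{egress_tbps:,.0f} Tbps")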
These numbers are not just abstract figures; they are direct instructions for our system design. The calculation reveals a read-to-write ratio of 500,000:1,000, or 500:1. This system is overwhelmingly read-heavy. This fact, combined with the massive egress bandwidth requirement of 8 Tbps, makes it clear that serving all this content from a centralized cluster of servers is completely infeasible. This calculation single-handedly proves that an aggressive, multi-layered caching strategy and a global Content Delivery Network (CDN) are not optional optimizations—they are mandatory, core components of the architecture. The 365 PB storage requirement tells us we cannot use a traditional database for the photos themselves; we must use a distributed object store like Amazon S3 or HDFS. This is the power of BotE: it transforms abstract requirements into concrete engineering constraints, guiding your design from the very beginning.
4.3. Essential Tables for Estimation
To perform these calculations quickly and make informed trade-offs, it helps to have a few key sets of numbers memorized.
Table 1: Powers of Two (Data Scale)
This table provides a quick reference for the scale of digital information.
Unit | Size | Rough Analogy |
1 Kilobyte (KB) | 2^10 bytes | Half a page of text |
1 Megabyte (MB) | 2^20 bytes | A 500-page book |
1 Gigabyte (GB) | 2^30 bytes | A small library shelf |
1 Terabyte (TB) | 2^40 bytes | The entire collection of a large library |
1 Petabyte (PB) | 2^50 bytes | All photos on Facebook (as of a few years ago) |
Table 2: Latency Numbers Every Engineer Should Know
These numbers represent the fundamental "laws of physics" for system performance. Understanding their relative orders of magnitude is more important than memorizing their exact values. The difference in speed between accessing the CPU's cache and accessing a spinning disk is not just a small optimization—it's a factor of millions. This vast difference is the entire justification for the existence of caching. It is the reason we use in-memory databases, and why database engineers spend their careers designing query planners that avoid slow disk seeks. Knowing these numbers allows you to justify your design choices from first principles.
Operation | Typical Latency | Real-World Analogy |
L1 Cache Reference | ~1 ns | Blinking your eye |
L2 Cache Reference | ~4-7 ns | Tossing a piece of paper into a bin next to you |
Main Memory (RAM) Reference | ~100 ns | Walking to the kitchen to get a snack |
SSD Random Read (4KB) | ~20-100 μs | Taking a flight from San Francisco to San Jose |
Round Trip within same Datacenter | ~500 μs | A helicopter trip across a major city |
Reading 1 MB sequentially from SSD | ~1 ms | A cross-country flight (e.g., NY to LA) |
Disk Seek (HDD) | ~10 ms | A flight from North America to Europe |
Internet Round Trip (US to Europe) | ~150 ms | A rocket trip to the Moon and back |
(Note: 1 ms = 1,000 μs; 1 μs = 1,000 ns)
Hints for the practice exercises in this module:
Use the rounded value of 100,000 seconds in a day to perform this back-of-the-envelope calculation.
Calculate the total storage in megabytes first, then convert it to terabytes (1,000,000 MB = 1 TB).
This is a horizontal scaling problem. Calculate the number of servers required to meet the peak demand.
First, calculate the total number of writes per day. Then, use the read-to-write ratio to find the total reads. Finally, convert to QPS.
Calculate the total kilobytes needed, then convert up to gigabytes (1,000,000 KB = 1 GB).
Consider the trade-offs. What is more damaging in a financial transaction: a slight delay, or showing an incorrect balance?
Think about reliability and decoupling. What happens in a synchronous system if the email service is down? How does a message queue change that outcome?
Think about where your users are located and where your data is served from. How can you reduce the distance between them?
Consult the 'Latency Numbers' table. Which data access method is orders of magnitude faster than hitting a database on an SSD or HDD?
How can you reduce the number of read requests hitting the main database? Think about both reducing requests and distributing them.
Ready to test your understanding of the topics in this module? Head over to the Practice Hub for a focused quiz session.