From 1 to 1,000,000 Users
Every big system started as one server doing everything. The path to a million users is just a handful of well-understood moves — here they are, one diagram at a time.
System design sounds intimidating, but here's the secret interviewers won't tell you: it's mostly common sense applied under pressure. You don't memorise architectures — you learn to spot the bottleneck and reach for the right tool.
So forget the buzzwords for a second. Imagine you just launched an app. One user shows up. Then a hundred. Then a hundred thousand. We'll grow it one realistic step at a time, and at every step we'll ask the only question that matters:
What breaks next, and how do we fix it?
It starts with one server
In the beginning, life is simple. Your web app, your business logic, and your database all live on a single server. A user's browser sends a request, the server does some work, talks to the database, and sends back a response.
And honestly? This is fine. A single modest server can comfortably handle thousands of users. Don't let anyone shame you out of a simple setup — the biggest mistake junior engineers make is designing for a million users they don't have yet.
But let's say your app takes off. Traffic climbs. The server starts to sweat. The app and the database are now fighting over the same CPU and memory. Time for our first move.
Split the app from the database
The app server and the database have very different appetites. App servers want lots of CPU to handle requests; databases want lots of memory and fast disks. Cramming them together means neither gets what it needs.
So we split them onto separate machines.
Now each can be sized and scaled on its own. This single split is the foundation of nearly every real-world architecture: a stateless app tier and a stateful data tier. Remember that phrase — it earns points in interviews.
Add a load balancer
One app server can still be overwhelmed. The fix is beautifully simple: run more copies of it. This is called horizontal scaling — instead of buying one giant server (vertical scaling), you add many ordinary ones.
But now users need to know which server to talk to. They shouldn't have to care. So we put a load balancer in front. It accepts every request and spreads the traffic evenly across your fleet.
This is also why we want app servers to be stateless: no user session stored on the server itself. The load balancer health-checks each server, so if server 2 dies mid-session, your next request simply goes to server 3 and you never notice. (Where does session state go? Usually a shared store like Redis, which we're about to add anyway.)
Key idea: stateless app servers + a load balancer means you can add or remove capacity at will. A server crashing becomes a shrug, not an outage.
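To make that concrete, here is a minimal sketch of the simplest balancing policy, round-robin, with unhealthy servers skipped. The class name, the server names, and the `mark_down` hook are made up for illustration; a real load balancer (a managed cloud one, NGINX, HAProxy) does this for you, with real health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # In a real balancer a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Look at each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical fleet
first_round = [lb.pick() for _ in range(3)]  # one request to each server
lb.mark_down("app-2")                        # simulate a crash
after_crash = [lb.pick() for _ in range(4)]  # app-2 no longer receives traffic
```

Because no session lives on `app-2`, taking it out of rotation costs nothing but capacity.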
Cache what you read often
Here's a truth about real apps: the same data gets read over and over. The same trending post, the same user profile, the same product page. Every one of those reads hammers your database with a question it already answered a second ago.
Enter the cache — a blazing-fast, in-memory store (like Redis) that sits between your app and your database. Before hitting the database, the app checks the cache first.
A cache hit served from memory is typically far faster than a database query, often by an order of magnitude or more, and it takes read load off the database so it can focus on the writes that actually need it. The trade-off is the hardest problem in computer science wearing a trench coat: cache invalidation. Stale data hides in caches, so you set sensible expiry times (TTLs) and clear entries when the underlying data changes.
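This "check the cache, fall back to the database, then fill the cache" pattern is usually called cache-aside. Below is a rough sketch of it, assuming a tiny in-process `TTLCache` as a stand-in for Redis and a plain dict as the "database". Every name here is hypothetical; none of it is a real client API.

```python
import time


class TTLCache:
    """A tiny in-process stand-in for a shared cache such as Redis."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._entries.pop(key, None)


database = {"user:1": {"name": "Ada"}}  # hypothetical data tier
stats = {"db_reads": 0}                 # counts how often we hit the database
cache = TTLCache(ttl_seconds=60)


def get_profile(key):
    cached = cache.get(key)
    if cached is not None:
        return cached            # cache hit: the database is never touched
    stats["db_reads"] += 1       # cache miss: do the expensive read once
    value = database[key]
    cache.set(key, value)
    return value


def update_profile(key, value):
    database[key] = value
    cache.delete(key)            # invalidate so the next read sees fresh data
```

Calling `get_profile("user:1")` twice reads the database only once. `update_profile` clears the entry, so the next read goes back to the database instead of serving the old value until the TTL runs out.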
Push static files to a CDN
Your app sends the same images, JavaScript, and CSS to everyone. Why make a user in Lagos wait for bytes to travel from a server in Virginia for every page load?
A CDN (Content Delivery Network) is a global network of servers that cache your static files physically close to your users. The request is answered from a city near them, not from across an ocean.
Two wins at once: pages feel instantly faster for users everywhere, and your own servers stop wasting effort shipping the same logo a million times.
Scale reads with replicas
Most apps read far more than they write. Think about it: thousands of people view a post for every one person who creates one. We can exploit that.
We promote our database to a primary that handles all writes, and add read replicas — copies that the primary continuously streams its changes to. The app sends writes to the primary and spreads reads across the replicas.
This buys enormous headroom because reads, your most common operation, now scale horizontally. The catch is replication lag: a replica is usually only milliseconds behind the primary, but under heavy write load it can fall seconds behind, so a user could occasionally read stale data. For most features (a feed, a profile) that's perfectly fine. Where it isn't, such as showing a user the change they just saved, send that read to the primary.
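In application code, the routing rule can be as small as the sketch below. It assumes connections are represented by plain names, and the `needs_fresh_read` flag is an invented escape hatch for reads that can't tolerate lag. Many database drivers and proxies offer their own version of this, so treat it as an illustration of the idea rather than a recommended implementation.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary and rotates reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def route(self, is_write, needs_fresh_read=False):
        # Writes, lag-sensitive reads, and the no-replica case use the primary.
        if is_write or needs_fresh_read or not self.replicas:
            return self.primary
        return next(self._next_replica)


# Hypothetical server names, for illustration only.
router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
write_target = router.route(is_write=True)
read_targets = [router.route(is_write=False) for _ in range(4)]
fresh_target = router.route(is_write=False, needs_fresh_read=True)
```

Ordinary reads alternate between the two replicas, while the write and the "fresh" read both land on the primary.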
Shard when the data gets huge
Eventually a single primary can't hold all your data or absorb all your writes. Now we shard: split the data across multiple databases, each owning a slice.
A common approach is to route by a key — say, the user's ID — so every user's data consistently lands on the same shard.
Sharding is powerful but it's the point of no return in terms of complexity: queries that span shards get painful, and rebalancing data is real work. This is genuinely a "you'll know when you need it" tool — and saying that in an interview shows maturity.
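Routing by key can be sketched in a few lines. This version assumes a fixed list of shard names (made up here) and uses a stable hash of the user ID. Python's built-in `hash()` is avoided on purpose, since for strings it changes between processes.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_for(user_id, shards=SHARDS):
    """Map a user ID to the same shard every time, on every app server."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same user always lands on the same shard.
home_shard = shard_for(12345)
same_again = shard_for(12345)
```

Note the cost hiding in `% len(shards)`: add a fifth shard and most users suddenly map somewhere new, which is exactly why rebalancing is real work. Consistent hashing or a lookup table are the usual ways to soften that.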
Go async with a queue
Some tasks are slow: sending a welcome email, generating a video thumbnail, running a report. If the user has to wait for them inside the request, the app feels sluggish.
The fix is a message queue. The app drops a job onto the queue and instantly returns a response. Separate worker processes pick jobs off the queue and do the heavy lifting in the background.
This decouples the request from the work. Traffic spikes? Jobs just queue up and the workers chew through them. Need more throughput? Add more workers. Your users, meanwhile, got their instant response.
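Here is a small sketch of that flow using Python's standard-library `queue` and `threading` modules as a stand-in for a real message broker and separate worker processes. The `handle_signup` handler and the "email" task are hypothetical. The point is only that the handler returns before the slow work happens.

```python
import queue
import threading

jobs = queue.Queue()           # stand-in for a real message queue
sent = []                      # records the "emails" the workers processed
sent_lock = threading.Lock()


def handle_signup(email):
    """The request handler: enqueue the slow job and return immediately."""
    jobs.put(("send_welcome_email", email))
    return {"status": "accepted"}


def worker():
    while True:
        job = jobs.get()
        try:
            if job is None:    # sentinel value: time to shut down
                return
            _task, email = job
            with sent_lock:
                sent.append(email)  # pretend this is the slow email send
        finally:
            jobs.task_done()


workers = [threading.Thread(target=worker, daemon=True) for _ in range(2)]
for thread in workers:
    thread.start()

responses = [handle_signup(f"user{i}@example.com") for i in range(5)]

jobs.join()                    # wait until every queued job is processed
for _ in workers:
    jobs.put(None)             # one shutdown sentinel per worker
for thread in workers:
    thread.join()
```

Need more throughput? Raise the number of workers; the handler doesn't change at all.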
The full picture
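Stack every move together and the once-humble single box has become a resilient, scalable system. End to end, the architecture we built looks like this: a CDN serves static files; a load balancer spreads requests across stateless app servers; those servers check a cache before the database, write to a primary, read from replicas, and hand slow jobs to a queue for background workers; and the data is sharded only once a single primary can't cope.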
Notice that we never made a single dramatic leap. We just kept asking "what breaks next?" and applied one focused fix at a time. That is system design.
The cheat sheet
When you spot a symptom, reach for the matching tool. Memorise this table and you'll have a structured answer for almost any "how would you scale this?" question:
| The bottleneck | The symptom | The fix |
|---|---|---|
| App + DB on one box | Everything slows together | Split app and data tiers |
| One app server | Maxed CPU, dropped requests | Load balancer + more servers |
| Repeated reads | Database overloaded by reads | Add a cache (Redis) |
| Slow global assets | Pages load slowly far away | Put a CDN at the edge |
| Read-heavy database | Reads saturate the primary | Add read replicas |
| Too much data or too many writes | Primary can't keep up | Shard the database |
| Slow in-request work | Sluggish responses | Queue + background workers |
Taking this into an interview
When you get a system design question, don't panic and don't name-drop. Do this:
- Start simple. Sketch the single-server version out loud. It shows you won't over-engineer.
- Estimate the load. Reads vs writes? How much data? This tells you which bottleneck appears first.
- Evolve, narrating trade-offs. Add each piece only when you can say why — and name what it costs you (a cache means stale data; a replica means lag).
- Know when to stop. "I wouldn't shard yet — we don't have the scale" is a senior answer, not a cop-out.
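For the "estimate the load" step, the arithmetic is simple enough to do on a whiteboard. The sketch below just writes it down. Every input (user count, requests per user, read share, peak multiplier) is an assumption you would state out loud in the interview, not a figure from any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load(daily_active_users, requests_per_user_per_day,
                  read_fraction, peak_multiplier=2.0):
    """Back-of-envelope requests per second, split into reads and writes."""
    average_rps = daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY
    peak_rps = average_rps * peak_multiplier
    return {
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "peak_read_rps": peak_rps * read_fraction,
        "peak_write_rps": peak_rps * (1 - read_fraction),
    }


# Assumed inputs: 1M daily users, 20 requests each, 95% of them reads.
load = estimate_load(1_000_000, 20, read_fraction=0.95)
```

With these made-up inputs the average works out to roughly 231 requests per second, almost all of them reads. That points at caching and read replicas long before sharding.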
System design rewards calm, structured thinking far more than encyclopedic knowledge. You now have the structure. The rest is reps.
Reading is one thing; doing it under pressure is another. The best way to get comfortable is to practise out loud, repeatedly — ideally with feedback on where your reasoning wobbles.