System Design

From 1 to 1,000,000 Users

Every big system started as one server doing everything. The path to a million users is just a handful of well-understood moves — here they are, one diagram at a time.

The PrepSync Team · June 28, 2026 · 11 min read

System design sounds intimidating, but here's the secret interviewers won't tell you: it's mostly common sense applied under pressure. You don't memorise architectures — you learn to spot the bottleneck and reach for the right tool.

So forget the buzzwords for a second. Imagine you just launched an app. One user shows up. Then a hundred. Then a hundred thousand. We'll grow it one realistic step at a time, and at every step we'll ask the only question that matters:

What breaks next, and how do we fix it?

It starts with one server

In the beginning, life is simple. Your web app, your business logic, and your database all live on a single server. A user's browser sends a request, the server does some work, talks to the database, and sends back a response.

The humble beginning — a single server handling everything.

And honestly? This is fine. A single modest server can comfortably handle thousands of users. Don't let anyone shame you out of a simple setup — the biggest mistake junior engineers make is designing for a million users they don't have yet.

But let's say your app takes off. Traffic climbs. The server starts to sweat. The app and the database are now fighting over the same CPU and memory. Time for our first move.

Split the app from the database

The app server and the database have very different appetites. App servers want lots of CPU to handle requests; databases want lots of memory and fast disks. Cramming them together means neither gets what it needs.

So we split them onto separate machines.

Splitting concerns: the app tier and the data tier scale independently.

Now each can be sized and scaled on its own. This single split is the foundation of nearly every real-world architecture: a stateless app tier and a stateful data tier. Remember that phrase — it earns points in interviews.

Add a load balancer

One app server can still be overwhelmed. The fix is beautifully simple: run more copies of it. This is called horizontal scaling — instead of buying one giant server (vertical scaling), you add many ordinary ones.

But now users need to know which server to talk to. They shouldn't have to care. So we put a load balancer in front. It accepts every request and spreads the traffic evenly across your fleet.

A load balancer fans traffic out across identical, stateless app servers.
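To make "spreads the traffic" concrete, here is a minimal sketch of one common strategy, round-robin, with a crude health flag. The class name, method names, and server names are all made up for illustration; real load balancers (and their health checks) are far more sophisticated.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy app servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [balancer.pick() for _ in range(3)]  # each server gets one request

balancer.mark_down("app-2")                        # simulate a crash
after_crash = [balancer.pick() for _ in range(4)]  # traffic skips app-2
```

After the simulated crash, requests simply alternate between the two surviving servers. Nothing upstream has to change.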

This is also why we want app servers to be stateless — no user session stored on the server itself. If server 2 dies mid-session, the load balancer just sends you to server 3 and you never notice. (Where does session state go? Usually a shared store like Redis — which we're about to add anyway.)

Key idea: stateless app servers + a load balancer means you can add or remove capacity at will. A server crashing becomes a shrug, not an outage.
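Here is a rough sketch of what "stateless" buys you. The `SessionStore` below is a plain dict standing in for a shared store such as Redis, and `AppServer`, `add_to_cart`, and the session ID are hypothetical names chosen for the example.

```python
class SessionStore:
    """Stand-in for a shared store such as Redis; here just a dict."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def save(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions  # shared; nothing is kept on the server itself

    def add_to_cart(self, session_id, item):
        session = self.sessions.get(session_id)
        cart = session.get("cart", []) + [item]
        self.sessions.save(session_id, {**session, "cart": cart})
        return {"served_by": self.name, "cart": cart}


store = SessionStore()
server_2 = AppServer("app-2", store)
server_3 = AppServer("app-3", store)

server_2.add_to_cart("sess-abc", "book")
# app-2 "dies"; the next request lands on app-3 and the cart is still there.
response = server_3.add_to_cart("sess-abc", "pen")
```

Because the cart lives in the shared store, `response` comes from `app-3` yet still contains the item added through `app-2`.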

Cache what you read often

Here's a truth about real apps: the same data gets read over and over. The same trending post, the same user profile, the same product page. Every one of those reads hammers your database with a question it already answered a second ago.

Enter the cache: a blazing-fast, in-memory store (like Redis) that your app consults before it goes to the database. On a hit, the database is never touched. On a miss, the app reads from the database and stores the answer in the cache for next time.

The cache-aside pattern: check the cache first, fall back to the database, then remember the answer.

A cache hit can be 100x faster than a database query, and it takes load off the database so it can focus on the writes that actually need it. The trade-off is the hardest problem in computer science wearing a trench coat: cache invalidation. Stale data hides in caches, so you set sensible expiry times (TTLs) and clear entries when the underlying data changes.
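The whole pattern fits in a few lines. This is a sketch under stated assumptions: `TTLCache` is a toy in-memory stand-in for Redis, the `database` dict stands in for a real database, and `get_profile` / `update_profile` are hypothetical function names.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._entries.pop(key, None)


database = {"user:1": {"name": "Ada"}}  # pretend this is the slow database
db_reads = 0
cache = TTLCache(ttl_seconds=60)


def get_profile(key):
    global db_reads
    cached = cache.get(key)
    if cached is not None:
        return cached          # cache hit: the database is never touched
    db_reads += 1
    value = database[key]      # cache miss: fall back to the database
    cache.set(key, value)      # remember the answer for next time
    return value


def update_profile(key, value):
    database[key] = value
    cache.delete(key)          # invalidate so the next read is fresh


get_profile("user:1")
get_profile("user:1")          # served from the cache
update_profile("user:1", {"name": "Grace"})
fresh = get_profile("user:1")  # miss again, reads the new value
```

Three reads cost only two trips to the database, and the read after the update sees the new name because the write cleared the stale entry. The TTL is the safety net for any invalidation you forget.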

Push static files to a CDN

Your app sends the same images, JavaScript, and CSS to everyone. Why make a user in Lagos wait for bytes to travel from a server in Virginia for every page load?

A CDN (Content Delivery Network) is a global network of servers that cache your static files physically close to your users. The request is answered from a city near them, not from across an ocean.

A CDN serves static assets from an edge location near the user; only true misses reach your servers.

Two wins at once: pages feel instantly faster for users everywhere, and your own servers stop wasting effort shipping the same logo a million times.
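As a toy illustration of "only true misses reach your servers", the sketch below counts how often an edge location has to go back to the origin. `EdgeLocation`, `fetch_from_origin`, and the asset path are invented for the example; a real CDN also handles expiry, purging, and routing users to the nearest edge.

```python
origin_requests = 0


def fetch_from_origin(path):
    """Stand-in for a request that travels all the way to your servers."""
    global origin_requests
    origin_requests += 1
    return f"<contents of {path}>"


class EdgeLocation:
    def __init__(self, city):
        self.city = city
        self._cached = {}

    def serve(self, path):
        if path not in self._cached:
            self._cached[path] = fetch_from_origin(path)  # first request only
        return self._cached[path]


lagos = EdgeLocation("Lagos")
for _ in range(1000):
    lagos.serve("/static/logo.png")
```

A thousand users in Lagos ask for the logo, and the origin ships it exactly once.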

Scale reads with replicas

Most apps read far more than they write. Think about it: thousands of people view a post for every one person who creates one. We can exploit that.

We promote our database to a primary that handles all writes, and add read replicas — copies that the primary continuously streams its changes to. The app sends writes to the primary and spreads reads across the replicas.

Writes go to the primary; reads scale out across replicas that the primary keeps in sync.

This buys enormous headroom because reads, your most common operation, now scale horizontally. The catch is replication lag: a replica is usually only milliseconds behind the primary, but under heavy write load it can fall further behind, so a user could occasionally read stale data. For most features (a feed, a profile) that's perfectly fine. Where it isn't, such as showing a user the comment they just posted, send that particular read to the primary.
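Here is a small simulation of that routing, lag included. Everything in it is an assumption made for illustration: the `Database` and `ReplicatedStore` classes, the explicit `replicate()` step (real replication streams continuously), and the `require_fresh` flag, which is one common way to route a read to the primary when stale data is not acceptable.

```python
from itertools import cycle


class Database:
    """Stand-in for one database node; stores rows in a dict."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReplicatedStore:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_replica = cycle(replicas)
        self._pending = []  # changes on the primary that replicas lack so far

    def write(self, key, value):
        self.primary.rows[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply pending changes to every replica (normally continuous)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica.rows[key] = value
        self._pending.clear()

    def read(self, key, require_fresh=False):
        # Reads that must see the latest write go to the primary.
        node = self.primary if require_fresh else next(self._next_replica)
        return node.rows.get(key)


store = ReplicatedStore(
    Database("primary"),
    [Database("replica-1"), Database("replica-2")],
)
store.write("post:1", "hello")

stale = store.read("post:1")                      # replica has not caught up
fresh = store.read("post:1", require_fresh=True)  # primary always has it
store.replicate()
caught_up = store.read("post:1")                  # replicas now agree
```

The first read misses the new post because the replica is behind, the `require_fresh` read finds it on the primary, and once replication runs every replica returns it.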

Shard when the data gets huge

Eventually a single primary can't hold all your data or absorb all your writes. Now we shard: split the data across multiple databases, each owning a slice.

A common approach is to route by a key — say, the user's ID — so every user's data consistently lands on the same shard.

Sharding spreads both data and write load across independent databases.

Sharding is powerful but it's the point of no return in terms of complexity: queries that span shards get painful, and rebalancing data is real work. This is genuinely a "you'll know when you need it" tool — and saying that in an interview shows maturity.
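Routing by key can be as small as a hash and a modulo. This sketch assumes four shards with made-up names and uses SHA-256 purely because it gives a stable result across machines; hash-modulo is the simplest scheme, not the only one.

```python
import hashlib

SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_for(user_id, shards=SHARDS):
    """Map a user ID to a shard with a stable hash."""
    # hashlib gives the same answer on every machine and every run,
    # unlike Python's built-in hash(), which is randomised for strings.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same user always lands on the same shard.
home = shard_for(42)

# Why rebalancing is real work: add a fifth shard and count how many of
# 1,000 users would now be routed somewhere else.
bigger = SHARDS + ["users-db-4"]
moved = sum(shard_for(u) != shard_for(u, bigger) for u in range(1000))
```

With plain modulo routing, `moved` comes out as the large majority of users, which is exactly the rebalancing pain described above. Schemes such as consistent hashing exist to shrink that number.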

Go async with a queue

Some tasks are slow: sending a welcome email, generating a video thumbnail, running a report. If the user has to wait for them inside the request, the app feels sluggish.

The fix is a message queue. The app drops a job onto the queue and instantly returns a response. Separate worker processes pick jobs off the queue and do the heavy lifting in the background.

The app responds immediately and hands slow work to background workers via a queue.

This decouples the request from the work. Traffic spikes? Jobs just queue up and the workers chew through them. Need more throughput? Add more workers. Your users, meanwhile, got their instant response.
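The shape of this is easy to try locally. In the sketch below, Python's in-process `queue.Queue` and a few threads stand in for a real message queue and separate worker processes; `handle_signup` and `send_welcome_email` are hypothetical names, and the sleep fakes the slow part.

```python
import queue
import threading
import time

jobs = queue.Queue()
sent = []


def send_welcome_email(address):
    time.sleep(0.05)         # pretend this is slow
    sent.append(address)


def worker():
    while True:
        address = jobs.get()
        if address is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        try:
            send_welcome_email(address)
        finally:
            jobs.task_done()


def handle_signup(address):
    jobs.put(address)        # enqueue the slow work and return immediately
    return {"status": "ok"}


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

responses = [handle_signup(f"user{i}@example.com") for i in range(6)]

jobs.join()                  # demo only: wait for the background work
for _ in workers:
    jobs.put(None)           # one shutdown sentinel per worker
for w in workers:
    w.join()
```

Every call to `handle_signup` returns without waiting for an email to go out. Changing `range(3)` to a bigger number is the "add more workers" knob.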

The full picture

Stack every move together and the once-humble single box has become a resilient, scalable system. Here's the architecture we built, end to end:

The full journey: CDN, load balancer, stateless app tier, cache, queue + workers, and a replicated database.

Notice that we never made a single dramatic leap. We just kept asking "what breaks next?" and applied one focused fix at a time. That is system design.

The cheat sheet

When you spot a symptom, reach for the matching tool. Memorise this table and you'll have a structured answer for almost any "how would you scale this?" question:

The bottleneck	The symptom	The fix
App + DB on one box	Everything slows together	Split app and data tiers
One app server	Maxed CPU, dropped requests	Load balancer + more servers
Repeated reads	Database overloaded by reads	Add a cache (Redis)
Slow global assets	Pages load slowly far away	Put a CDN at the edge
Read-heavy database	Reads saturate the primary	Add read replicas
Too much data / writes	Primary can't keep up	Shard the database
Slow in-request work	Sluggish responses	Queue + background workers

Taking this into an interview

When you get a system design question, don't panic and don't name-drop. Do this:

1. Start simple. Sketch the single-server version out loud. It shows you won't over-engineer.
2. Estimate the load. Reads vs writes? How much data? This tells you which bottleneck appears first.
3. Evolve, narrating trade-offs. Add each piece only when you can say why, and name what it costs you (a cache means stale data; a replica means lag).
4. Know when to stop. "I wouldn't shard yet: we don't have the scale" is a senior answer, not a cop-out.
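For the "estimate the load" step, the arithmetic is simple enough to do on a whiteboard. Every input below is an assumed, illustrative number, not a benchmark; swap in whatever the interviewer gives you.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# All inputs are assumptions for the sake of the example.
daily_active_users = 1_000_000
requests_per_user_per_day = 50
read_fraction = 0.95            # 95% reads, 5% writes
peak_multiplier = 3             # peak traffic is 3x the daily average

average_rps = daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY
peak_rps = average_rps * peak_multiplier
peak_reads = peak_rps * read_fraction
peak_writes = peak_rps * (1 - read_fraction)

print(f"average: {average_rps:,.0f} req/s")
print(f"peak:    {peak_rps:,.0f} req/s")
print(f"reads:   {peak_reads:,.0f} req/s at peak")
print(f"writes:  {peak_writes:,.0f} req/s at peak")
```

With these inputs the average works out to roughly 580 requests per second, and at peak only a small slice is writes. That points at caching and read replicas long before sharding.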

System design rewards calm, structured thinking far more than encyclopedic knowledge. You now have the structure. The rest is reps.

Reading is one thing; doing it under pressure is another. The best way to get comfortable is to practise out loud, repeatedly — ideally with feedback on where your reasoning wobbles.