Third part of my system design train of thought
After the previous 2 parts, we’ve got a reasonable idea of the tools we have at our disposal in system design and why it is important.
Before I get to specific task designs, I’d like to wrap up the introduction with how to approach them.
This part is basically dedicated to chapters 3 and 4 of the book I’m using to guide myself through the topic. You can find these 2 parts here.
So let’s start with estimations. How do we estimate something completely abstract? What does it mean if I’m supposed to design something for 1 million users?
Well, we first need to have an idea of the amount of data we’re transferring. To do that, we often talk about bytes. But millions of users creating Twitter posts will produce enormous numbers. What does it tell us that we store 1073741824 or 1073741424 bytes?
I’m sure you didn’t read those numbers, but the first one is 2^30 and the second is 2^30 - 400. The difference doesn’t matter for designing. What does matter are powers of 2.
Power of 2 | Approximate value | Full name | Short name |
---|---|---|---|
10 | Thousand | Kilobyte | KB |
20 | Million | Megabyte | MB |
30 | Billion | Gigabyte | GB |
40 | Trillion | Terabyte | TB |
50 | Quadrillion | Petabyte | PB |
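To make the table concrete, here’s a tiny sketch: each unit is 2^10 times the previous one, and the two big numbers from earlier are indistinguishable once expressed in a sensible unit.

```python
# Minimal sketch of the table above: each unit is 2**10 times the previous.
KB, MB, GB, TB, PB = (2**(10 * i) for i in range(1, 6))

# The two numbers from the text differ by only 400 bytes --
# invisible at gigabyte scale.
for n in (2**30, 2**30 - 400):
    print(f"{n} bytes ~= {n / GB:.6f} GB")
```

Both print as roughly 1 GB, which is the whole point: for estimation, only the power of 2 matters.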
So, now we’ve established some form of estimating the amounts of data we’re storing/transferring. That’s great, we can now say that we store 5 petabytes of data! But what does it mean for the speed?
Well, luckily there are more shorthands we could use.
There’s a list of numbers every programmer should know. Now, I don’t agree with “numbers every programmer should know”, but that’s because I’m lazy. I definitely won’t remember the entire list. Here are the numbers:
Operation name | Time |
---|---|
L1 cache reference | 0.5 ns |
Branch mispredict | 5 ns |
L2 cache reference | 7 ns |
Mutex lock/unlock | 100 ns |
Main memory reference | 100 ns |
Compress 1 KB with Zippy | 10 µs (10,000 ns) |
Send 2 KB over 1 Gbps network | 20 µs (20,000 ns) |
Read 1 MB from memory | 250 µs (250,000 ns) |
Round trip in datacenter | 500 µs (500,000 ns) |
Disk seek | 10 ms (10,000,000 ns) |
Read 1 MB from the network | 10 ms (10,000,000 ns) |
Read 1 MB from disk | 30 ms (30,000,000 ns) |
Send packet CA->NL->CA | 150 ms (150,000,000 ns) |
Now, let’s look at the list. Rather than remembering exact values, let’s compare the operations instead. Look for orders of magnitude:
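For instance, a couple of comparisons built from the table values (in nanoseconds; the dictionary below just restates the table):

```python
# A few entries from the latency table above, in nanoseconds.
ns = {
    "main memory reference": 100,
    "read 1 MB from memory": 250_000,
    "disk seek": 10_000_000,
    "read 1 MB from disk": 30_000_000,
}

# Orders of magnitude matter more than exact values:
print(ns["disk seek"] // ns["main memory reference"])
# a disk seek costs ~100,000 memory references
print(ns["read 1 MB from disk"] // ns["read 1 MB from memory"])
# bulk reads from disk are ~120x slower than from memory
```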
Here’s a really quick example using the numbers above. Let’s say we built a tool that shows images from disk. For simplicity, assume 30 images, each 256 KB in size.
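Using just two rows of the latency table (one disk seek per image, plus a sequential read of the image bytes), the estimate looks like this:

```python
# Back-of-the-envelope sketch for the image tool, using the latency table:
# one disk seek per image plus a sequential read of the image bytes.
DISK_SEEK_MS = 10          # per seek, from the table
DISK_READ_MS_PER_MB = 30   # sequential read of 1 MB, from the table

images = 30
image_size_mb = 256 / 1024  # 256 KB

per_image_ms = DISK_SEEK_MS + image_size_mb * DISK_READ_MS_PER_MB
total_ms = images * per_image_ms
print(f"~{per_image_ms:.1f} ms per image, ~{total_ms:.0f} ms total")
```

That comes out to roughly half a second just to read the images from disk, which is exactly the kind of rough answer these numbers are for.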
We can define availability as the ability of a system to be operational for a set period of time. It’s often measured as a percentage, where 100% availability means the service is never down and 0% means the service is never up. Usually, availability falls between 99% and 100%.
Availability is important for users of our product, so most often it is defined in a Service Level Agreement (SLA). Cloud providers often define availability at 99.9% or above. To see in numbers how much downtime (DT) we can afford per period of time, see the table:
Availability | DT/day | DT/week | DT/month | DT/year |
---|---|---|---|---|
99% | 14.4 min | 1.68 h | 7.31 h | 3.65 days |
99.9% | 1.44 min | 10.1 min | 43.8 min | 8.77 h |
99.99% | 8.64 s | 1.01 min | 4.38 min | 52.6 min |
99.999% | 864 ms | 6.05 s | 26.3 s | 5.26 min |
99.9999% | 86.4 ms | 604 ms | 2.63 s | 31.56 s |
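There’s no magic in these columns; they all fall out of one formula, which a quick sketch can reproduce:

```python
# Derive the downtime columns from an availability percentage.
def downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Seconds of allowed downtime per period at a given availability."""
    return (1 - availability_pct / 100) * period_seconds

DAY = 24 * 3600
YEAR = 365 * DAY

print(round(downtime_seconds(99.99, DAY), 2), "s per day")
print(round(downtime_seconds(99.999, YEAR) / 60, 2), "min per year")
```

At 99.99% that gives 8.64 s of allowed downtime per day, and at 99.999% about 5.26 minutes per year, matching the table.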
Let’s put together the numbers used here to quickly get an idea of how to estimate something.
Consider that we want to estimate how many queries per second (QPS) are made on Twitter and how much storage is required for that. So what do we need?
Let’s say we want to estimate the storage, for that we need:
Alrighty, so how do we get that info? Well, we need to know how many users are actually using Twitter and how many posts we receive. Without getting specific data, let’s assume the following:
So, let’s calculate number of QPS:
To calculate the storage, we need to know the size of a tweet. Let’s assume it’s built from an id, text and media (though there are likely more fields):
So, to calculate the storage, let’s calculate daily storage:
Notice the last calculation. The total estimate would be practically unchanged if we omitted the regular, text-only tweets entirely.
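The whole estimation can be sketched in a few lines. Every number below is an assumption for the exercise, not a real Twitter figure:

```python
# Illustrative back-of-the-envelope numbers (all assumed, not real data).
DAU = 150_000_000
TWEETS_PER_USER_PER_DAY = 2
MEDIA_SHARE = 0.10           # fraction of tweets carrying media
TWEET_TEXT_BYTES = 64 + 140  # id + text, a rough guess
MEDIA_BYTES = 1 * 1024**2    # assume 1 MB of media on average

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
qps = tweets_per_day / (24 * 3600)

text_storage = tweets_per_day * TWEET_TEXT_BYTES
media_storage = tweets_per_day * MEDIA_SHARE * MEDIA_BYTES

print(f"~{qps:,.0f} QPS")
print(f"text:  ~{text_storage / 1024**4:.3f} TB/day")
print(f"media: ~{media_storage / 1024**4:.1f} TB/day")
```

With these assumptions, media storage dominates text storage by a few hundred times, which is why dropping the text term barely moves the total.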
So, now we have tools at our disposal. We know:
So, how would the entire process go in an interview? Well, let’s look at it:
So, we have the individual parts. Let’s cover them now:
So, let’s say we have a problem. Let’s say we have to design a news feed system. What information do we need?
Now, let’s say the answers are:
So, we got an idea: 10 million DAU, posts can have media and text, and it’s both a mobile and a web app. But hold on! We thought it’s a news feed, but we’re also talking about friends! So, let’s ask some clarifying questions:
Let’s assume the answers are:
So, let’s start again with the basic design:
Now, we know the following:
So, we can separate this into 2 problems:
So, we have a lot of daily users. What does that mean?
So, we got our base - the web & mobile apps are connected to the backend through a load balancer.
Now, what is happening on the backend? Well, we’ve already mentioned it:
So, let’s put it in a drawing:
Cool! So we’ve designed one part of it. We now have posts and are able to save them. But what about the news publishing itself? Sure, we could just load everything from the database of posts. But that might be a little problematic, as the database would be overloaded. So, let’s create a separate service for that.
So, we now only need to read the data. I’m going to assume that posts don’t change often - I haven’t changed a tweet of mine for quite some time, after all. That means we don’t actually need to hit the DB here. We could read straight from the posts database, sure, but this data is a good candidate for a cache. It’s faster, remember?
It can look something like this:
And that’s it! For high level design, this is fine:
So, now we should be having some feedback from our colleagues when designing. We’ve agreed on goals and scope. We have a high level design. What’s next?
Now we will look at the individual parts. Let’s consider the first design for feed publishing:
What more can we do here? Well, we have millions of users and our tool is built for them. We have some expectations for the number of daily users. What we don’t know is the number of posts they will make per day. And we can also expect that they won’t post every millisecond.
But they can! And that’s the problem. So, let’s add a rate limiter. We will design one in the next chapter, but for now, a rate limiter is just that - it limits the number of times users can use our tool.
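To make the idea concrete, here’s a toy token-bucket sketch. It’s illustrative only (the rates are made up, and the proper design comes in the next chapter): each request costs one token, and tokens refill at a steady rate up to a burst capacity.

```python
import time

# Toy token bucket: allows `rate` requests/second, bursts up to `capacity`.
class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # made-up limits
passed = sum(bucket.allow() for _ in range(20))
print(passed)  # only the burst capacity's worth of instant requests pass
```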
Another thing: we don’t have authentication in there! So let’s put it in. Only authenticated users can create posts!
We can’t do much more about the post service itself at this point. Of course, there can be more ideas:
But, at this point, I’m happy with just cache and database. It should be reasonably fast.
But the fanout service can get better. Right now we just get data from cache. But:
So, let’s get the friends data! The Fanout service will now retrieve the data of our friends. And since there can be many, let’s use a graph DB for that! A graph is a really good fit here - after all, a social network is a graph.
And finally, we can have a bunch of different workers to support it, because we need to get a single post into quite a lot of places (friends feeds). So, let’s create some workers for it and have them consume a message queue. As discussed in part 2, we can reasonably scale that part.
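The worker-plus-queue idea can be sketched like this. Everything here is illustrative: the hardcoded friend list stands in for a graph DB lookup, and an in-process queue stands in for a real message queue.

```python
import queue
import threading

# Sketch of fanout: workers drain a queue and write a post id into each
# friend's feed. FRIENDS stands in for a graph DB lookup.
FEEDS: dict = {}                       # user_id -> list of post ids
FRIENDS = {"alice": ["bob", "carol"]}  # assumed friendship data
tasks: queue.Queue = queue.Queue()

def worker():
    while True:
        post_id, author = tasks.get()
        for friend in FRIENDS.get(author, []):
            FEEDS.setdefault(friend, []).append(post_id)
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()
tasks.put(("post-1", "alice"))
tasks.join()  # wait until the post has been fanned out
print(FEEDS)  # post-1 now sits in bob's and carol's feeds
```

Because the workers only consume from the queue, scaling this part means adding more workers, exactly the property we discussed in part 2.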
Great! We’ve made some progress on feed publishing. But what about retrieving the news feed? Well, we know that we have a bunch of databases now.
Furthermore, the news feed service serves quite a lot of static assets. Because a single post is visible to many people, we can keep the shared assets somewhere else. What’s good for that? A CDN!
Previously, we just retrieved data from the cache. But that data can get stale. So, let’s fall back to the actual databases as well!
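That read path can be sketched as a cache-aside lookup with a TTL. Assumptions: a plain dict stands in for the cache, and `load_post_from_db` is a made-up placeholder for the real database read.

```python
import time

# Cache-aside with a TTL: serve fresh entries from the cache, fall back to
# the database for stale or missing ones, then refresh the cache.
CACHE: dict = {}   # post_id -> (value, expires_at)
TTL_SECONDS = 60   # how long a cached post may serve before a DB refresh

def load_post_from_db(post_id: str) -> str:
    return f"post body for {post_id}"  # placeholder for a real DB read

def get_post(post_id: str) -> str:
    entry = CACHE.get(post_id)
    if entry and entry[1] > time.monotonic():
        return entry[0]                 # fresh enough: serve from cache
    value = load_post_from_db(post_id)  # stale or missing: hit the DB
    CACHE[post_id] = (value, time.monotonic() + TTL_SECONDS)
    return value
```

The TTL is the knob here: a longer one means fewer DB reads but staler feeds, which is a trade-off worth saying out loud in an interview.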
So, let’s add this all together:
Great! Let’s see our final design:
Let’s wrap it all up now!
Now, is there something to be improved? Well, definitely!
We could spend a long time on each of these parts. But that’s not the purpose of an initial system design. We can get into details when we actually start implementing it.
So, at this point, we’ve established what a good process for designing a system is. The important takeaways in my opinion are:
And finally - have fun with it. Designing a system with someone is a shared thought process. Bounce ideas off one another.