Creating a cloud storage
Welcome to the final part of System Design! In this last part, we'll design a cloud storage service!
Now, we've learned a lot through this system design journey. So, let's see how we can apply it to creating Google Drive!
There are quite a few cloud storage services - Google Drive, Dropbox, OneDrive, iCloud. Almost every major IT player has one on offer.
So, let’s take a quick look at how it works.
A “cloud drive” is basically a file system where one can store items. If you open your computer's storage, you'll see a tree-like structure:
Now, that's one part of it. There's also another part to it - synchronization. Basically, when I save files to my iCloud, I can access them from both mobile and web (and sometimes directly from a computer). So, let's define the scope!
Another thing to consider is high reliability. It’s simply not acceptable for files to be lost if they are saved there.
Finally, let's do some back-of-the-envelope estimations:
So, we have the requirements. The book at this point goes into a description of creating a single server and scaling it up.
I very much like the idea, but again, as was the case with other chapters, I’m gonna do it in my own words to better understand it.
So, let’s start from scratch. Consider that the above is where we want to get to. And let’s investigate the reasons WHY we want to get there.
Now, consider that at this point you are a startup. You have very few users - your app is used by a couple hundred people, tops. Furthermore, each user uses fewer resources. Let's start with the storage being 1 TB. So, let's set up a quick API:
Let's consider our app is called simproch-storage.com. So, that's where we serve the web. Now, there are 3 endpoints:

- `api.simproch-storage.com/upload-file` - a `POST` request with an attached file (`multipart/form-data` - see MDN file upload)
- `api.simproch-storage.com/download-file` - takes a `file-path` param
- `api.simproch-storage.com/get-revision`
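The three endpoints above could be sketched as a tiny in-memory service. This is just an illustration of the contract (no web framework, and all names and behavior are my assumptions, not a real implementation):

```python
# In-memory sketch of the three endpoints of the hypothetical
# api.simproch-storage.com service.
import uuid


class StorageApi:
    def __init__(self):
        self.files = {}      # file path -> file contents
        self.revisions = {}  # file path -> latest revision id

    def upload_file(self, file_path: str, contents: bytes) -> str:
        """POST /upload-file: store the file and generate a new revision."""
        self.files[file_path] = contents
        self.revisions[file_path] = str(uuid.uuid4())
        return self.revisions[file_path]

    def download_file(self, file_path: str) -> bytes:
        """GET /download-file?file-path=...: return the stored contents."""
        return self.files[file_path]

    def get_revision(self, file_path: str) -> str:
        """GET /get-revision?file-path=...: return the latest revision id."""
        return self.revisions[file_path]
```

A client would upload a file, remember the returned revision, and later compare it against `get_revision` to know whether its local copy is stale.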
Now that we have our API, we'll quickly set up a server, for example
Now, when we store a file, we generate a revision for this file, perhaps a UUID, or a checksum of the contents. The flow is like this:
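Both revision options mentioned above can be sketched in a few lines. The checksum variant has a nice property: the same contents always produce the same revision, so unchanged files can be detected for free.

```python
# Two ways to generate a revision for an uploaded file:
# a random UUID, or a checksum of the contents.
import hashlib
import uuid


def revision_from_uuid() -> str:
    # Unique per upload, regardless of contents.
    return str(uuid.uuid4())


def revision_from_checksum(contents: bytes) -> str:
    # Deterministic: identical bytes yield an identical revision.
    return hashlib.sha256(contents).hexdigest()
```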
We also define it on HTTPS so the communication is secure.
Finally, we have all of this on a web server. But soon, we find out something disturbing! We're running out of space, because users started uploading too much, or because we've gotten a spike in traffic as our business got more popular!
So, what do we do? Well, we continue by creating more web servers. Now, we have multiple web servers, but the data still doesn't fit! So, we come up with a brilliant approach: we will shard the database.
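A common sharding scheme is to hash a stable key, such as the user id, and use that to pick a shard. The shard count here is an arbitrary assumption for illustration:

```python
# Hash-based sharding: the user id decides which database shard
# holds that user's files.
import hashlib

NUM_SHARDS = 4  # assumed; real systems pick this based on capacity planning


def shard_for_user(user_id: str) -> int:
    # Use a stable hash (not Python's built-in hash(), which is
    # randomized per process) so the same user always maps to
    # the same shard.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The downside, of course, is that changing `NUM_SHARDS` later remaps users, which is why schemes like consistent hashing exist.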
Great! We've accomplished something. But our business grows bigger very fast, and now we need perhaps 4 TB of data available for users.
Even though we’ve put out the fire, we don’t want to run into this stressful situation again. So, we start thinking:
Slowly, the What If scenarios pile up, and we decide to investigate. We soon find out about multiple cloud technologies that offer storage.
One such technology is Amazon S3 Object Storage. We see that we can store any files in here. Furthermore, AWS takes the regional problems off our hands - we just need to configure it.
So, we start using this. Instead of fetching and saving the files to our own DB, we'll save them to a different provider.
Now, since we’ve scaled this part, it makes sense to try scaling other parts as well:
Now, we’ve gotten to a fairly solid high level design!
Now that’s done, we finally have a solid base. So, where do we go now?
Well, we may need to sync conflicts. Consider that multiple users have access to the same document. What if both of them update the file at the same time? This is where the revision comes in:
We'll basically do the same as with git conflicts.
What would finally happen is:

- `bunny.docx` being the main file
- `bunny__revision__123123__2023-25-11.docx` showing the user that we didn't manage to merge everything into a single file, but they can work with it and resolve the conflicts themselves

Now, if we look back at the system design, we'll notice one thing we've encountered before. When uploading a file, we first upload it to our server, and only then to S3 storage. Again, let's use presigned URLs.
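The conflict-copy naming could be sketched like this. The exact filename format is my assumption, modeled on the `bunny__revision__...` example:

```python
# Naming a conflict copy, git-style: the main file keeps its name,
# and the losing write is saved alongside it with its revision and date.
import os


def conflict_copy_name(file_name: str, revision: str, date: str) -> str:
    base, ext = os.path.splitext(file_name)
    return f"{base}__revision__{revision}__{date}{ext}"
```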
That way, we’ll separate the design a little:
It will look something like this:
Now, we can see that the biggest change is:
Now, I mentioned presigned URLs. And we can use those. However, we could also use newer technologies.
In the YouTube section, I've mentioned Group of Pictures (GOP) - essentially splitting a video into many chunks. Block Level Storage is essentially the same thing, except we separate a file into various blocks. This allows for parallelization and higher performance.
The blocks by themselves make no sense, just as GOPs don't. They are only useful once a file storage above them merges the blocks back together.
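Splitting and merging could look like this. The block size is deliberately tiny for illustration:

```python
# Splitting a file into fixed-size blocks and merging them back.
# A block on its own is meaningless; only the ordered merge
# reconstructs the file.
BLOCK_SIZE = 4  # tiny for illustration; real systems use e.g. 4 MB


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def merge_blocks(blocks: list[bytes]) -> bytes:
    return b"".join(blocks)
```

Because blocks are independent, the client can upload or download them in parallel, which is where the performance win comes from.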
And that’s pretty much it! We’re missing two things:
Great! We've quickly moved on from a very startupy system. It was sufficient for our needs back then, but because the company was performing well, we had to scale it up!
This is the final high level design
Now, let’s move a little deeper. Let’s review the requirements again:
We've already added upload, download, file sync, and notifications.
Now, let’s discuss some deep dive parts.
We want to deep dive into upload and download, because these are the main features. However, to deep dive in them, we first need a couple more words about block storage.
So, we’ll deep dive into:
So, we’ve discussed block storage as a way to parallelize the flow. However, it’s so much more than that.
While with videos we split them into groups mainly for processing, and there are streaming protocols in place that handle the downloading for us, we can apply a similar pattern here!
Imagine the following scenario:
Now, with block storage, it's a little more complicated. We have blocks of data limited by size. That being said, we need to order them properly to show changes. And while we probably don't have 12 blocks - one per tab - in Excel, it gives a good idea of what we do.
This is called Delta Sync. Basically, by splitting the upload and download into blocks, we can transfer only the parts that actually changed. This significantly reduces bandwidth requirements and can cut fetching a file from an hour down to minutes or seconds.
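A minimal sketch of the delta detection: hash each block, compare against the stored hashes, and transfer only the blocks whose hashes differ (or that are newly appended):

```python
# Delta sync sketch: compare block checksums between the old copy and
# the new copy, and transfer only the blocks that actually changed.
import hashlib


def block_hashes(blocks: list[bytes]) -> list[str]:
    return [hashlib.sha256(b).hexdigest() for b in blocks]


def changed_block_indices(old_blocks: list[bytes],
                          new_blocks: list[bytes]) -> list[int]:
    old = block_hashes(old_blocks)
    new = block_hashes(new_blocks)
    # A block changed if its hash differs, or if it's newly appended.
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]
```

Note this simple index-based comparison breaks down when bytes are inserted mid-file (every later block shifts); real systems use rolling hashes for that, but the principle is the same.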
Another thing we’ll do is compression. We’ve already tackled video processing with compression algorithms before. But we don’t compress just videos. We can also compress images or text (e.g. gzip). For more about lossless compression, see Wikipedia.
So, what we’ll essentially do is:
The image above shows how compressed blocks are created and stored. And while the article uses the Lempel–Ziv–Markov chain algorithm (LZMA), we can compress the blocks with any desired algorithm, whether the content is an image, a video, or something else.
Now, if files are changed, we’d compress the blocks. We’d then compare the blocks with stored ones to see which changed. And those that changed would be the only ones uploaded (or downloaded).
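Python's standard library happens to ship an LZMA implementation, so the compression step can be sketched directly:

```python
# Compressing blocks with LZMA (the Lempel–Ziv–Markov chain algorithm,
# available in Python's stdlib) before storing them.
import lzma


def compress_block(block: bytes) -> bytes:
    return lzma.compress(block)


def decompress_block(block: bytes) -> bytes:
    return lzma.decompress(block)
```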
One thing to bear in mind is consistency. It is not desirable to show different clients different views (unless the file is being updated by both simultaneously).
A way to deal with this is making sure we invalidate the cache whenever a new block is received, so that next data load will return real data.
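The invalidation idea can be shown with a tiny cache. The class and method names are illustrative:

```python
# A tiny cache that is invalidated whenever a new block for a file
# arrives, so the next read goes back to the source of truth
# instead of returning stale data.
class FileCache:
    def __init__(self):
        self._cache: dict[str, bytes] = {}

    def get(self, path: str):
        # Returns None on a miss, forcing a fresh load.
        return self._cache.get(path)

    def put(self, path: str, contents: bytes):
        self._cache[path] = contents

    def invalidate(self, path: str):
        # Called whenever a new block is received for `path`.
        self._cache.pop(path, None)
```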
SQL databases follow atomicity, consistency, isolation, and durability - ACID for short. This enables high consistency and reliability. However, NoSQL databases don't guarantee it by design, so we might need to handle that ourselves if we opt for a NoSQL DB.
I've mentioned that we're storing the blocks. While individual blocks are sent to S3 for faster speed, we need to be able to track them.
Furthermore, we store:
To view the relations:
There will likely be more relations, but this is sufficient for high level.
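One way to picture those relations is a minimal relational schema. The table and column names below are my assumptions for illustration, not the book's exact schema; SQLite stands in for the real database:

```python
# A sketch of the metadata tables and their relations:
# users own files, files are made of ordered blocks,
# and each block row points at the actual bytes in object storage.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    path TEXT,
    latest_revision TEXT
);
CREATE TABLE blocks (
    id INTEGER PRIMARY KEY,
    file_id INTEGER REFERENCES files(id),
    block_index INTEGER,   -- ordering, so blocks can be merged back
    checksum TEXT,         -- for delta sync comparisons
    s3_key TEXT            -- where the bytes live in object storage
);
""")
```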
Now, let’s see what happens when a user starts uploading a file. First, consider the file does not exist:
- `uploadStatus` is `pending`
- `uploadStatus` is `uploaded`
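The status transition can be sketched as a small state holder: the file starts as `pending` when its metadata is created, and flips to `uploaded` once every block has landed in object storage. The class shape is my assumption:

```python
# Upload flow as a status transition: pending -> uploaded once
# all blocks have been received by object storage.
class FileUpload:
    def __init__(self, path: str, total_blocks: int):
        self.path = path
        self.total_blocks = total_blocks
        self.received = 0
        self.upload_status = "pending"

    def block_received(self):
        self.received += 1
        if self.received == self.total_blocks:
            self.upload_status = "uploaded"
```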
Now, if it’s an existing file already:
The download is triggered when a file is added or edited somewhere. There are 2 ways a user downloads a file:
Now, once a client knows there are new files, it will simply request the metadata, and then download blocks to construct the file.
The notifications serve 2 purposes
As for the downloading of latest changes, we have 2 options. I’ve already mentioned long polling in the high level design.
Now, in the chat service section, I've mentioned that we have 3 options.
The last 2 options work well. However, websockets are better suited for real-time, bidirectional communication, such as chats.
Here, it's one-directional - we are notified once about the changes, and we immediately pull them. Therefore, long polling is the way to go.
But even then, websockets would work. That's something to keep in mind. They might be suited for real-time document updates, such as those in Google Docs.
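The client side of long polling boils down to: hold the request open until a change arrives or a timeout passes, then reconnect. Below, `check_for_change` stands in for the real held-open HTTP request:

```python
# Long polling sketch: block until a change arrives or the timeout
# elapses, then the client simply polls again.
import time


def long_poll(check_for_change, timeout: float = 2.0, interval: float = 0.01):
    """Return the first truthy result of check_for_change(),
    or None if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        change = check_for_change()
        if change:
            return change
        time.sleep(interval)
    return None  # timed out; the client would reconnect and poll again
```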
Now, with storage space, we may be quite constrained.
Therefore, we may want to make some improvements:
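One common space saver is de-duplication: identical blocks (same checksum) are stored only once, no matter how many files contain them. This is a hedged sketch of the idea, not a description of any particular provider's implementation:

```python
# Block de-duplication: content-addressed storage where identical
# blocks share a single stored copy.
import hashlib


class DedupBlockStore:
    def __init__(self):
        self._blocks: dict[str, bytes] = {}  # checksum -> contents

    def put(self, block: bytes) -> str:
        checksum = hashlib.sha256(block).hexdigest()
        # Store each distinct block exactly once; duplicates just
        # return the existing key.
        self._blocks.setdefault(checksum, block)
        return checksum

    def unique_blocks(self) -> int:
        return len(self._blocks)
```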
As is always the case with these applications, we want to have solid error handling.
In this part, we've investigated Google Drive a little. Basically, we've: