Show your videos to the world
So, you want your product to support video upload, right? Well, it’s time we looked into it.
While this part talks about designing YouTube, we can also describe it as “allow uploading and viewing stored videos”. So, it also applies to Netflix, Hulu, Disney+ and any other video sharing platform.
So, without further ado, let’s jump right into it!
We need 75 TB of storage daily. Furthermore, we very likely want to cache the videos. A Content Delivery Network is great for that, but for videos, it’s a little more complicated. We can take a look at Amazon CloudFront CDN pricing to get an idea.
With 75 TB of daily storage required, we can estimate that our CDN would need to support 2,250 TB, or 2.25 PB, monthly. After a couple of months, we’d probably have even more. Once we’re over 4 PB monthly, the price per GB is $0.020 in the US.
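To sanity-check these numbers, here’s a quick back-of-the-envelope calculation (using the $0.020/GB tier price from above; a real bill would depend on the actual traffic mix):

```python
# Rough monthly CDN volume and cost estimate from the figures above.
DAILY_UPLOAD_TB = 75
DAYS_PER_MONTH = 30
PRICE_PER_GB_USD = 0.020  # US tier price for >4 PB/month

monthly_tb = DAILY_UPLOAD_TB * DAYS_PER_MONTH        # 2250 TB
monthly_pb = monthly_tb / 1000                       # 2.25 PB
monthly_cost = monthly_tb * 1000 * PRICE_PER_GB_USD  # TB -> GB -> USD

print(f"Monthly volume: {monthly_tb} TB ({monthly_pb} PB)")
print(f"Monthly cost: ${monthly_cost:,.0f}")
```

Note this only covers newly uploaded bytes; serving those bytes to viewers multiplies the cost considerably, which is why we’ll come back to CDN cost savings later.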
Now, let’s hop into high level design. So, let’s consider YouTube. What’s happening over there?
So, we basically have 3 different user flows:
In the very high level, it’ll look something like this:
So, what do we actually need to care about when we’re uploading a video? Well, if you’ve ever tried to create your own video, you probably have an idea:
So, uploading a video can look something like this:
Now, that went by quite fast! But don’t worry, we’ll deep dive into video encoding later on. That’s probably the most complicated part.
Now, when a user has uploaded a video, whether it’s completed or still in progress, we may want to update the description or some other metadata. In essence, this part is just updating a metadata store. A potential solution is shown below.
Finally, when a video is being viewed, we want to serve it from the CDN. See the image below to get an idea of how it works.
Now, it’s not really just that simple. There are some parts we need to understand:
Now, what this means is basically that the red and gray parts (played, ready to play) are already downloaded to the device (in this case, the browser).
But that’s interesting! Because we can’t just send the whole video at once. There are a lot of details I don’t know either, but to give you an idea, the protocols you can use are:
I’ve mentioned before that videos are stored as Blobs. But what is a blob? From Wikipedia:
A binary large object (BLOB or blob) is a collection of binary data stored as a single entity. Blobs are typically images, audio or other multimedia objects…
Now that’s helpful. Except it’s not, at least for me. So, I’ve dug around and found that some examples are:
For more info about blobs, see the Tokenex article.
Finally, there are some storage types, such as Azure Blob Storage. The description may give us some more ideas:
Blob Storage is designed for:
- Serving images or documents directly to a browser.
- Storing files for distributed access.
- Streaming video and audio.
So, in short, blobs cover many file types that we want to store and serve later on.
Now, that’s a really good start I feel. We have an idea of how to upload, update and view a video! So, let’s get deeper.
In this part, I’d like to focus on the following parts:
Now, if we’d take a look at YouTube, we can see a lot more things:
I won’t cover these here because it’d blow up the scope A LOT. But, we could also have these services. If you need a refresher, take a look at the previous articles that covered this:
Video encoding is probably the most complicated and time-consuming task in this chapter, so bear with me while I try to make it more approachable.
So, where do we start? Well, I’ve described it already, but let’s take a look once again at what video encoding is, and why we need it:
Now, encoding formats often have 2 parts:
So, the bottom line is: we need to do a lot of stuff! So, where do we start?
Now, the above is just a fancy term. Kind of. Directed Acyclic Graph, or DAG for short, is an important concept in graph theory and computer science. To skip the heavy mathematics, the baseline is:
That’s a lot of fancy wording again. How will we use it here? Well, it’s basically a tool with which we can describe the system in a very solid way. It shows that some parts must be completed before others, and it’s good for describing relationships between data models.
It also shows which tasks must be executed sequentially and which can be parallelized. For example, we can start processing video, audio and metadata simultaneously, but we can’t assemble the final video until all of them have completed.
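The sequential-vs-parallel idea can be sketched with a tiny dependency graph. The task names below are illustrative, not the real pipeline; `graphlib` from the Python standard library does the stage-splitting for us:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Illustrative encoding DAG: each task maps to its prerequisites.
dag = {
    "inspect": set(),
    "video_encode": {"inspect"},
    "audio_encode": {"inspect"},
    "metadata": {"inspect"},
    "merge": {"video_encode", "audio_encode", "metadata"},
}

ts = TopologicalSorter(dag)
ts.prepare()
while ts.is_active():
    stage = ts.get_ready()  # everything in one stage can run in parallel
    print("parallel stage:", sorted(stage))
    for task in stage:
        ts.done(task)
```

Running this prints three stages: `inspect` alone, then the three independent tasks together, then `merge`, which is exactly the dependency structure described above.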
Consider the following DAG for video encoding:
With this, we can immediately see that:
So, let’s take a look at it deeper, because this is a flow we’ll use!
Now, one thing to note is what I’ve already mentioned - multiple qualities. We’ll very likely have to store multiple encodings of each video.
Consider that a user uploads a 5-minute video with an original size of 10 GB.
For the video bitrates described above, I’ve used the TechAdvisor article.
So, the final output of this will likely be something like:
bunny.mp4
bunny_144p.mp4
bunny_240p.mp4
bunny_360p.mp4
bunny_480p.mp4
bunny_720p.mp4
bunny_1080p.mp4
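A small helper could generate that output list and estimate the size of each rendition. The bitrates below are rough illustrative assumptions (not the TechAdvisor figures), and the naming scheme is just an example:

```python
# Illustrative average bitrates per resolution, in Mbit/s (rough assumptions).
BITRATES_MBPS = {"144p": 0.1, "240p": 0.3, "360p": 0.7,
                 "480p": 1.2, "720p": 2.5, "1080p": 4.5}

def encoding_outputs(name: str, duration_s: int) -> list[tuple[str, float]]:
    """Return (filename, approximate size in MB) for each target quality."""
    outputs = []
    for res, mbps in BITRATES_MBPS.items():
        size_mb = mbps * duration_s / 8  # Mbit/s * seconds -> Mbit -> MB
        outputs.append((f"{name}_{res}.mp4", round(size_mb, 1)))
    return outputs

for fname, size in encoding_outputs("bunny", 5 * 60):
    print(fname, f"~{size} MB")
```

Notice how even the 1080p rendition comes out far smaller than the 10 GB original, which is the whole point of re-encoding.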
So, now that we have a general idea, we can continue to the architecture of this video transcoding process!
This is the architecture we’re gonna use. Now, look at the picture and try to assign the parts to those we’ve discussed!
So, let’s look at the individual parts!
With Temporary Storage, we may want multiple kinds depending on the data type. For example, metadata can easily be kept in memory, while video and audio would likely need blob storage.
In Preprocessing, we’ll do a very interesting task: we’ll split the video into parts.
So, what we’ll do is split the video into Groups of Pictures (GOPs). By doing so, we can parallelize and speed up the encoding.
Finally, we’ll store all these items in temporary storage. In case one part fails, we can resume from the last failing point.
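A toy sketch of the GOP split is below. The frames are just labels here; a real implementation would work on the encoded bitstream, where a keyframe (“I” frame) starts each group:

```python
# Toy GOP splitter: a keyframe ("I") starts a new group of pictures.
def split_into_gops(frames: list[str]) -> list[list[str]]:
    gops, current = [], []
    for frame in frames:
        if frame == "I" and current:  # keyframe starts a new group
            gops.append(current)
            current = []
        current.append(frame)
    if current:
        gops.append(current)
    return gops

frames = ["I", "P", "P", "B", "I", "P", "B", "B", "I", "P"]
print(split_into_gops(frames))
# Each group starts with an "I" frame, so it can be encoded independently
# (and in parallel), and a failed group can be retried on its own.
```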
The DAG Scheduler will basically split the graph into individual tasks. For example, we may have 4 different tasks:
The scheduler basically splits these tasks into stages, making it clear which parts need to be done by which tasks.
Resource Manager manages resource allocation. How surprising!
Basically, we can have multiple queues here:
Once the task scheduler pairs a task with a worker, the task is forwarded to the task workers.
Task Workers are basically individual services performing the tasks. We can perform audio and video encoding simultaneously, just on different servers.
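The hand-off between the resource manager’s queues and the workers can be sketched with a priority queue. The task names and priority values here are made up:

```python
import heapq

# Resource manager sketch: a min-heap of (priority, task) pairs,
# where a lower number means higher priority (made-up values).
task_queue: list[tuple[int, str]] = []
heapq.heappush(task_queue, (2, "audio_encode"))
heapq.heappush(task_queue, (1, "video_encode_1080p"))
heapq.heappush(task_queue, (3, "thumbnail"))

def next_task() -> str:
    """Hand the highest-priority task to a free worker."""
    _priority, task = heapq.heappop(task_queue)
    return task

print(next_task())  # video_encode_1080p
print(next_task())  # audio_encode
```

In a real system the queue would be a shared service rather than an in-process heap, but the ordering idea is the same.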
Finally, the output will be the Encoded Video, which can come in multiple formats.
Now that we know how we will process the individual videos, let’s perform some optimizations! Let’s start with video uploading!
Another thing we can do is place upload centres close to users. If users had to upload videos from Europe to the US, it’d take a very long time. So, we’ll use the CDN as upload centres as well.
Now, let’s look at the complete thing we’ve designed above:
With this design, we have multiple modules:
Now, there are some problems with this. Specifically:
So, we’ll introduce a message queue!
By adding a message queue between the individual modules, we allow for a more decoupled system.
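To make the decoupling concrete, here’s a minimal producer/consumer sketch. It uses an in-process queue as a stand-in; a real deployment would use a dedicated broker, and the module names are invented:

```python
import queue
import threading

# In-process stand-in for a message queue between two modules.
encoding_queue: queue.Queue = queue.Queue()

def uploader() -> None:
    """Producer: announces uploaded videos and moves on immediately."""
    for video_id in ["vid-1", "vid-2", "vid-3"]:
        encoding_queue.put(video_id)
    encoding_queue.put(None)  # sentinel: no more work

def encoder() -> None:
    """Consumer: processes videos at its own pace, independently."""
    while (video_id := encoding_queue.get()) is not None:
        print("encoding", video_id)

worker = threading.Thread(target=encoder)
worker.start()
uploader()
worker.join()
```

The uploader never waits on the encoder, so a slow or crashed encoder doesn’t block uploads; the messages simply accumulate in the queue.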
Now, one of the things we want to ensure is that only logged-in users are allowed to upload a video. We often do this with authorization tokens.
However, for our storage, we may be using something like the Azure Blob Storage mentioned before. But that’d mean the user would have to upload the video to our servers first, and then we’d send it onwards.
A thing we can use is presigned URL on AWS, or shared access signature on Azure.
They are pretty much the same thing - a temporary link directly to the storage. The user first requests the temporary link, and then uploads the video directly to the storage.
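The core idea behind both is an expiring, signed link. Here’s a toy version of the signing - this is not AWS’s or Azure’s actual algorithm, just the shape of it, and the host name is invented:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # never leaves our servers

def presign(bucket: str, key: str, expires_in: int = 3600) -> str:
    """Build a toy presigned URL: path + expiry + HMAC signature."""
    expiry = int(time.time()) + expires_in
    payload = f"{bucket}/{key}:{expiry}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{bucket}/{key}?expires={expiry}&sig={sig}"

def verify(bucket: str, key: str, expiry: int, sig: str) -> bool:
    """Storage side: recompute the signature and check the expiry."""
    payload = f"{bucket}/{key}:{expiry}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and time.time() < expiry

print(presign("videos", "bunny.mp4"))
```

Because only our servers know the secret, the storage can trust any URL whose signature checks out, and the link stops working once the expiry passes.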
Video protection can be handled similarly - we can encrypt videos on our server so they can’t be read by unauthorized users. Furthermore, we can add watermarks with our company logo, or add a digital rights management (DRM) system such as PlayReady for dealing with copyrighted material.
Finally, there are two more parts to consider, starting with cost savings. As mentioned at the start, CDN costs are around $150,000 daily. That’s a lot to deal with every day.
What we can do is not push all videos to the CDN. We could only store the most popular videos, or those with a high view count. Low-popularity videos would likely cost us more than they earn in revenue. So, we can apply some analytics and decide which videos to keep in the CDN.
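A sketch of that popularity cutoff is below. The view counts, video names and threshold are all invented; in practice the threshold would come from comparing CDN cost against ad revenue per view:

```python
# Decide which videos go to the CDN based on view counts (made-up data).
views = {"cat_video": 2_500_000, "my_vacation": 120, "tutorial": 48_000}

CDN_THRESHOLD = 10_000  # assumed break-even point; tune with real cost data

def serve_from_cdn(video_id: str) -> bool:
    """Popular videos go to the CDN; the long tail stays in origin storage."""
    return views.get(video_id, 0) >= CDN_THRESHOLD

for vid in views:
    origin = "CDN" if serve_from_cdn(vid) else "origin storage"
    print(f"{vid}: {origin}")
```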
If it becomes necessary because our product is too popular, we could build our own CDN. Netflix already did that in partnership with internet service providers. See Open Connect for more information.
Error handling is a really important topic here. We have a lot of parts in our system. We need to make sure we can recover gracefully.
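One common building block for graceful recovery is retrying a failed task with exponential backoff before giving up. A sketch (the delays, attempt limit and the flaky task are all arbitrary):

```python
import time

def with_retries(task, max_attempts: int = 3, base_delay: float = 0.01):
    """Run `task`, retrying with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error (e.g. to a dead-letter queue)
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, 0.04s, ...

calls = {"count": 0}

def flaky_encode() -> str:
    """Simulated task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient encoder failure")
    return "encoded"

print(with_retries(flaky_encode))  # succeeds on the third attempt
```

Transient failures (a worker restart, a network blip) get absorbed by the retries, while persistent failures still surface so a human or a dead-letter process can investigate.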
So, we’ve gone through designing a video streaming service!
To recap a little, let’s see what we’ve done!
Now, there’s much more to it, and I don’t have answers to everything, but some further thoughts are: