System Design 11 - News Feed

Creating a Facebook timeline

Introduction

Congratulations on making it this far! This time, we’ll go back in time a little.

In the third section, we already talked about a basic news feed.

Let’s revisit it now with what we’ve learned since!

Scope

A news feed publishing system, similar to the Facebook timeline page.

Mobile & Web app
10 million daily active users
Posts can contain text, images and videos simultaneously
The key feature is that a user can make a post and see their friends’ posts
A user can have up to 5000 friends
Posts are visible in reverse chronological order
App is available worldwide

High Level Design

So, again, let’s start with a high level design! We’ll have a bunch of services here!

It makes sense to start with the publishing service, since there would be no posts to retrieve without one! An HTTP call to this service could be something like POST api/feed/create. It would require an auth token and the post content.

Then, we need a news feed service that retrieves the posts. This could look like GET api/feed/me, which returns the feed curated for the user who requests it. Again, an auth token is required.

Now, since the two services are separate, how will the news feed service know about new posts? Well, the publishing service needs to handle that as well. Once a post is created, it’s saved into the database/cache of the news feed service.

Now, that looks a little weird, right? One service retrieves posts while the other one saves them into the first one’s storage?

So, let’s redefine them!

Post service, which stores all posts that have been published and has its own cache and DB
News feed service, which is responsible for creating and retrieving posts

Now, when a new post is created, the request is handled by the news feed service and the post is saved in the post DB.
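
To make the split concrete, here is a minimal in-memory sketch of the two services. It is only an illustration under assumptions: the class names, the token dictionary standing in for authentication, and the friends mapping are all hypothetical, and a real deployment would sit behind HTTP handlers and real storage.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: str
    content: str


class PostService:
    """Owns post storage (an in-memory stand-in for the post DB and cache)."""

    def __init__(self):
        self._posts = {}          # post_id -> Post
        self._next_id = count(1)  # IDs grow over time, so they double as a sort key

    def save(self, author_id, content):
        post = Post(next(self._next_id), author_id, content)
        self._posts[post.post_id] = post
        return post

    def by_authors(self, author_ids):
        """Return posts written by the given authors, newest first."""
        posts = [p for p in self._posts.values() if p.author_id in author_ids]
        return sorted(posts, key=lambda p: p.post_id, reverse=True)


class NewsFeedService:
    """Entry point for both creating posts and reading the feed."""

    def __init__(self, post_service, tokens, friends):
        self._posts = post_service
        self._tokens = tokens    # auth token -> user ID (hypothetical auth lookup)
        self._friends = friends  # user ID -> set of friend IDs

    def _authenticate(self, token):
        try:
            return self._tokens[token]
        except KeyError:
            raise PermissionError("invalid auth token") from None

    def create(self, token, content):
        """Roughly what POST api/feed/create would do."""
        return self._posts.save(self._authenticate(token), content)

    def me(self, token):
        """Roughly what GET api/feed/me would do."""
        user_id = self._authenticate(token)
        return self._posts.by_authors(self._friends.get(user_id, set()))
```

Note that the news feed service never touches storage directly here; it authenticates the caller and delegates to the post service.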

Finally, we discussed notifications last time. Let’s add a notification service here, since we already know how to do it! Users can now be notified of new content!

Now, as usual, this looks like a distributed system. So, we’ll need:

Load balancers for multiple servers
Rate limiting so that the services are not overwhelmed
Authentication, since the endpoints above require it
Monitoring and analytics

Finally, we also want to store the information somewhere. That’d again be a database, plus a cache for fast retrieval. Note that the feed is in reverse chronological order, so we could cache just the latest published posts.
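
Caching only the latest posts could be sketched with a bounded, newest-first list per user. The class name and the default capacity below are assumptions made up for illustration; the article does not prescribe a cache size.

```python
from collections import deque


class LatestPostsCache:
    """Keeps only the newest post IDs per user, in reverse chronological order."""

    def __init__(self, capacity=100):  # capacity is an arbitrary, hypothetical limit
        self._capacity = capacity
        self._feeds = {}  # user ID -> deque of post IDs, newest first

    def push(self, user_id, post_id):
        feed = self._feeds.setdefault(user_id, deque(maxlen=self._capacity))
        feed.appendleft(post_id)  # when full, the oldest ID falls off the right end

    def latest(self, user_id, limit=20):
        return list(self._feeds.get(user_id, ()))[:limit]
```

Older posts that fall out of the cache would still be available from the database, just more slowly.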

Deep dive into design

So, let’s identify the problematic parts.

Post service is just for storing the posts
Notification service is just for pushing notifications
Most of the work here is done by the news feed service itself. Why?
- It creates the posts and saves them to the post service
- Furthermore, it builds the feed for everyone on the author’s friends list
- And because it builds the feed for all friends, it also pushes the post to the notification service

So, the first two don’t need much more attention, but we can do better in the news feed service. Let’s try to improve it.

When a post is created, it is saved into post service. However, we also want to build the news feed for friends.

A social network is basically a graph, where nodes are users and edges are their relationships. Graph databases are a good fit for this use case, so we’ll use a graph DB to fetch the relations.

After we’ve fetched all the users whose feeds should receive the newly published post, we need to get their data. This will most likely come from a user DB.

Finally, we push the new post into the news feed cache of each friend. Now, pushing one post to 5000 friends can be slow, and it gets even slower when we have a lot of users posting. Therefore, for publishing, we’ll use a message queue and multiple workers to handle that.

Lastly, this entire process is called fanout. We’re basically delivering a post to all friends.
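
The publish path described above (look up friends, enqueue one delivery task per friend, let workers write into each friend’s feed) can be sketched as follows. This is a single-process illustration under assumptions: a plain dictionary stands in for the graph DB, Python’s `queue.Queue` stands in for the message queue, and the class and method names are hypothetical.

```python
import queue
import threading
from collections import defaultdict


class FanoutOnWrite:
    """Pushes each new post ID into every friend's precomputed feed."""

    def __init__(self, friends, workers=4):
        self._friends = friends          # stand-in for the graph DB
        self._feeds = defaultdict(list)  # stand-in for the news feed cache
        self._lock = threading.Lock()
        self._tasks = queue.Queue()      # stand-in for the message queue
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def publish(self, author_id, post_id):
        """Enqueue one delivery task per friend and return immediately."""
        for friend_id in self._friends.get(author_id, ()):
            self._tasks.put((friend_id, post_id))

    def _work(self):
        while True:
            friend_id, post_id = self._tasks.get()
            with self._lock:
                self._feeds[friend_id].insert(0, post_id)  # newest first
            self._tasks.task_done()

    def wait(self):
        """Block until every queued delivery has been processed."""
        self._tasks.join()

    def feed(self, user_id):
        with self._lock:
            return list(self._feeds.get(user_id, []))
```

With several workers, two posts published back to back may be delivered out of order, so a real system would sort by timestamp on insert or on read rather than rely on arrival order.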

Now, there’s a potential problem in here. Imagine the following scenario:

A user has 5000 friends
80% of these friends are inactive
We’re effectively pushing every post to 4000 users who won’t read it.

So, we have essentially 2 options:

Fanout on write is where the above problem happens. The news feed is updated in real time and fetching it is fast because it is precomputed. However, if a user has many friends, generating the feeds can be slow, and the work is wasted on inactive users.

Fanout on read is the opposite. The fanout basically happens when the data is read, meaning that it is on demand. For inactive users, resources are not wasted. However, fetching the news feed may be slow as it’s not precomputed.
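
A minimal sketch of fanout on read, assuming each author’s own posts are already stored newest first as `(created_at, post_id)` pairs. The function name and the data layout are hypothetical; the point is only that the merge happens when the feed is requested.

```python
import heapq
from itertools import islice


def feed_on_read(user_id, friends, timelines, limit=10):
    """Build a feed on demand by merging each friend's own timeline.

    friends:   user ID -> set of friend IDs
    timelines: author ID -> list of (created_at, post_id), newest first
    """
    per_friend = [timelines.get(f, []) for f in friends.get(user_id, ())]
    # heapq.merge lazily merges already-sorted inputs; reverse=True means
    # every input must be sorted from largest to smallest, as ours are.
    merged = heapq.merge(*per_friend, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Nothing is written when a post is published, but every read touches as many timelines as the user has friends, which is where the slowness comes from.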

Again, it depends on the use case. We could even adopt a hybrid approach:

Posts from users with many followers are fetched on demand (fanout on read)
Posts from users with fewer followers are pushed in real time (fanout on write)
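
One way the hybrid approach could look is sketched below. The threshold value, the class name, and the single symmetric `friends` mapping are assumptions made for illustration. The sketch also assumes that `publish` is called with increasing `created_at` values, so that every list stays sorted newest first.

```python
import heapq
from itertools import islice


class HybridFeed:
    def __init__(self, friends, threshold=1000):
        self.friends = friends      # user ID -> set of friend IDs
        self.threshold = threshold  # hypothetical cut-off for "many followers"
        self.precomputed = {}       # user ID -> [(created_at, post_id)], newest first
        self.timelines = {}         # author ID -> [(created_at, post_id)], newest first

    def _is_popular(self, author_id):
        return len(self.friends.get(author_id, ())) > self.threshold

    def publish(self, author_id, created_at, post_id):
        entry = (created_at, post_id)
        self.timelines.setdefault(author_id, []).insert(0, entry)
        if not self._is_popular(author_id):
            # Fanout on write: push to every friend's precomputed feed.
            for friend_id in self.friends.get(author_id, ()):
                self.precomputed.setdefault(friend_id, []).insert(0, entry)

    def read(self, user_id, limit=10):
        sources = [self.precomputed.get(user_id, [])]
        # Fanout on read: pull popular friends' posts only when requested.
        sources += [
            self.timelines.get(f, [])
            for f in self.friends.get(user_id, ())
            if self._is_popular(f)
        ]
        merged = heapq.merge(*sources, reverse=True)
        return [post_id for _, post_id in islice(merged, limit)]
```

The expensive write-time fanout is skipped exactly for the authors where it would cost the most, at the price of a small merge on every read.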

Caching

Now, one of the most important parts here is caching. There can be a lot of posts, historical or new, and a lot of friends.

We can have cache for the following:

News feed - Stores only ID pairs (userId, postId)
Content - Stores all post data. Popular content is stored in a separate hot cache
Social Graph - Stores relations between users (follower/following)
Actions - Stores info about interactions (likes, replies, shares, …)
Counters - Like counts, for example. We don’t need to load every individual like immediately, only the totals.
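
To show how these caches could work together when a feed is rendered, here is a small sketch. The function name and the dictionary shapes are hypothetical: the news feed cache holds only post IDs, the content cache is asked for the post body, and the counter cache supplies the like total.

```python
from collections import Counter


def render_feed(user_id, feed_cache, content_cache, like_counts):
    """Turn cached post IDs into display-ready items.

    feed_cache:    user ID -> list of post IDs, newest first
    content_cache: post ID -> {"text": ...}
    like_counts:   Counter mapping post ID -> number of likes
    """
    items = []
    for post_id in feed_cache.get(user_id, []):
        post = content_cache.get(post_id)
        if post is None:
            continue  # cache miss; a real system would fall back to the post DB
        items.append({
            "post_id": post_id,
            "text": post["text"],
            "likes": like_counts[post_id],  # a Counter returns 0 for unseen IDs
        })
    return items
```

Keeping only IDs in the news feed cache keeps it small, because each post body is stored once in the content cache no matter how many feeds it appears in.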

Our final system could look like:

Summary

In this part, we’ve revisited an old friend - the news feed service.

We took another look at the original design, described the concepts in a little more detail, and enhanced it.

Hopefully this trip down memory lane was useful for you - it certainly was for me!