Creating a Facebook timeline
Congratulations on making it this far! This time, we’ll go back in time a little.
In the third section, we’ve already talked about a basic news feed.
Let’s revisit it now with what we’ve learned since!
This time we’ll design news feed publishing, similar to the Facebook timeline page.
So, again, let’s start with a high level design! We’ll have a bunch of services here!
It makes sense to start with publishing service since there would be no posts to retrieve without one!
An HTTP call to this service could be something like POST api/feed/create. This would require authentication and content.
Then, a news feed service which retrieves the posts. This could look like GET api/feed/me to receive data curated for the user who requests it. Again, an auth token is required.
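To make the two calls a bit more concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the `FeedApi` class, the `TOKENS` table standing in for real authentication, and the placeholder "curation" rule (show everyone else's posts, newest first).

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical token table standing in for a real auth system.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}


@dataclass
class Post:
    post_id: int
    author: str
    content: str


class FeedApi:
    """In-memory stand-in for the publishing and news feed endpoints."""

    def __init__(self):
        self._posts = []        # oldest first
        self._ids = count(1)    # simple incrementing post IDs

    def _authenticate(self, token):
        user = TOKENS.get(token)
        if user is None:
            raise PermissionError("invalid auth token")
        return user

    def create(self, token, content):
        """Roughly what POST api/feed/create would do."""
        author = self._authenticate(token)
        if not content:
            raise ValueError("content is required")
        post = Post(next(self._ids), author, content)
        self._posts.append(post)
        return post

    def me(self, token):
        """Roughly what GET api/feed/me would do."""
        user = self._authenticate(token)
        # Placeholder curation (an assumption): other users' posts, newest first.
        return [p for p in reversed(self._posts) if p.author != user]
```

Both methods reject a missing or unknown token, which mirrors the requirement that each call is authenticated.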
Now, since the two services are separate, how will the news feed service know about new posts? Well, the publishing service needs to handle that as well. Once a post is created, it’s saved into the database/cache of the news feed service.
Now, that looks a little weird, right? The two services communicate awkwardly: one retrieves posts, while the other saves them into the first one’s storage.
So, let’s redefine them!
Now, when a new post is created, it gets created in the news feed service and saved in the post DB.
Finally, we’ve discussed notifications last time. Let’s add a notification service in here as we know how to do it! Users can now be notified of new content!
Now, as usual, this looks like a distributed system. So, we’ll need:
Finally, we also want to store the information somewhere. That’d again be a database, plus a cache for fast retrieval. Note that the feed is listed in reverse chronological order, so we could cache the latest published posts.
So, let’s identify the problematic parts.
So, the first 2 don’t need much more attention, but we can do better in the news feed service. Let’s try to improve it.
When a post is created, it is saved by the post service. However, we also want to build the news feed for the author’s friends.
A social network is basically a graph, where nodes are users and edges are their relationships. Graph databases are a good fit for this use case, so we’ll use a graph DB (GDB) to look up the relations.
After we’ve fetched all the users whose feeds should receive the newly published post, we need to get their data. This will most likely come from a user DB.
Finally, we will be pushing the latest posts into the cache. Now, pushing 1 post to the feeds of 5000 friends can be slow. It gets even slower when we have a lot of users. Therefore, for publishing, we’ll use a message queue and multiple workers to handle that.
Lastly, this entire process is called Fanout. We’re basically delivering a message to all friends.
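The fanout step described above can be sketched roughly like this. It is only a single-process illustration under stated assumptions: the `FRIENDS` dictionary stands in for the graph DB, `queue.Queue` stands in for a real message queue, threads stand in for worker processes, and `feed_cache` stands in for the news feed cache. All of these names are hypothetical.

```python
import queue
import threading
from collections import defaultdict, deque

# Hypothetical friend graph (stand-in for the graph DB): user -> friends.
FRIENDS = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}

FEED_LIMIT = 100  # assumed cap on cached posts per user

# user -> newest-first post IDs; deque(maxlen=...) drops the oldest when full.
feed_cache = defaultdict(lambda: deque(maxlen=FEED_LIMIT))
cache_lock = threading.Lock()
tasks = queue.Queue()  # stand-in for the message queue


def worker():
    """Take (friend_id, post_id) tasks and push the post into that feed."""
    while True:
        task = tasks.get()
        if task is None:          # sentinel: shut this worker down
            tasks.task_done()
            break
        friend_id, post_id = task
        with cache_lock:
            feed_cache[friend_id].appendleft(post_id)
        tasks.task_done()


def publish(author_id, post_id):
    """Enqueue one delivery task per friend of the author."""
    for friend_id in FRIENDS.get(author_id, []):
        tasks.put((friend_id, post_id))


def run_fanout(author_id, post_id, num_workers=4):
    """Start workers, fan the post out, and wait until every task is done."""
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    publish(author_id, post_id)
    tasks.join()                  # block until all deliveries are processed
    for _ in threads:
        tasks.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
```

The publisher only enqueues tasks, so the cost of a post with 5000 recipients is spread across however many workers are running.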
Now, there’s a potential problem in here. Imagine the following scenario:
So, we have essentially 2 options:
Fanout on write is where the above problem happens. The news feed is in real time and fetching it is fast because it is precomputed. However, if there are many friends, the generation can be slow, especially if there are inactive users.
Fanout on read is the opposite. The fanout basically happens when the data is read, meaning that it is on demand. For inactive users, resources are not wasted. However, fetching the news feed may be slow as it’s not precomputed.
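For contrast, here is a minimal sketch of fanout on read. It assumes each author's posts are already stored newest first as `(timestamp, post_id)` pairs; `FRIENDS`, `POSTS_BY_AUTHOR` and `read_feed` are hypothetical names, and the sample data is made up purely for illustration.

```python
import heapq
from itertools import islice

# Hypothetical friend graph: user -> friends.
FRIENDS = {"alice": ["bob", "carol"]}

# Hypothetical post storage: author -> [(timestamp, post_id)], newest first.
POSTS_BY_AUTHOR = {
    "bob": [(30, "b2"), (10, "b1")],
    "carol": [(20, "c1")],
}


def read_feed(user_id, limit=10):
    """Build the feed on demand by merging every friend's timeline."""
    timelines = [POSTS_BY_AUTHOR.get(f, []) for f in FRIENDS.get(user_id, [])]
    # Each timeline is sorted newest first, so a reverse merge keeps that order.
    merged = heapq.merge(*timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Nothing is computed until `read_feed` is called, which is why inactive users cost nothing here, and also why each read has to touch every friend's timeline.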
Again, it depends on the use case. We could even adopt a hybrid approach:
Now, one of the most important parts here is caching. There can be a lot of posts, historical or new, and a lot of friends.
We can have cache for the following:
userId, postId
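One way to read the userId/postId pairing is that the news feed cache keeps only post IDs per user, while the post content is cached separately and looked up when the feed is rendered. That reading is an assumption, and so is every name in this sketch (`feed_cache`, `post_cache`, `POST_DB`, `FEED_LIMIT`).

```python
from collections import defaultdict, deque

FEED_LIMIT = 3  # assumed cap, kept tiny so the eviction is easy to see

# News feed cache: userId -> newest-first postIds (IDs only, no content).
feed_cache = defaultdict(lambda: deque(maxlen=FEED_LIMIT))

# Content cache: postId -> post content, filled lazily.
post_cache = {}

# Hypothetical post DB used on a content-cache miss.
POST_DB = {"p1": "first", "p2": "second", "p3": "third", "p4": "fourth"}


def push_to_feed(user_id, post_id):
    """Record that post_id belongs in user_id's feed (newest first)."""
    feed_cache[user_id].appendleft(post_id)


def get_post(post_id):
    """Return post content, falling back to the DB on a cache miss."""
    post = post_cache.get(post_id)
    if post is None:
        post = POST_DB[post_id]
        post_cache[post_id] = post
    return post


def get_feed(user_id):
    """Hydrate the cached post IDs into post content."""
    return [get_post(pid) for pid in feed_cache.get(user_id, ())]
```

Storing only IDs in the feed keeps each user's entry small, and a popular post's content is cached once rather than once per friend.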
Our final system could look like:
In this part, we’ve revisited an old friend: the news feed service.
We took another look at the original design, described its concepts in more detail, and enhanced it.
Hopefully this trip down memory lane was useful for you. It certainly was for me!