System Design 9 - Notification System

Sending notifications to users

Introduction

One of the things we often see nowadays are notifications. Think about your phone. How many times in the last 24 hours has it beeped because you received a message on a social network? Or how many times you’ve received an email that something is on sale.

Ever wondered how these are done? Well, wonder no more, and dive with me into designing a notification system!

What are we designing

Again, as is the case, let’s set the scope:

  • The system supports email, sms, and mobile notifications
  • Notifications should be delivered to user as soon as possible
  • Available devices are iOS and android devices, as well as laptop/desktop
  • Notifications are triggered by a client action, but can potentially be also defined elsewhere
  • Users can opt-out
  • 10 million mobile push notifications, 1 million SMS, and 5 million mails are triggered daily

So, what does the above notification tell us?

  • Different types of notifications
  • Sending is triggered by multiple actions
  • Contact info needs to be gathered (SMS, emails)

Mobile Notifications

Mobile notifications, also called iOS push notifications (or android push notifications) are those that you can see in your mobile when you receive a message. Each device handles them separately, but all are sent by a centralized store.

iOS Push Notifications

In the case of iOS, we need to set up a server allowing requests to Apple Push Notification service, or APN for short.

How it works is we basically have 3 parts:

  • Provider (our server, or others)
  • APN
  • iOS device

Now, with this flow, in short, we basically:

  • Trigger a request to APN on a server with action we want to do
  • APN will pass the notification with required data to the client
  • The iOS device will receive a push notification

The important thing that we need to store is a device token. It is an unique identifier of the device. This is also generated depending on device system

Android Push Notifications

The push notifications on android are the same as on iOS. The only difference being who manages the messaging.

Rather than APNs, Firebase Cloud Messaging is often used as the provider.

SMS

With SMS, the flow is still the same as previously, only provider and device is different.

In here, we would again use a third party provider, e.g. Twilio. We also don’t require device token as phone number is our identifier.

Email

Finally, emails are just as easy as SMS. The difference being provider (instead of SMS provider, we would use Mailchimp), and the identifier is email.

Contact Gathering

So, as we can see, the flow is basically:

  • Provider (our server)
  • Third Party Provider (APN, FCM, Twilio, Mailchimp)
  • Identifier (device token, phone number, email address)

So, what we need to do is gather the contacts. How would we do that?

Well, in the case of mobile apps, we could just create the tokens and send them to our web servers the moment our app is installed. However, with phones/emails, it’s more complicated.

Now, technologically, it’s easy to retrieve phones and emails. However, we need user to insert it themselves. Because of that, we’d most likely want to ask for these during sign up into application.

Keep in mind that in order to send the notifications to specific users, we want to keep a link to them. Imagine the following:

  • We have a user that has not registered and we want to remind him of that
  • We have a user that he has pending payments in his application to finish

Both of these will have a different flow. For the first one, we don’t keep track of user ID, because there is none. We’d just send a reminder to a device that has no user ID.

However, in the latter, we want to send the info to specific user. But how would we know about it? Well, we need to link them!

So, in order to send the notifications properly, we need to store them:

  • user table with ID, phone and email
  • device table with device token, user ID

Sending Notifications

So, in the high level design, we’d likely end up with the following design:

  • A service layer, where 1 (or many) service(s) can trigger notification
  • The notification system itself, who is a coordinator to third party services
  • Third party services handling requests to specific devices
  • Devices receiving their notifications

Now, that’s basically our basic system design! Now, let’s identify a couple flaws!

  • A single notification system means a single point of failure
    • We can make the system distributed
    • We’ll also add caching and databases rather than just forwarding
    • Distributed system means other issues, such as consistent hashing or distributed ID generation
  • The individual third party services handle requests. However, what if there’s too much load? They may get lost
    • We will add message queues so they are processed over time and not lost
    • We can also add retries

Finally, the flow will be like:

  • A service (marketing, operations, …) calls API of notification servers
    • The request will have something like userId, senderEmail, title, body
  • Notification servers retrieve the data from database about the user
  • The servers pass the notifications to a message queue
    • Depending on user, notification can be sent to one or multiple devices
  • The third party services perform the notification.
    • In case of failure, it’ll be added back to the message queue
  • Device receives notification

Going forward

Now, we have a fairly robust system. We have multiple services connecting to a distributed service for managing notifications.

Furthermore, these services use message queues to allow for retry mechanism and managing high load.

So, what’s next? It sounds quite robust, right?

Well, there’s a bunch of things to consider. For example:

  • What if the third party fails to send the message?
  • What if our server is overloaded with requests
  • We want users to be able to opt-out of notifications?

So, one thing we can add is a notification log. This log would basically write down every notification that was sent to APN/FCM/Mailchimp/Twilio (or any other provider). By logging all notifications, we’ll be able to tell exactly which notifications have been delivered.

To prevent overloading our server, we would add well known load balancers. Furthermore, we never touched authentication by service identifiers. We shoud also add that so not everyone can trigger the notifications. Finally, rate limiting will be our friend here as well.

As for notification settings, since user can opt out, we should keep in database if user wants to receive them or not. We can go event beyond and define types of notifications (such as those you see on LinkedIn settings).

We could also make a notification template ourselves so we have some prefilled content.

Finally, as is always required - monitoring and tracking.

  • With the design we have, we can see that a notification has multiple states:
    • Start - the moment notification is received on our notification server
    • Pending - A state in message queues
    • Sent - When notification is sent to third party provider
    • Error - When notification was sent due to an error.
      • The first error then retries the notification mechanism
    • Deliver - Notification received on device
    • Click / Unsubscribe - Final state performed by user

And finally, monitoring. No system is finished until you have sufficient monitoring where you can easily identify what is going on, if you need to add servers, or if there is any other issue.

Summary

In this part, we’ve gone through a mind map of a notification system.

  • We’ve created a distributed notification system with multiple servers
  • Many services can connect to this system, as long as they are authorized and not rate limited
  • We’ve set up our storage so that we can identify contact information and send notification properly
  • We’ve also set up storage to keep what notifications users want to keep receiving
  • The notification is handled by third party providers, such as APN, FCM, Mailchimp or Twilio
  • We’ve added retry mechanism, template with prefilled static content,

References

SimProch logo