Creating your own tinyurl
So, this has been fruitful! So far we’ve gone through so much stuff! But before I do a recap of what we’ve learned, let’s actually design a tool!
So far, we’ve been talking about designs involving algorithms. So let’s go back a bit and design a complete system from scratch!
This time, we’ll be taking a look at a URL shortener, trying to reuse some of the topics we’ve seen so far.
So, have you ever wondered how to create your own tinyurl? You’ve come to the right place.
[0-9], [a-z] and [A-Z]
So, we’ve defined a scope. Now, let’s try to estimate a little!
That’s for storage. Now, 100 million URLs are created per day. That means 100 million writes per day.
To get QPS:
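As a quick sanity check, here is the write side of that estimate as a tiny script. It only uses the 100 million URLs per day figure from above; the constant names are mine, and I'm not assuming any read-to-write ratio here.

```python
# Back-of-the-envelope write throughput from the 100 million URLs/day figure.
URLS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Average writes per second, assuming traffic is spread evenly over the day.
writes_per_second = URLS_PER_DAY / SECONDS_PER_DAY

print(f"average write QPS: {writes_per_second:.0f}")  # roughly 1157
```

This is an average only; real traffic has peaks, so the actual peak QPS would be higher.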
So, to recap:
Now, the URL needs to be as small as possible.
[0-9], [a-z] and [A-Z]
From there, we can calculate the maximum size of the short URL.
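Here is a small sketch of that calculation. The 62-character alphabet and the 100 million URLs per day come from the text; the 10-year retention period is purely my assumption for illustration, so plug in your own number.

```python
ALPHABET_SIZE = 10 + 26 + 26  # [0-9], [a-z] and [A-Z] -> 62 characters


def min_length(total_urls: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Return the smallest n such that alphabet_size ** n >= total_urls."""
    length = 1
    while alphabet_size ** length < total_urls:
        length += 1
    return length


URLS_PER_DAY = 100_000_000
RETENTION_YEARS = 10  # assumption for illustration, not a figure from the text
total_urls = URLS_PER_DAY * 365 * RETENTION_YEARS  # 365 billion URLs

# 62**6 is about 56.8 billion (too few), 62**7 is about 3.5 trillion (enough).
print(min_length(total_urls))  # 7
```

Under that assumption, 7 characters is the shortest length that fits every URL.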
So, we’re gonna do a URL shortener. Well, it’s basically 2 endpoints:
shorten and {shortUrl}
shorten endpoint:
The shorten endpoint takes a long URL, processes it, and returns a short URL.
Request body: longUrl: string
Response body: shortUrl: string
And that’s it for this one. The other endpoint is more complicated.
{shortUrl} endpoint:
This is the one where a lot of the magic happens for the user, because what we need to do here is force a redirect.
Luckily, there are some HTTP status codes that allow for it:
301 Moved Permanently: Location (in response headers)
302 Moved Temporarily: Location (in response headers)
Now, the main difference between these two is that once a 301 is returned the first time, the browser will no longer send the request to our shortening service. That’s what permanently means - the browser caches the result of this call and reuses it right away.
With the 302, the calls are still made to our service.
Again, it doesn’t mean that one is worse than the other. A 301 takes load off our service, while a 302 lets us see every click.
Now, what if the shortUrl doesn’t exist yet? Well, we can just return 404, as no such shortUrl exists yet.
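To make the three outcomes concrete, here is a minimal sketch of the redirect decision. It is not tied to any web framework: the function name `redirect`, the `store` dictionary standing in for the database, and the `permanent` flag are all my own placeholders.

```python
from typing import Dict, Tuple

# A response is modelled as (status code, response headers).
Response = Tuple[int, Dict[str, str]]


def redirect(short_url: str, store: Dict[str, str], permanent: bool = False) -> Response:
    """Resolve short_url against store and build the redirect response."""
    long_url = store.get(short_url)
    if long_url is None:
        # No such shortUrl exists yet.
        return 404, {}
    # 301: the browser may cache the redirect; 302: it keeps asking our service.
    status = 301 if permanent else 302
    return status, {"Location": long_url}


# Hypothetical data, for illustration only.
store = {"abc1234": "https://example.com/some/very/long/path"}
print(redirect("abc1234", store))                  # 302 with a Location header
print(redirect("abc1234", store, permanent=True))  # 301 with a Location header
print(redirect("missing", store))                  # 404, no headers
```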
So, going deeper into the shortener, we’ll need to think a little about the shortening itself.
Now, there are a bunch of ID schemes that we can use for generating the short URL. In the previous chapter, we mentioned UUIDs. But I’ve also tackled this topic here already.
Since the requirement is for the URL to be as short as possible, in the back-of-the-envelope (BOTE) part I mentioned base 62 and 7 characters.
So, we’ll be using base 62. And how are we going to generate it? Well, we’ll have to use something. For simplicity, I’ll be using autoincrement in this section, so we can expect every new URL sent here to get an ID that is one higher than the previous one. I also know that this isn’t the best approach, as discussed in the previous chapter. We could generate a new ID in multiple ways and then convert that to base 62, but for simplicity, let’s not.
So, considering autoincrement, every new URL gets an ID increased by 1. So how do we turn that ID into a short URL?
Well, we’ll convert it to base62:
1010 in binary is 10 in decimal
FF in hexadecimal is 255 in decimal
Z in base 62 is 61, because Z is 61 * 62^0
So, this is how we will shorten the URL.
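Here is a minimal sketch of that conversion. I'm assuming the digit order [0-9], then [a-z], then [A-Z], which is what makes Z equal to 61; the function names `encode` and `decode` are mine.

```python
import string

# Digit order assumed here: 0-9, then a-z, then A-Z, so "Z" is digit 61.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(number: int) -> str:
    """Convert a non-negative integer ID to its base 62 representation."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


def decode(text: str) -> int:
    """Convert a base 62 string back to the integer ID."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number


print(encode(61))     # Z
print(encode(62))     # 10
print(decode("2TX"))  # 11157 = 2 * 62^2 + 55 * 62^1 + 59 * 62^0
```

With autoincrement, the first IDs give very short URLs (1, 2, 3, ...), and the length grows only as the ID does.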
Now, we know what endpoints we will have. How will they look on the inside?
{shortUrl} endpoint:
This one is fairly straightforward.
{shortUrl} -> look up {longUrl} -> redirect
So it’s basically a read operation! But - how will the write happen?
shorten endpoint:
We know that the user sends us the URL they want to shorten. So, what needs to happen is basically:
longUrl -> shorten() -> save -> return shortUrl
But, what if the URL has already been shortened? Well, we need to check for it first!
longUrl -> already stored? -> if yes, return the existing shortUrl; if not, shorten() -> save -> return shortUrl
And the shorten itself? Well, as described above:
id = count + 1
shortUrl = id converted to base 62
save id, shortUrl and longUrl to database
So, we have:
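The write path described above can be sketched in a few lines. This is a single-process toy, not a production design: the `Shortener` class, its in-memory dictionaries standing in for the database, and the helper `to_base62` are all names I made up for illustration.

```python
import string
from typing import Dict, Tuple

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(number: int) -> str:
    """Convert a positive integer ID to base 62."""
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)) or ALPHABET[0]


class Shortener:
    """In-memory stand-in for the shorten endpoint and its database."""

    def __init__(self) -> None:
        self.count = 0                               # last ID handed out
        self.rows: Dict[int, Tuple[str, str]] = {}   # id -> (shortUrl, longUrl)
        self.by_long_url: Dict[str, str] = {}        # longUrl -> shortUrl

    def shorten(self, long_url: str) -> str:
        # Check first: has this URL already been shortened?
        existing = self.by_long_url.get(long_url)
        if existing is not None:
            return existing
        self.count += 1                              # id = count + 1
        short_url = to_base62(self.count)            # convert id to base 62
        self.rows[self.count] = (short_url, long_url)  # save id, shortUrl, longUrl
        self.by_long_url[long_url] = short_url
        return short_url


service = Shortener()
print(service.shorten("https://example.com/a"))  # 1
print(service.shorten("https://example.com/b"))  # 2
print(service.shorten("https://example.com/a"))  # 1 again, nothing new is saved
```

Note that the counter here lives in one process, which is exactly the part that stops working once several servers are generating IDs.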
Now, what about the performance? Well, there are 2 issues here:
In a distributed environment (which this definitely will be with 100 million URLs being generated daily), we have to account for distributed ID generation.
The second thing is getting fast responses. This would be solved by using a content delivery network/edge servers, a load balancer, multiple servers, as well as caching and the database.
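To give the caching part some shape, here is one possible sketch: a small least-recently-used cache sitting in front of the database lookup. Everything in it is an assumption on my side, including the `CachedLookup` class, the `load_from_db` callback, and the capacity. A real deployment would more likely use a shared cache service than a per-process dictionary.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class CachedLookup:
    """Tiny LRU cache in front of a slower shortUrl -> longUrl lookup."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]], capacity: int = 1024) -> None:
        self._load = load_from_db
        self._capacity = capacity
        self._cache: "OrderedDict[str, str]" = OrderedDict()

    def get(self, short_url: str) -> Optional[str]:
        if short_url in self._cache:
            # Cache hit: mark as most recently used and skip the database.
            self._cache.move_to_end(short_url)
            return self._cache[short_url]
        long_url = self._load(short_url)
        if long_url is not None:
            self._cache[short_url] = long_url
            if len(self._cache) > self._capacity:
                # Evict the least recently used entry.
                self._cache.popitem(last=False)
        return long_url


# Hypothetical "database" that records every call made to it.
database = {"1": "https://example.com/a"}
db_calls: List[str] = []


def load_from_db(short_url: str) -> Optional[str]:
    db_calls.append(short_url)
    return database.get(short_url)


lookup = CachedLookup(load_from_db, capacity=2)
first = lookup.get("1")   # miss: goes to the database
second = lookup.get("1")  # hit: served from the cache
print(first == second, db_calls)
```

Since reads are just lookups of data that rarely changes, they are a natural fit for this kind of cache.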
So, in this chapter, we’ve gone through a URL shortener.
The things that would need more thought are again
Now, also we might want to think more about other things
We’ve discussed all the individual parts of this to an extent. This is some space for your own thoughts. Chances are, if you made it here, they won’t be wrong.