Please trie hard
The second to last part will be dedicated to Tries. In this part, I’ll trie to explain what are their benefits, and why should we care about them.
Trie is a very specific type of tree that you’ve definitely encountered in some form. Why? Well, not because you’ve worked with it, no. But because you’ve used something that is built on top of it. Specifically, Google. Or any autocomplete really.
Have you ever wondered how autocorrect or autocomplete works? Me neither! Nevertheless, I’ll shove it up your brain so you will know what’s in your phone!
A Trie is a tree structure ideal for text-based features. Autocomplete and autocorrect are great examples of it.
In contrast to other trees, Trie doesn’t have a root. Or rather - the root doesn’t hold a value.
After talking a little about trees already, let’s take a look at a couple words and see how we could make a tree out of it:
Now, what can we do with the words? Well, not much actually. Because tries are more about the characters. Now, if you split it into characters, you may see where I am going:
And from the individual characters, we could build a tree. It’s not complete, or balanced. But it certainly is a tree!
But how would that help us? Well, if we consider google, consider this feature:
Now, that’s great! But how would we know where to filter? We could send this to DB and fetch:
select * from autocomplete where query like '{myQuery}%' order by asc
But that would be searching across all queries, not to mention the system design issues if you have multiple databases.
We could use a trie for that instead.
Now, as for the top 10 most searched, we could keep these cached:
But I’m getting ahead of myself. It’s better described in the following image:
Note that each leaf has a value that’s associated with it. We could have this value cached on individual nodes level:
Anyways, let’s build a Trie!
Trie has the following operations in the simplest form:
There is no deletion or traversing, because they don’t make sense. We don’t want to remove previously created words, neither do we want to go through all of them.
class TrieNode {
children = {};
}
class Trie {
rootNode = new TrieNode();
search(word) {
let currentNode = this.rootNode;
for (let i = 0; i < word.length; i++) {
const char = word[i];
if (currentNode.children[char]) currentNode = currentNode.children[char];
else return null;
};
return currentNode;
}
insert(word) {
let currentNode = this.rootNode;
for (let i = 0; i < word.length; i++) {
const char = word[i];
if (currentNode.children[char]) currentNode = currentNode.children[char];
else {
currentNode.children[char] = new TrieNode();
currentNode = currentNode.children[char];
}
};
currentNode.children["*"] = new TrieNode();
return currentNode;
}
}
Now, if you look at the insertion, there’s one more thing - ”*“. That’s because it means there can be any words after this. We don’t keep the entire dictionary somewhere, and new words are always being made up. That basically implies it.
And we’re done with Trie. Kind of. We’ve created all the operations. Now, everything is depending on the use cases.
I’ve mentioned on the start that it’s good for text-based features. Let’s create autocomplete together.
Autocomplete with Tries is fairly simple:
So, consider the following:
const trie = new Trie()
trie.insert('cat');
trie.insert('catnip');
trie.insert('catnap');
trie.insert('ace');
trie.insert('act');
We have a trie that has some words in it. Now, let’s consider we want an autocomplete:
ca
, we’ll receive an array of cat, catnip, catnap
ac
, we’ll receive an array of ace, act
So, what we need to do is create a function that retrieves these words.
function getAllPossibleWords(node = null, word = "", words = []) {
const currentNode = node || this.root;
const children = Object.keys(currentNode.children);
children.forEach((key) => {
if (key == "*") words.push(word);
else {
this.getAllPossibleWords(currentNode.children[key], word + key, words);
}
});
return words;
}
Now, here we are using recursion on the Trie.
And now, autocomplete is simple. We’ll just combine functions we already have:
So, it’ll look like this:
function autocomplete(word) {
const node = this.search(word);
if (!node) return null;
return this.getAllPossibleWords(node, word);
}
And that’s it for autocomplete!
Autocorrect will be pretty much the same thing, except:
So, to do so, we could do:
function autocorrect(toCorrect) {
let node = this.search(toCorrect);
if (node) return toCorrect;
while (!node) {
toCorrect = toCorrect.slice(0, toCorrect.length - 1);
node = this.search(toCorrect);
}
const words = this.getAvailableWords(node, toCorrect);
return words[0];
}
With the above function, we’ll search if the word we want to correct exists. If not, we’ll progressively move to the top.
This will be result in catp
being shortened to cat
, searched for, and then returned!
The code written here in total is:
class TrieNode {
children = {};
}
class Trie {
root;
constructor() {
this.root = new TrieNode();
}
search(word) {
let node = this.root;
for (let i = 0; i < word.length; i++) {
const char = word[i];
if (node.children[char]) node = node.children[char];
else return null;
}
return node;
}
insert(word) {
let node = this.root;
for (let i = 0; i < word.length; i++) {
const char = word[i];
if (!node.children[char]) node.children[char] = new TrieNode();
node = node.children[char];
}
node.children["*"] = new TrieNode();
console.log(node.children);
return node;
}
autocomplete(word) {
const node = this.search(word);
if (!node) return null;
return this.getAllPossibleWords(node, word);
}
getAllPossibleWords(node = null, word = "", words = []) {
const currentNode = node || this.root;
const children = Object.keys(currentNode.children);
children.forEach((key) => {
if (key == "*") words.push(word);
else {
this.getAllPossibleWords(currentNode.children[key], word + key, words);
}
});
return words;
}
}
const trie = new Trie();
trie.insert("cat");
trie.insert("catnip");
trie.insert("catnap");
trie.insert("ace");
trie.insert("act");
trie.autocomplete("ac");
Feel free to play around with it!
In this piece, I’ve gone through tries. A third data structure we’ve discovered after binary search trees and heaps.
Tries are very solid data structure that can be used in many text-based features.