Suffix automaton is a robust data structure that allows you to solve complex string-related problems such as: checking the presence of a substring in a string, counting the number of total distinct substrings, finding substring, and many others. In this article, I cover the suffix automaton algorithm, provide implementation, and finally create the correct rickroll lyrics automaton.
First of all
Now we can continue.
There is a meme that I've seen a couple of times with all possible Never Gonna Give You Up central lines. It's nice, but it's not fully correct.
The problem is that it conforms to incorrect lines too:
- Never gonna give you cry
- Never gonna tell a lie and desert you down
- Never gonna make you up
- Never gonna give you down
- Never gonna make you never
And many others. So, we can conclude that this graph is incorrect as incorrect lyrics must be unreachable. Then, we need to correct this immense mistake against humanity and generate the correct automaton for Rickroll lyrics.
What is the suffix automaton?
Intuitively, it's a data structure that contains information about all substrings of a string and stores it in compressed form. More specifically, it's a directed acyclic word graph in which each node is a state and all edges are transitions between these states by some letter.
Each state corresponds to some substring in the initial string. There is also one start state and some states are marked as terminal. We also require that suffix automaton contains the minimal possible number of states.
If a substring is not presented in the text, then this state will be unreachable. There's simply no state or for an absent substring. So, at some point we will need transition that does not exists.
Here is the example of suffix automaton for string
The leftmost state corresponds to empty string (start state) and the rightmost corresponds to the whole string (terminal). Notice that if you start from the start and somehow end up in the terminal state, then the path you followed corresponds to some suffix of the string. Also, every substring corresponds to one path from the start.
Rickroll suffix automate
For this, I generated suffix automate for every line and then merged these suffix automates.
Even though this graph is not as nice as presented in the meme, it's correct. You can explore the graph above by yourself; it's actually fun.
In the next post, I will discuss how I have built this graph using the suffix automaton. Subscribe so you do not miss it!