Treap: The Easiest Search Tree (Explained)

Cartesian tree or treap (binary search tree + binary heap) is a fast yet simple data structure. It conforms to a core search binary tree property and binary heap property at the same time. Despite its simplicity, treap self-balances, resulting in O(logn) complexity on average for all common operations.

Amazing, right?

💥

The algorithm uses random values. Therefore, O(logn) complexity is on average. However, with a lot of items O(logn) is almost always true. So, later in this article, I will use just O(logn) without "on average" addition.

Moreover, there is a modification (implicit treap, treap with implicit key) that lets you use treap as a usual array with O(logn) random insertions and random deletions. Isn't it cool? In this article I'll explain how to create one and provide the implementation in Swift. Also, I will compare treap to the general set from standard library. Let's start!

💡

In a binary search tree, for each node, all items' values in the left subtree are less than the node's value, and all items in the right subtree are greater

Core algorithm

As I said earlier, treap combines heaps and binary search trees. Therefore, we are going to store at least two properties: key (or value) and priority. Key is a value for which tree is a search tree and for the priority, it is a binary heap.

💡

A binary heap is a binary tree where each node child's value is less than the node's value

On the image above, you may notice that for every node, all child's priorities are less. On the other side, all children on the left have a key less than that in the node, and all children on the right have a larger key.

💡

It's also called a cartesian tree as it can be displayed on a regular 2D grid with (key, priority) coordinate for each node. Just like in the image above.

To create a fully-functioning search tree, we need to implement:

find
insert
remove

More exotic operations like lower bound and upper bound are also pretty simple and does not differ from those in the other search trees. And all these operations can be implemented using just two helper operations!

How to do that?

💥

Split

Splits the tree into two trees by given value. All values in the left tree are less than the value while in the right tree are greater. And both resulting trees are correct treaps.

We will use a special flag that decides whether to send values that are equal to the left tree or to the right tree.

Example of split function result. The equal value sent to the right

💥

Merge

Merges two treaps into one big treap.
Prerequisite: all items in the first tree are less than items in the right tree.

So, if we implement these two methods, implementing all other three operations would be trivial.

Split

Let's start thinking about code at this stage. I'm going to explain this in C++. Rewriting the following code in Swift is actually really easy. Leave a comment bellow if you need a help.

template<typename T>
struct Node {
	T key;
	size_t prior;
	Node* left = nullptr, *right = nullptr;

	Node(T key, size_t prior) :
		key(std::move(key)),
		prior(prior) {
	}
};

Structure of treap's node

For split, we have a head node and a key for which split needs to be done. This method is extremely simple using recursion.

Algorithm

Let the current head be p.

If p->key is less than the key, then we need to go right and split p->right further.

Also, splitting right will bring two trees as well, and the first one will have nodes with keys less than the key. Yet, they are greater than the p->key (as they are in the second tree of the first split).
So, we set p->right to the first tree of splitting right result.

Result: p, split right's second tree
If the p->key is greater or equal to the key, then we need to go left and split p->left further.

Similarly to the case above, we set p->left to the second tree of split left.

Result: split left's first tree, p

The algorithm above leaves a node that is equal to the split value in the second tree. Symmetrically, we will use the equalOnTheLeft flag to leave the node in the left tree.

So, the final code:

pair<Node *, Node *> split (Node *p, const T& key,
				bool equalOnTheLeft=false) {
    if (!p) // reached leaf
    	return {nullptr, nullptr};
    if (p->key < key ||
    	(equalOnTheLeft && p->key == key)) { // splitting right
        auto q = split(p->right, key, equalOnTheLeft);
        
        // q.first has nodes of the right
        // subtree that are less than key
        p->right = q.first; 
        
        return {p, q.second};
    } else { // splitting left
        auto q = split(p->left, key, equalOnTheLeft);
        
        // q.second has nodes of the left 
        // subtree that are greater or equal
        // to the key
        p->left = q.second;
        
        return {q.first, p};
	}
}

💡

Priorities are not used and not changed during the split procedure. The resulting trees have the right order of priorities as the initial tree had it right

Subscribe and don't miss posts!

Merge

Merge is similar to split, but it uses priorities to do the work. As I mentioned before, there is a prerequisite: all items in the first merged tree must be less than items in the second tree. If this is not true, another algorithm must be used.

Algorithm

Similarly to split, merge is also recursive. Let us have two trees to merge: l and r.

We need to choose which tree will represent the new head. That's simple — the head must have the greatest priority, so we choose l or r based on that.

💡

Notice that the head node in l has the highest priority in the whole l tree as its a property of correct treap. The same applies to r.

If l has greater priority, then l->left subtree will remain intact as left subtree for sure less than r and it has nothing to do with it.

Then, l->right subtree must be merged with r and it's going to be the new l->right subtree.
If r has greater priority, then, similar to the example above, r->right will remain intact and r->left must be merged with l

Node* merge (Node *l, Node *r) {
    if (!l) // left is empty
    	return r;
    if (!r) // right is empty
    	return l;
        
    if (l->prior > r->prior) { // l has the new head.
        l->right = merge(l->right, r);
        return l;
    } else { // r has the new head.
        r->left = merge(l, r->left);
        return r;
    }
}

Why is it correct?

It seems like nothing stops us from breaking the search tree structure where all items' values in the left subtree are less than the node's value, and all items in the right subtree are greater.

💡

Prerequisite saves binary search tree property as items are never reordered and l < r property is always kept the same

Implementing search tree methods

You believed me that all methods are easy to implement through split and merge. Time to prove that.

Find

Find is implemented just like for the general search tree. We use the fact that keys in the left subtree are greater than the value in the node.

Node* find(Node* node, const T& key) {
	if (node == nullptr)
		return nullptr;
    if (node->key == key)
		return node;
    return find(key >= node->key ? node->right : node->left, key);
}

Insert

Let's think about insert in terms of split and merge. We have one big tree and we need to insert a new key.

Split the tree by key to new trees: first and second. Then, we will have two trees: the first (which has values lower than the key) and the second (which has values greater or equal to the key).

We can check that node already exists: try to find it in the right tree.

💥

Implementation requires that each item is met only once.

If you need to insert multiple copies of the same item, you can store an item and it's count to achieve that

Create a new node that will store the new key — newNode. Ta-da this node is a correct treap that has only one node.

For the new node, you need to set a random priority

💥

Random priorities are key to the complexity. This makes the cartesian tree balance itself, making O(logn) complexity for all operations

New head will be merge(first, merge(newNode, second))

See? It's that simple.

Node* insert(Node* head, T key) {
    auto split = split(head, key);
    if (find(split.second, key) != nullptr) {
    	// Key exists already
        // Merge back
        return merge(split.first, split.second);
    }
    
    auto newNode = new Node(std::move(key), rand());
    return merge(split.first, merge(newNode, splitsplitted.second));
}

Remove

It's very similar to insert. However, that's where the equalOnTheLeft flag is used.

💡

Remember that the second tree produced by split contains items greater or equal to the selected key

Therefore, the second tree will contain the value that needs to be removed. But how to remove it from the tree?

Split again.

We can split the second tree by key, setting the equalOnTheLeft flag to true. Thus, the node will be separated from the second tree to the new tree.

💡

After conducting two splits and separating deleted node, unneded node is easely removed everything else is merged.

Node *remove(Node *head, const T &key) {
    auto split = split(head, key);
    if (split.second) {
        auto secondSplit = split(split.second, key,
                                     /*equalOnTheLeft=*/true);
        // Key exists, so delete it and merge
        auto everythingElse = secondSplit.second;
        if (secondSplit.first == nullptr) {
            // There's no element equal to key. Merge back.
            return merge(split.first, everythingElse);
        }

        // We got node with key value in
        // secondSplit.first
        delete secondSplit.first;

        size--;
        return merge(split.first, everythingElse);
    }
    // Key is not presented. Merge back.
    return merge(split.first, split.second);
}

Full code

You can download C++ code of a little bit optimized Treap here:

Treap

C++ code allocations-optimised

treap.h

5 KB

Comparing to `std::set`

First of all, the implemented version of treap utilizes split and merge methods. Note that there is more efficient implementation that uses rotations. However, the true power of treap is in split and merge methods as other search trees can't do it easily.

Find tests

It's visible that asymptotics is similar. Though, treap always has the greater overhead. Still, it's a good result! We're competing with an utterly optimized standard library data structure.

Inserts

Insert operation test

Insertion has even bigger overhead. And it was expected: recursive calls of merge and split do not improve performance ;)

TreapProject

Comparisons tests. Outputs CSV of time measurments

TreapProject.zip

7 KB

Comparison conclusion

As you see, treap has higher nodes' height on average than that of very well-balanced AVL tree.

Yes, treap has worse performance than that of std::set. Yet, the results are comparable, and with a large data size, treap gets closer and closer to std::set which in fact is a red and black tree.

Believe me, you don't want to write your own RB tree. It's a nightmare.

Use cases and modifications

We developed this data structure not just to lose std::set. There are several useful applications.

Sum of numbers in the interval

We need to modify Node structure, adding sum field. It will store sum of all its children and itself.

template<typename T>
struct Node {
	T key;
	size_t prior;
	long long sum;
	Node* left = nullptr, *right = nullptr;

	Node(T key, size_t prior) :
		key(std::move(key)),
		prior(prior) {
	}
};

It's extremely easy to update the sum. Every time childs are changed, sum = left->sum + right->sum. So, you can implement some kind of update function and call it in split and merge right before returning value. That's it.

How to answer on request?

We receive interval [l, r]. To calculate the sum of numbers on this interval, we can split the tree by l, then split the second tree of the result by r+1 (or by r, leaving equal elements on the left). In the end, we will have a tree containing all added numbers in the interval [l, r].

Complexity: O(logn) versus O(n) naive.

Using a hash of value in place of priority

You can use a hash of value as a priority as a good hash function is pretty random. What benefits does it bring?

If keys and priorities are fixed, then no matter how you construct the treap or add elements, it's always going to have the same structure.

💡

You may think about it this way: keys fix x axis and priorities fix y axis of treap

Therefore, you can compare two sets in O(n) as treaps containing the same values will have absolutely the same structure.

Implicit treap

What if we use the size of the left subtree as a key? Then, we can use this key as an index. Wow. That means that we can represent a regular ordered array as a treap!

By doing this, we can:

make insertions by random index
O(logn) versus O(n) naive
make deletions by random index
O(logn) versus O(n) naive

With great power, comes great responsibility.

😡

Access by random index downgrades to O(logn) versus O(1) in the standard array.

If your algorithm requires a lot of array modifications and very few accesses/outputs, then it's the right choice. Moreover, you can convert treap into an array and back with O(n) complexity.

💥

I have implicit treap implemented in Swift. It behaves just like the general array and implements a lot of optimisations. Check it out!

Cut-paste problem

Imagine that you have a big string and you recieve requests to cut some part and to insert it somwhere.

This problem can be solved using treaps with implicit key. You can use splits to cut needed part and merge to insert it.

FAQ

Cartesian trees are most suitable for what?

Treap is useful when you need to collect some kind of characteristic on an interval (for example, sum) or apply some modification to the interval. Treap with implicit key is also useful when you need to apply a lot of random tree insertions/deletions with few accesses.

Why don't we use array indices as keys for an implicit treap?

Because in case of insertion we would need to recalculate all indeces that are higher than inserted index. Therefore, it downgrades complexity to O(n).

Is treap a randomized tree?

Yes, it is. But it can also use hash value in place of a random value.

I know about implementation without split and merge. It utilizes left and right turns. Is it better?

For example, GeeksforGeeks use such implementation, I know. But I believe that the true value of treap is in seampless splits and merges. You've already seen by examples how it is really usefull. Why implementing treap with turns when you can build AVL that's probably going to be faster?

Love data structures?

Check out my article on the amazing Skip List! While a lot of people never heard about it, Skip List is beautiful and can solve, for example, the problem of finding the n-th maximum or the rolling median problem in the most efficient way.

Also, you can check the whole algorithms section of my blog

References

https://www.cs.cmu.edu/~scandal/papers/treaps-spaa98.pdf

Subscribe to Alex Dremov

Get the email newsletter and receive valuable tips to bump up your professional skills

Recommended for you

Machine Learning

Understanding Flash Attention: Writing the Algorithm from Scratch in Triton

Why is Flash Attention so fast? Find out how Flash Attention works. Afterward, we'll polish our understanding by writing a GPU kernel of the algorithm in Triton.

6 months ago • 10 min read

Machine Learning

Speed Up PyTorch With Custom Kernels. But It Gets Progressively Darker

It's all about making your models run faster, from flicking a magic “compile” switch to writing your own custom GPU code. In each step, we’ll implement an innocent softmax function, but things are about to get dark by the end.

6 months ago • 7 min read

Machine Learning

Simple Ways to Speed Up Your PyTorch Model Training

If all machine learning engineers want one thing, it's faster model training — maybe after good test metrics.

a year ago • 12 min read

Treap: The Easiest Search Tree (Explained)

Alex Dremov

Core algorithm

Split

Algorithm

Merge

Algorithm

Implementing search tree methods

Find

Insert

Remove

Full code

Comparing to std::set

Find tests

Inserts

Comparison conclusion

Use cases and modifications

Sum of numbers in the interval

Using a hash of value in place of priority

Implicit treap

Cut-paste problem

FAQ

Love data structures?

References

Share

Subscribe to Alex Dremov

Get the email newsletter and receive valuable tips to bump up your professional skills

Recommended for you

Understanding Flash Attention: Writing the Algorithm from Scratch in Triton

Speed Up PyTorch With Custom Kernels. But It Gets Progressively Darker

Simple Ways to Speed Up Your PyTorch Model Training

Subscribe to Alex Dremov

Get the email newsletter and receive valuable tips to bump up your professional skills

Comparing to `std::set`