The Ultimate Tree


The Ultimate Tree is the fastest Tree data structure ever built.

Trees are one of the most powerful data structures in Computer Science. They are fast. They give you a well-structured, totally ordered view of your data with fast retrieval (search), insertion, and deletion. There are many different types of trees, and when I say fast, what I mean is O(log n), which IS fast. Check this out:
[Table: search, insertion, and deletion time complexities for common balanced trees; all O(log n)]
As you can see, all the sophisticated trees above support retrieval, insertion, and deletion in O(log n). The question then becomes: is it possible to come up with The Ultimate Tree, a tree capable of search, insertion, and deletion in constant time (O(1))? The claim in this post is that yes, there IS a way to do that. Here is how it will work:
1) First, no free lunch here! Such a massive speed-up in search, insertion, and deletion requires extra space, so we will not be optimizing for space here and will tolerate up to O(n) extra space. Space is cheap. Most of the time.
2) Second, the key idea is to manipulate the tree via a second data structure, which I'll call the Tree Manager (TM from now on). A TM is a quick lookup table into the elements of the actual tree: it contains a hash table where the key is a unique identifier for each node of the tree and the value is a pointer to the actual node in the tree. Without loss of generality, we'll assume that each element of the tree is unique. All three operations, search, insert, and delete, will be done through the TM (see the sketch after this list).
3) Still within the context of the TM, search and insertion are straightforward operations: a search simply returns the cached value from the hash table. An insertion takes as input the parent value and the child value and inserts the child there. Each node in the tree represents its children as yet another hash table for quick lookup. The drawback is that we lose the concept of order among a node's children, but that's OK (really, if you have a node with many children, do you really care about their order?).
4) The deletion is the tricky part. See, all the trees shown above deal with deletion in very complex ways, with tricky rotations, translations, special cases, and so on. In order to achieve O(1), what we decided to do here is not really delete the node, but flag it as gone. The node is still there, just invisible. The implications of this approach are the following:
a. Memory is wasted. The node is still hanging in the tree, invisible. Again, we're not that concerned about space here.
b. This approach puts significant pressure on the internal tree methods, which now have to take the invisible nodes into account. Even the most basic method, like calculating the height of a node (height being defined as the max distance between the node and any of its descendant leaves), gains an extra layer of complexity to deal with the invisible nodes: more code complexity, yes, but the overall time complexity stays the same. Another illustration is the Print-Nodes-By-Level method. This is usually accomplished with a simple Breadth-First Search (BFS), no recursion involved, but in order to deal with the invisible nodes the traversal becomes significantly more convoluted: still level by level, but now using a priority queue and a recursive call to populate it. Getting closer to Dijkstra's algorithm.
c. Finally, if we eventually want to de-allocate the memory held by the invisible nodes, a separate process will have to remove these nodes and reconstruct the tree later on. This can be done sporadically, similar to a garbage-collection algorithm. Depending on the application in which the tree will be used, I consider this step optional.
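To make the idea concrete, here is a minimal sketch of the TM in Python. This is my own illustration, not the code from the repo: the names (Node, TreeManager) and the exact signatures are assumptions, but the mechanics are as described above: a hash table from unique keys to node pointers, children stored in per-node hash tables, and deletion as an invisibility flag.

    # A minimal sketch of the Tree Manager idea. Illustrative only; names
    # and signatures are assumptions, not necessarily those in the repo.
    class Node:
        def __init__(self, value):
            self.value = value
            self.children = {}   # child value -> Node, for O(1) child lookup
            self.visible = True  # lazy deletion: flag instead of unlinking

    class TreeManager:
        def __init__(self, root_value):
            self.root = Node(root_value)
            self.index = {root_value: self.root}  # value -> Node: the lookup table

        def search(self, value):
            # O(1) expected: one hash lookup, honoring the invisibility flag.
            node = self.index.get(value)
            return node if node is not None and node.visible else None

        def insert(self, parent_value, value):
            # O(1) expected: find the parent via the index, attach the child.
            parent = self.search(parent_value)
            if parent is None:
                raise KeyError("parent not found: " + repr(parent_value))
            child = Node(value)
            parent.children[value] = child
            self.index[value] = child

        def delete(self, value):
            # O(1): the node is not unlinked, only flagged as invisible.
            node = self.search(value)
            if node is not None:
                node.visible = False

One design choice worth noting: search honors the invisibility flag, so you cannot insert under a deleted node; the node itself stays in memory until the optional clean-up pass from (c).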
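Continuing the same sketch, here is one simplified take on the level-by-level print from (b): it skips invisible nodes in the output but still descends through them. The fancier version described above, promoting an invisible node's children via a priority queue, is where the real convolution comes in.

    from collections import deque

    def print_by_level(root):
        # Level-order traversal that hides invisible nodes but still walks
        # through them, so their visible descendants are not lost.
        queue = deque([(root, 0)])
        while queue:
            node, level = queue.popleft()
            if node.visible:
                print("level", level, ":", node.value)
            for child in node.children.values():
                queue.append((child, level + 1))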
The picture below illustrates how the TM would interact with our Tree class:
[Figure: the Tree Manager's lookup table pointing into the nodes of the Tree]
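In code, the interaction is just as simple. A hypothetical usage of the sketch above, with made-up values:

    tm = TreeManager("root")
    tm.insert("root", "a")
    tm.insert("a", "b")

    tm.delete("a")
    print(tm.search("a"))        # None: "a" is flagged invisible
    print(tm.search("b").value)  # "b": still found in O(1) via the TM's index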

The new time complexity then becomes:

Operation     Typical balanced tree    The Ultimate Tree
Search        O(log n)                 O(1)
Insertion     O(log n)                 O(1)
Deletion      O(log n)                 O(1)

(Expected time, assuming well-behaved hashing; see the comments below.)
Hey, guess what, AnotherCasualCoder is now live on GitHub!!! The code is there, check it out:
Happy Mother's Day!!!
Marcelo


Comments

  1. I really enjoyed reading the blog, and how you applied a variety of data structures to optimize the time complexity yet retain the properties a tree is used for. Awesome application of simple concepts to build something useful and applicable to the real-world, day-to-day problems we solve at work. Thanks for this. I wanted to point out one thing about the worst-case time complexity: the above solution uses hash tables at two levels, and a hash table itself has a worst-case time complexity of O(n) for search, insert, and delete. That might eventually mean the worst-case time complexity of the above data structure also becomes O(n) instead of O(1), but yes, that would only be the case when the structure is too large and needs a rehash, or when for whatever reason it encounters collisions and uses chaining or open addressing to resolve them. Again, thanks for the article, as it helped me think about how I can apply several data structures to come up with out-of-the-box solutions that otherwise would not exist!

  2. Ah, good point. We often think of hash tables as O(1)-time, but we also often forget about the possibility of collisions, which do create a problem. However, given the constraints, especially the uniqueness of the keys, this is likely not to be an issue. The caveat with my approach is that the caller is responsible for keeping the tree balanced (if she wishes to), whereas the other methods do so already.


