The Ultimate Tree
The Ultimate Tree is the fastest Tree data structure ever built.
As you can see, all the sophisticated trees above support retrieval, insertion and deletion in O(log n). The question then becomes: is it possible to come up with The Ultimate Tree, a tree capable of search, insertion and deletion in constant time, O(1)? The claim in this post is that yes, there IS a way to do that. Here is how it will work:
1) First, no free lunch here! Such a massive speed-up in search, insertion and deletion will require extra space, hence we'll not be optimizing for space here and will tolerate up to O(n) extra space. Space is cheap. Most of the time.
2) Second, the key idea behind it is to manipulate the tree via a second data structure, one that I'll call the Tree Manager (which we'll refer to from now on as the TM). The TM is a quick lookup table into the elements of the actual tree: it contains a hash table where the key is a unique identifier for each node of the tree and the value is a pointer to the actual node in the tree. Without loss of generality, we'll assume that each element of the tree is unique. All of the operations, search, insert and delete, will be done through the TM, as the sketch below shows.
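To make that concrete, here is a minimal Python sketch of how the node and the TM could be laid out. The names (Node, TreeManager, _nodes, visible) are my own illustration, not necessarily what the actual code uses:

```python
class Node:
    """A node of the tree; children live in a hash table for O(1) access."""
    def __init__(self, value):
        self.value = value
        self.children = {}    # child value -> Node
        self.visible = True   # flipped to False on deletion (see step 4)


class TreeManager:
    """The TM: a hash table from a node's unique value to the node itself."""
    def __init__(self, root_value):
        self.root = Node(root_value)
        self._nodes = {root_value: self.root}
```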
3) Still within the context of the TM, search and insertion are straightforward operations: a search simply returns the value cached in the hash table, and an insertion takes as input the parent value and the child value and inserts the child under that parent. Each node in the tree represents its children as yet another hash table, for quick lookup. The drawback is that we lose the concept of order among a node's children, but that's OK (really, if you have a node with many children, do you really care about the order of those children?). Both operations are sketched below.
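Under the same assumptions, search and insertion could look like this, written as free functions over the Node and TreeManager classes sketched above:

```python
def tm_search(tm, value):
    """O(1) search: a single lookup in the TM's hash table."""
    return tm._nodes.get(value)  # None if absent (or deleted)


def tm_insert(tm, parent_value, child_value):
    """O(1) insert: locate the parent via the TM, hang the child under it."""
    parent = tm._nodes.get(parent_value)
    if parent is None or child_value in tm._nodes:
        return False  # unknown parent, or duplicate (values are unique)
    child = Node(child_value)
    parent.children[child_value] = child
    tm._nodes[child_value] = child
    return True
```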
4) Deletion is the tricky part. See, all the trees shown above deal with deletion in very complex ways, with tricky rotations, translations, special cases, and so on. In order to achieve O(1), what we decided to do here is to not really delete the node, but to flag it as gone: the node is still there, but invisible (see the sketch right after this paragraph).
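Continuing the sketch, one minimal way to express that flagging (deleting the root is left out for simplicity):

```python
def tm_delete(tm, value):
    """O(1) delete: flag the node as invisible instead of unlinking it."""
    node = tm._nodes.get(value)
    if node is None or node is tm.root:
        return False  # nothing to delete (root deletion not handled here)
    node.visible = False   # the node stays in the tree, just invisible
    del tm._nodes[value]   # but it can no longer be reached through the TM
    return True
```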
The implications of this approach are the following:
a. Memory is wasted. The node is still hanging in the tree, but invisible. Again, we're not too concerned about space here.
b. This approach will put significant pressure on the internal Tree methods, which now have to take the invisible nodes into account. Even the most basic method, like calculating the height of a node (height being defined as the max distance between the node and any of its descendant leaves), will gain an extra layer of complexity to deal with the invisible nodes: code complexity, yes, but the overall time complexity stays the same. Another illustration is the Print-Nodes-By-Level method. This is usually accomplished with a simple breadth-first search (BFS), no recursion involved, but in order to deal with the invisible nodes the traversal becomes significantly more convoluted: still level by level, but now using a priority queue and a recursive call to populate that queue, getting closer to Dijkstra's algorithm. A sketch of a height computation that handles invisible nodes appears right after this list.
c. Finally, if we want to eventually de-allocate the memory held by the invisible nodes, a separate process will have to remove these nodes and reconstruct the tree later on. This can be done sporadically, similar to a garbage-collection algorithm (one possible cleanup pass is also sketched below). Depending on the application in which the tree will be used, I consider this step optional.
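As an illustration of the extra code complexity mentioned in (b): the post doesn't pin down exactly how invisible nodes count toward distances, so the sketch below assumes an invisible node contributes its subtree but no edge of its own, i.e. its visible descendants hang logically under their nearest visible ancestor:

```python
def visible_height(node):
    """Height of `node`: max distance to any visible descendant leaf."""
    best = 0
    for child in node.children.values():
        h = visible_height(child)
        # a visible child adds one edge; an invisible one adds none,
        # since logically it isn't there
        best = max(best, h + 1 if child.visible else h)
    return best
```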
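And for the cleanup pass described in (c), one possible shape, again my own assumption rather than the repository's code (it presumes the root itself is visible):

```python
def compact(tm):
    """Sporadic 'garbage collection': rebuild the tree keeping only visible
    nodes. Visible descendants of an invisible node are re-attached to their
    nearest visible ancestor. O(n), but only runs once in a while."""
    new_root = Node(tm.root.value)
    new_nodes = {new_root.value: new_root}

    def copy_visible(old_node, new_parent):
        for child in old_node.children.values():
            if child.visible:
                clone = Node(child.value)
                new_parent.children[clone.value] = clone
                new_nodes[clone.value] = clone
                copy_visible(child, clone)
            else:
                copy_visible(child, new_parent)  # hoist the subtree up a level

    copy_visible(tm.root, new_root)
    tm.root, tm._nodes = new_root, new_nodes
```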
The picture below illustrates how the TM would interact with
our Tree class:
The new time complexity then becomes O(1) for search, insertion and deletion, at the cost of O(n) extra space.
Hey, guess what, AnotherCasualCoder is now live on GitHub!!! Code is there, check it out:
Happy Mother's Day!!!
Marcelo
I really enjoyed reading the blog: how you have applied a variety of data structures to optimize the time complexity yet retain the properties a tree is used for. Awesome application of simple concepts to build something useful and applicable to the real-world, day-to-day problems we solve at work. Thanks for this. I wanted to point out one thing about the worst-case time complexity: the above solution uses hash tables at two levels, and a hash table itself has a worst-case time complexity of O(n) for search, insert and delete. That might eventually mean the worst-case time complexity of the above data structure also becomes O(n) instead of O(1), but yes, that would only be the case when the structure is too large and needs a rehash, or when for whatever reason it encounters collisions and uses chaining or open addressing to resolve them. Again, thanks for the article, as it helped me think about how I can apply several data structures to come up with out-of-the-box solutions that otherwise would not exist!
Ah, good point. We often think about hash tables as O(1) time, but we also often forget about the possibility of collisions, which do create a problem. However, given the constraints, especially the uniqueness of the keys, this is unlikely to be an issue. The caveat with my approach is that the caller is responsible for keeping the tree balanced (if she wishes to do so), whereas the other methods do so already.