A simple trie for a prefix match problem, by Daily Coding Problem

The daily coding problem for today was this one:

Good morning! Here's your coding interview problem for today.
This problem was asked by Twitter.
Implement an autocomplete system. That is, given a query string s and a set of all possible query strings, return all strings in the set that have s as a prefix.
For example, given the query string de and the set of strings [dogdeerdeal], return [deerdeal].
Hint: Try preprocessing the dictionary into a more efficient data structure to speed up queries.

The problem is begging for you to use a Trie data structure. I've written about tries many years ago (you can search for "Trie" in the anothercasualcoder blog), but I thought of an even simpler implementation.
Honestly you only need two member variables for your Trie class:

  • A boolean that indicates whether this node is a leaf node (meaning it is the end of a word)
  • A hashtable of Tries corresponding to its children
Honestly, that's all. I find it much easier to implement everything recursively for the trie. I also believe that using a hashtable is more efficient than for instance allocating an array of Tries for each letter - the allocation is static and you might not need all that space after all since your Trie might end up being very sparse.
The code is below. It may look complicated - it isn't. If you follow the logic that:

  • Either the current node has a match to the first letter (and it may or may not be a leaf), in which case you'll call recursively for the rest of the word,
  • Or, there is no match and you'll bail
It becomes relatively simple. Code is in the infamous GitHub: https://github.com/marcelodebarros/dailycodingproblem/blob/master/DailyCodingProblem09252018.cs

Any questions, ping me. Cheers, Marcelo

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;

namespace DailyCodingProblem
{
class DailyCodingProblem09252018
{
private SimpleTrie simpleTrie = new SimpleTrie();
public void CreateTrie(string[] listWords)
{
foreach (string word in listWords) simpleTrie.AddWord(word);
}

public void PrintMatches(string prefix)
{
List<string> matches = simpleTrie.MatchPrefixes(prefix);
foreach (string match in matches)
{
Console.WriteLine(match);
}
}
}

class SimpleTrie
{
public bool isLeaf = false;
private Hashtable children = null;

public SimpleTrie() { }

public SimpleTrie(bool isLeaf)
{
this.isLeaf = isLeaf;
}

public void AddWord(string word)
{
if (String.IsNullOrEmpty(word)) return;
if (children == null) children = new Hashtable();
SimpleTrie child = null;
if (!children.ContainsKey(word[0]))
{
child = new SimpleTrie(word.Length == 1);
children.Add(word[0], child);
}
child = (SimpleTrie)children[word[0]];
child.AddWord(word.Substring(1));
}

public bool IsWordPresent(string word)
{
if (String.IsNullOrEmpty(word) || children == null || !children.ContainsKey(word[0])) return false;
if (word.Length == 1 && ((SimpleTrie)children[word[0]]).isLeaf) return true;
return ((SimpleTrie)children[word[0]]).IsWordPresent(word.Substring(1));
}

public List<string> MatchPrefixes(string prefix)
{
List<string> matches = new List<string>();
_MatchPrefixes(prefix, "", ref matches);
return matches;
}

private void _MatchPrefixes(string prefix, string currentWord, ref List<string> matches)
{
if (String.IsNullOrEmpty(prefix))
{
_AllWords(currentWord, ref matches);
return;
}
if (children == null || !children.ContainsKey(prefix[0])) return;
((SimpleTrie)children[prefix[0]])._MatchPrefixes(prefix.Substring(1), currentWord + prefix[0].ToString(), ref matches);
}

private void _AllWords(string currentWord, ref List<string> matches)
{
if (isLeaf) matches.Add(currentWord);
if (children != null)
foreach (char letter in children.Keys)
((SimpleTrie)children[letter])._AllWords(currentWord + letter.ToString(), ref matches);
}
}
}  

Comments

  1. Ideal problem to show the power of Tries :) BTW, the memory usage of Hashtable vs static array is a little tricky since hashtables are usually fairly large, but it's a safe choice since it would work for any alphabet whereas arrays would not make any sense for anything larger than ASCII.

    ReplyDelete

Post a Comment

Popular posts from this blog

Golang vs. C#: performance characteristics (simple case study)

Claude vs ChatGPT: A Coder's Perspective on LLM Performance

ProjectEuler Problem 719 (some hints, but no spoilers)