Sometimes the worst-case-scenario just won’t happen

Given a board with scramble, random letters of dimension H x W, and a word S, write an algorithm to check whether S can be found in the board or not. Now, S can be anywhere in the board as long as:

-          The letters are adjacent to each other

-          The same position (h,w) in the board cannot be used more than once to make up S

 

My first thought when faced with this problem was the following: a brute-force DFS will be exponential on S in the worst case: first you find any position that matches the first letter of the word, then you do a DFS, hence the time complexity would then O(H*W*(8^|S|)). Now let’s put some numbers to see how horrible this big-O becomes, assuming H = W = 50 and |S| = 20, then we’ll have 50*50*(2^(3*20)), or ~2^70, which is arguably a big number. Then I immediately ruled out the DFS and went with the non-recursive BFS, which would become a much more doable O(H*W*|S|), or using the numbers above, a mere 50K iterations. The strategy was similar to the DFS explained above, but using a BFS instead. Code was done, tested, things were looking good, problem solved. Except that… it was wrong. The following example destroyed my assumptions: the board is this:

 

s x k b m m x l k l j r h n p

o z a l y w p r f g d a q d g

r c b z a k d t d g a k p h b

r a p e o u x h w p i g f v i

l b e d z g p i m e f f x e x

l z k d s j k l o m o v e j s

e y y v o s c m i u g l c z j

o a c y d r i e d y b v s r k

l k z u a g h c e f j n b i e

y h m f x o s g f m i v n w t

x j v y i v m i y c o l t t u

u u t f a t j n t v t e t k s

k t f p i z s c z j m c r a m

q z m l o n c i w a n v b d p

q s x g r n s d l u c s t w n

 

And the word is “marcelomedeirosdebarros”. The BFS code failed to find the word, which is here:

 

* x k b m m x l k l j r h n p

* z a l y w p r f g d a q d g

* c * z a k d t d g a k p h b

* * p * o u x h w p i g f v i

l b e * z g p i m e f f x e x

l z k d * j k l o m o v e j s

e y y v * s c m i u g l c z j

o a c y d * * * * y b v s r k

l k z u a g h c * f j n b i e

y h m f x o s g f * i v n w t

x j v y i v m i y c * * t t u

u u t f a t j n t v t * t k s

k t f p i z s c z j m * * * *

q z m l o n c i w a n v b d p

q s x g r n s d l u c s t w n

 

Debugging the code I realized that unfortunately the BFS assumes that once the direction for the word is defined (because of initial matches), it does not have the capability to backtrack in the search, hence it is doomed. So the only option, unfortunately, is the DFS. So I implemented it, but without too much hope. Code’s done. Worked for the case above. Then I tried the 50x50 matrix with a long word:

 

d a g u z w d u e l u p t i l a a y d p p e p x w r u j m p j s p r g n u q y z e f f h d n n o y a

n y b n i z l h x h b u h p b b z c c y f k a d k k z i t i e f k l c l r u p g b n s q x k y q f c

o l m z u v x x k s z c n o d w j v n p k e m v w s r k g a l j c r u s s k r z o t k w h z z e m b

a c d q s o m r d c l n r k g q z w t l d y e n b q u p n x t m x e z j g m y k t b h o n a c z q q

z c k b y b p p p b o j d g i y q m k x k n x i r c r p a a n u z x x r b v g x l q x x w k x b g o

k q p s f m e e e k k i o o o u j g i r x v f m m t p a m h f q j l q x f g r k m z l k n c b i i b

a q s o r p l c g o x d p c q t y d y r o d p y e x j u y j r g r k h o y u u a l c a f n i k j t u

e y k r n x b t k u t q d x u q l e u c u e j x f m d j l v m a b e f t w a x k n f j z v v y c l w

x g o r z z p n m a e r i p h f p z s b u l g k p p u j c j n a y o y g m j e f d j n x q v w u s q

g z k j c l z d c k f v k c m t s u q b s f u m x l x j u t t z r l c o p p m n g i h u z x y l k w

d c e q x a e x m r n q a h u o o y z y e d h d u j y z n a m m y y q q v a o b j r c u z q z y d t

i z l m s z u a w n c e k l e b y l s a l o k x o m a n e d k z w u r n b m f q v u e a z e h r q y

z o f c h u o n x f z h w s d c i e c q z e f c e r j k y j f j w f e v u u m x y o o a r o x l x w

w r x z y u j d x q k e o n a p l q n h i o d w r l q e k q t x n b n n m w g j m r w w m u z a a b

d g x p o i o a h n e e s h w m r m a a l o k e v q r h j x m p e i m n a x n f z l o t f h z o z z

c m a n y p w s t x q h q k a p a j u j a t z v m r z q z o r d h k u a o q k x q h e p o s v e s p

h p t p n j m e b h p g t e g n z g p a a b r u t f o k g u w v x j h b j s w l n j p d i k n z j b

t o o s p x q x w i u c y h l u j k x e d h j e o q t v o i r o b f a h g f u j k m v n q l l i s o

b d c e x l e k e y d v i n r f a s g e n e q o t y p e n b e l d a h n m w i p c v a v f p d i w l

k t b o a w n o l f a i x d t s r i f d t d k r b s e e i d o c z b q l j p z k z y l c o t x w b m

y v d s s g i k m h s s f h r x j v m i i b i u e u s k o t u h t v e t z g j k c y d r s g s d k q

j b k s e w m i o x o a h d n l z j c y y c m c f j y q c e l p x m z s w a r h g q r l v c r a e g

h j a m s g p i n u y v g r v z d v e u p y s u r h l a g d c x m w p s o k t s d n n x i l k a q w

y v v k q j r f u k v p c j b b j k c d k s b l q c l u m f x y k j e x b b f g b i q j w n k d w h

c a y n m g t j t o y w t o n k f t e d z h e v b b c s p d r c y e d x m w x y d r h r i f p b w l

i r y s i u w c l k e b u q x q x p l q y i v g z h z m n w m i w d u z i g v i a j l m u c w r u r

h z q g n z v s s b v j o o y d f r v x h r t c p o w j l x k b s w p m k p p u d p o u n r v w n r

w f v w j q g n q y c m h u r m r s z m s p b b t s d z o x t a v t a f n v m y g x y r r r i t c b

z c w b z h d u d r m g y t e c q n t m z w g a g u q k j c k j g n x r d t u t b s r v m z y f m p

a u z q c d w v v e q s l a h c i l v n z w t a k g e l o g w k x i d q k g l u u h y p c j h r o j

r m k z q d a y g v z l h p i f u r y q s c a b t r y e x a y n w f i j a y d y p l a p v j a j g w

o q s x r o m j f y f v b f w l o b z y f w n a a g c k f j q o x a m r v z n a x r s e s u k u t j

u y j i r s c r w k m b d w i q y a t n q o o z b o d i i g w i r w p r e p j t g e d d o w n w w c

w j a f t f k k g d o t j p o e j h g r l i q z h m l l k f b g m f k m k i c j g u i g f m u n o k

m f p e r n s b x h u z a b c i g d f b x w e x k q z k u v r y w a s v q l n x t f p v p h u h e w

f r i m n p a s u a n i g b p w d z i g r y p i i r h j i d v s g x p a p j s d u y n n j l y n f s

e u z x f g b a e v i r k q f v b j l p g t m f m t l c y x s t d p z a t j s i z p f r v c o w h n

j p c q m k i u k m v y b g d o n u r e c s r c q r o v j j f o u l o d r n d o e z y g n a n y n g

p t f d k q y z z u w n k b g h d q s a x z q d l d u l p p d p h d u l j q a x f f n j s e o w z g

w x f n i p c b s h w m k j q g g g h u x m p v v f m h l h o x l i s f k t v e j i y t b s h e s l

y i p e i k i k s c d u q k s y b b q k w m l c l q n v t d y h p z i g h h z i x a l s b o q n a d

r f n h o l x n s s w m q o e v e r b m q t c i n w l s b x z z v q m s u i k x i u o h m l j m o f

j z l o m x d e t s z f m v s w r z c l a w g e d w e v v b m z d v n a o g e o p p o s y d i t o p

u j j d f n s h k q s i j s f a c q h s i k v o w k h o g a n l g f h d p z q a h g v n w g g w m x

b s l v h j n v h u f e b k c g b d k c k c a g k b p h r p e l b i z v z k y x p u s w v c e k k p

f d l k m z r f l f u f d r q u d t l h h r e n m q j e l h x e x c k a z q v z d u z s d j q a f b

z l z d i n z y y w p n t d i p i b h q u m x l d m r l j c a i y k i l f t t w b b s z j q a v k n

b o t p g m n n z v w n p v v h p j v x k o h l o s u y b n l v b v b m p i d v x r u y z n m w d c

b r a n d m g h a n r p l l j a l t u j r g f z z t u j l b g y v z m u v o s p f h h p d i f i o z

c s b t k u m a n j s x w z h t u m z p f w a f e k y p w r n l u l y n x b o r i u b i x i k i i k

 

And the word: “lifeislikeridingabicycletokeepyourbalanceyouhavetokeepmoving”. It turns out, that it worked very fast, and found the word:

 

d a * u z w d u e l u p t i l a a y d p p e p x w r u j m p j s p r g n u q y z e f f h d n n o y a

n y b * * z l h x h b u h p b b z c c y f k a d k k z i t i e f k l c l r u p g b n s q x k y q f c

o l m z u * x x k s z c n o d w j v n p k e m v w s r k g a l j c r u s s k r z o t k w h z z e m b

a c d q s * * r d c l n r k g q z w t l d y e n b q u p n x t m x e z j g m y k t b h o n a c z q q

z c k b y b * p p b o j d g i y q m k x k n x i r c r p a a n u z x x r b v g x l q x x w k x b g o

k q p s f m e * * * k i o o o u j g i r x v f m m t p a m h f q j l q x f g r k m z l k n c b i i b

a q s o r p l c g * x d p c q t y d y r o d p y e x j u y j r g r k h o y u u a l c a f n i k j t u

e y k r n x b t k u * q d x u q l e u c u e j x f m d j l v m a b e f t w a x k n f j z v v y c l w

x g o r z z p n m a * r i p h f p z s b u l g k p p u j c j n a y o y g m j e f d j n x q v w u s q

g z k j c l z d c k f * k c m t s u q b s f u m x l x j u t t z r l c o p p m n g i h u z x y l k w

d c e q x a e x m r n q * * * * o y z y e d h d u j y z n a m m y y q q v a o b j r c u z q z y d t

i z l m s z u a w n c e k l e b * l s a l o k x o m a n e d k z w u r n b m f q v u e a z e h r q y

z o f c h u o n x f z h w s d c i * * q z e f c e r j k y j f j w f e v u u m x y o o a r o x l x w

w r x z y u j d x q k e o n a p l q * h i o d w r l q e k q t x n b n n m w g j m r w w m u z a a b

d g x p o i o a h n e e s h w m r m a * * o k e v q r h j x m p e i m n a x n f z l o t f h z o z z

c m a n y p w s t x q h q k a p a j u j * t z v m r z q z o r d h k u a o q k x q h e p o s v e s p

h p t p n j m e b h p g t e g n z g p a a * * * t f o k g u w v x j h b j s w l n j p d i k n z j b

t o o s p x q x w i u c y h l u j k x e d h j e * q t v o i r o b f a h g f u j k m v n q l l i s o

b d c e x l e k e y d v i n r f a s g e n e q o t * * * n b e l d a h n m w i p c v a v f p d i w l

k t b o a w n o l f a i x d t s r i f d t d k r b s * e i d o c z b q l j p z k z y l c o t x w b m

y v d s s g i k m h s s f h r x j v m i i b i u e u s * * * u h t v e t z g j k c y d r s g s d k q

j b k s e w m i o x o a h d n l z j c y y c m c f j y q c * * p x m z s w a r h g q r l v c r a e g

h j a m s g p i n u y v g r v z d v e u p y s u r h l a g d * x m w p s o k t s d n n x i l k a q w

y v v k q j r f u k v p c j b b j k c d k s b l q c l u m f x * k j e x b b f g b i q j w n k d w h

c a y n m g t j t o y w t o n k f t e d z h e v b b c s p d r * y e d x m w x y d r h r i f p b w l

i r y s i u w c l k e b u q x q x p l q y i v g z h z m n w m * w d u z i g v i a j l m u c w r u r

h z q g n z v s s b v j o o y d f r v x h r t c p o w j l x k * s w p m k p p u d p o u n r v w n r

w f v w j q g n q y c m h u r m r s z m s p b b t s d z o x t * v t a f n v m y g x y r r r i t c b

z c w b z h d u d r m g y t e c q n t m z w g a g u q k j c k j * * x r d t u t b s r v m z y f m p

a u z q c d w v v e q s l a h c i l v n z w t a k g e l o g w k x * * q k g l u u h y p c j h r o j

r m k z q d a y g v z l h p i f u r y q s c a b t r y e x a y n w f * j a y d y p l a p v j a j g w

o q s x r o m j f y f v b f w l o b z y f w n a a g c k f j q o x a m * v z n a x r s e s u k u t j

u y j i r s c r w k m b d w i q y a t n q o o z b o d i i g w i r w p r * p j t g e d d o w n w w c

w j a f t f k k g d o t j p o e j h g r l i q z h m l l k f b g m f k m * * c j g u i g f m u n o k

m f p e r n s b x h u z a b c i g d f b x w e x k q z k u v r y w a s v q * n x t f p v p h u h e w

f r i m n p a s u a n i g b p w d z i g r y p i i r h j i d v s g x p a p j * d u y n n j l y n f s

e u z x f g b a e v i r k q f v b j l p g t m f m t l c y x s t d p z a t j s * z p f r v c o w h n

j p c q m k i u k m v y b g d o n u r e c s r c q r o v j j f o u l o d r n d o * z y g n a n y n g

p t f d k q y z z u w n k b g h d q s a x z q d l d u l p p d p h d u l j q a x * f n j s e o w z g

w x f n i p c b s h w m k j q g g g h u x m p v v f m h l h o x l i s f k t v e j * y t b s h e s l

y i p e i k i k s c d u q k s y b b q k w m l c l q n v t d y h p z i g h h z i x a * s b o q n a d

r f n h o l x n s s w m q o e v e r b m q t c i n w l s b x z z v q m s u i k x i u o h m l j m o f

j z l o m x d e t s z f m v s w r z c l a w g e d w e v v b m z d v n a o g e o p p o s y d i t o p

u j j d f n s h k q s i j s f a c q h s i k v o w k h o g a n l g f h d p z q a h g v n w g g w m x

b s l v h j n v h u f e b k c g b d k c k c a g k b p h r p e l b i z v z k y x p u s w v c e k k p

f d l k m z r f l f u f d r q u d t l h h r e n m q j e l h x e x c k a z q v z d u z s d j q a f b

z l z d i n z y y w p n t d i p i b h q u m x l d m r l j c a i y k i l f t t w b b s z j q a v k n

b o t p g m n n z v w n p v v h p j v x k o h l o s u y b n l v b v b m p i d v x r u y z n m w d c

b r a n d m g h a n r p l l j a l t u j r g f z z t u j l b g y v z m u v o s p f h h p d i f i o z

c s b t k u m a n j s x w z h t u m z p f w a f e k y p w r n l u l y n x b o r i u b i x i k i i k

 

Question is, why? Why didn’t it take 1 trillion years to finish running? The answer is that in the case of this problem, the worst case scenario simply won’t ever be real, for all practical purposes. This is because in the course of the DFS the search space gets pruned aggressively as we run into a letter that does not have a consecutive, adjacent match in the word. Which is very, very likely to happen. The worst-case scenario will only happen when we have a lot of partial matches of the word in the board. But the probability that such case happens is very small: for a word of size S, the probability that we find a partial match of the first S-1 letters is the following: (1/(26^(S-1)). For S = 20, the probability would then be 10^(-25)%. Which is a ridiculously small chance, almost zero. Hence the worst-case scenario just doesn’t make sense in the context of the problem. As a matter of fact, because of the nature of the problem (the randomness of the board and the word), we can even claim that O(DFS) == O(BFS). No wonder the DFS solution worked as fast as the BFS, with one extra bonus: it actually works for all cases whereas the BFS is flawed. Code below.

WordsBoard.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
using System.IO;
namespace WordsFinder
{
    class WordsBoard
    {
        private char[,] board = null;
        private int height = 0;
        private int width = 0;
        private WordsBoard() { }
        public WordsBoard(int height, int width)
        {
            if (height <= 0 ||
                width <= 0)
            {
                throw new Exception("Invalid height and width");
            }
            this.height = height;
            this.width = width;
            this.board = new char[this.height, this.width];
        }
        public void ReadInput(string fileName)
        {
            FileInfo fi = new FileInfo(fileName);
            StreamReader sr = fi.OpenText();
            int h = 0;
            while (!sr.EndOfStream)
            {
                string[] parts = sr.ReadLine().Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
                for (int w = 0; w < parts.Length; w++)
                {
                    this.board[h, w] = Convert.ToChar(parts[w]);
                }
                h++;
            }
            sr.Close();
        }
        public bool AddWord(string word)
        {
            if (word == null || word.Length == 0)
            {
                return false;
            }
            Console.Write("Attempting to place word '{0}'...", word);
            int NUMBER_ATTEMPTS = 1000000;
            bool placedWord = false;
            for (int iAttempts = 0; iAttempts < NUMBER_ATTEMPTS; iAttempts++)
            {
                LinkedList<int> hCoordinates = new LinkedList<int>();
                LinkedList<int> wCoordinates = new LinkedList<int>();
                Hashtable htCoordinateUsed = new Hashtable();
                Random rdH = new Random();
                Random rdW = new Random(rdH.Next());
                int hCurrent = rdH.Next(0, this.height);
                int wCurrent = rdW.Next(0, this.width);
                hCoordinates.AddLast(hCurrent);
                wCoordinates.AddLast(wCurrent);
                int key = hCurrent * this.width + wCurrent;
                htCoordinateUsed.Add(key, true);
                int wordIndex = 1;
                while (wordIndex < word.Length)
                {
                    bool placedLetter = false;
                    for (int m = 0; m < 80; m++)
                    {
                        int hOffset = rdH.Next(0, 2) - 1;
                        int wOffset = rdW.Next(0, 2) - 1;
                        key = (hCurrent + hOffset) * this.width + wCurrent + wOffset;
                        if (hCurrent + hOffset >= 0 &&
                            hCurrent + hOffset < this.height &&
                            wCurrent + wOffset >= 0 &&
                            wCurrent + wOffset < this.width &&
                            !htCoordinateUsed.ContainsKey(key))
                        {
                            hCurrent += hOffset;
                            wCurrent += wOffset;
                            hCoordinates.AddLast(hCurrent);
                            wCoordinates.AddLast(wCurrent);
                            htCoordinateUsed.Add(key, true);
                            wordIndex++;
                            placedLetter = true;
                            break;
                        }
                    }
                    if (!placedLetter)
                    {
                        break;
                    }
                }
                if (wordIndex >= word.Length)
                {
                    Console.WriteLine("succeeded!");
                    for (int i = 0; i < hCoordinates.Count; i++)
                    {
                        hCurrent = (int)hCoordinates.ElementAt<int>(i);
                        wCurrent = (int)wCoordinates.ElementAt<int>(i);
                        board[hCurrent, wCurrent] = word[i];
                    }
                    placedWord = true;
                    this.Print();
                    break;
                }
            }
            if (!placedWord)
            {
                Console.WriteLine("failed!");
            }
            return placedWord;
        }
        public void BuildRandomBoard()
        {
            Random rd = new Random();
            for (int h = 0; h < this.height; h++)
            {
                for (int w = 0; w < this.width; w++)
                {
                    this.board[h, w] = Convert.ToChar('a' + rd.Next(0, 26));
                }
            }
        }
        public void Print()
        {
            for (int h = 0; h < this.height; h++)
            {
                for (int w = 0; w < this.width; w++)
                {
                    Console.Write("{0} ", this.board[h,w]);
                }
                Console.WriteLine();
            }
            Console.WriteLine();
        }
        public bool FindWord(string word)
        {
            if(word == null || word.Length == 0)
            {
                Console.WriteLine("Invalid word!");
                return false;
            }
            bool foundWord = false;
            for (int h = 0; h < this.height; h++)
            {
                for (int w = 0; w < this.width; w++)
                {
                    if (word[0] == this.board[h, w])
                    {
                        LinkedList<int> hLstPosition = new LinkedList<int>();
                        LinkedList<int> wLstPosition = new LinkedList<int>();
                        Hashtable htUsed = new Hashtable();
                        if (this._SearchWordDFS(h, w, word, 0, ref hLstPosition, ref wLstPosition, ref htUsed))
                        {
                            for (int i = 0; i < hLstPosition.Count; i++)
                            {
                                int hCurrent = (int)hLstPosition.ElementAt<int>(i);
                                int wCurrent = (int)wLstPosition.ElementAt<int>(i);
                                this.board[hCurrent, wCurrent] = '*';
                            }
                            foundWord = true;
                            break;
                        }
                    }
                }
                if (foundWord)
                {
                    break;
                }
            }
            if (foundWord)
            {
                Console.WriteLine("Found word '{0}':", word);
            }
            else
            {
                Console.WriteLine("Failed to find word '{0}':", word);
            }
            this.Print();
            return foundWord;
        }
        private bool _SearchWordDFS(int hCurrent,
                                    int wCurrent,
                                    string word,
                                    int currentWordIndex,
                                    ref LinkedList<int> hLstPosition,
                                    ref LinkedList<int> wLstPosition,
                                    ref Hashtable htUsed)
        {
            if (word == null || word.Length == 0)
            {
                return false;
            }
            if (currentWordIndex >= word.Length)
            {
                return true;
            }
            for (int hOffset = -1; hOffset <= 1; hOffset++)
            {
                for (int wOffset = -1; wOffset <= 1; wOffset++)
                {
                    int hPos = hCurrent + hOffset;
                    int wPos = wCurrent + wOffset;
                    int key = hPos * this.width + wPos;
                    if (hPos >= 0 &&
                        hPos < this.height &&
                        wPos >= 0 &&
                        wPos < this.width &&
                        !htUsed.ContainsKey(key) &&
                        this.board[hPos, wPos] == word[currentWordIndex])
                    {
                        hLstPosition.AddLast(hPos);
                        wLstPosition.AddLast(wPos);
                        htUsed.Add(key, true);
                        if (_SearchWordDFS(hPos,
                                           wPos,
                                           word,
                                           currentWordIndex + 1,
                                           ref hLstPosition,
                                           ref wLstPosition,
                                           ref htUsed))
                        {
                            return true;
                        }
                        hLstPosition.RemoveLast();
                        wLstPosition.RemoveLast();
                        htUsed.Remove(key);
                    }
                }
            }
            return false;
        }
    }
}
Program.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace WordsFinder
{
    class Program
    {
        static void Main(string[] args)
        {
            int height = 0;
            int width = 0;
            if ((args.Length != 3 && args.Length != 4) ||
                !Int32.TryParse(args[0], out height) ||
                !Int32.TryParse(args[1], out width))
            {
                Console.WriteLine("WordsFinder.exe <height> <width> <word> [input file with the board]");
                return;
            }
            string word = args[2];
            WordsBoard wordsBoard = new WordsBoard(height, width);
            if (args.Length == 4)
            {
                wordsBoard.ReadInput(args[3]);
            }
            else
            {
                wordsBoard.BuildRandomBoard();
                wordsBoard.AddWord(word);
            }
            wordsBoard.FindWord(word);
        }
    }
}

Comments

Popular posts from this blog

Advent of Code - Day 6, 2024: BFS and FSM

Golang vs. C#: performance characteristics (simple case study)

Advent of Code - Day 7, 2024: Backtracking and Eval