Using Finite-State Machine (FSM) to model and solve string problems

From time to time one ends up facing a string manipulation problem, such as "find a certain pattern" and "replace parts of the pattern" or "operate in the characters within the pattern". For example, let's suppose that the problem that we want to solve is the following:

Given a string, certain characters will be placed in between an adjacent sequence (with the same length) of the character '*'. For example: "this is a *** test ***". In this particular example, it just so happens that the string "test" is located in between the sequence "***". Anytime that you find a sub-string in between the '*'s, replace it with its uppercase form. Hence, in the given example, return "this is a *** TEST ***".

It is easy to write the code to solve such a problem, and variations of the problem. But I find that, whenever you model such a category of problems using Finite-State Machines (or FSM), it makes not only the interpretation of the problem and the solution more natural, but also the code ends up being very clean in addition to achieving a nice symmetry which improves readability.

Without thinking too much (at least initially) about optimization, there are clearly four states for the given problem: initial state, inside a start-mark, inside an end-mark, and in between marks. Once the states are defined, the state transitions can be also defined (in black below). Finally you should decide on the actions to be taken within each state (in green). By doing so the visualization of the solution becomes very natural. Finally, it is important to actually write down the FSM diagram as shown below - just by looking at it the developer can then determine further optimizations, for example, looking at the diagram below the two states indicating "marks" (start and end) are actually identical in terms of state-transitions and actions, hence they can (and should) be combined. Code is down below too - cheers, Marcelo.


Code, in C#:

        public static string ReplaceMarksWithUppercase(string str)
        {
            if (String.IsNullOrEmpty(str)) return str;

            /* States:
             * 0: inital state
             * 1: inside a start-mark
             * 2: in between marks
             * 3: inside an end-mark
             * */
            int state = 0;
            StringBuilder retVal = new StringBuilder(str);

            for (int i = 0; i < str.Length; i++)
            {
                switch (state)
                {
                    case 0:
                        if (str[i] != '*')
                        {
                            retVal[i] = str[i];
                        }
                        else
                        {
                            state = 1; //Inside a start-mark
                        }
                        break;
                    case 1:
                        if (str[i] != '*')
                        {
                            retVal[i] = Char.ToUpper(str[i]);
                            state = 2; //In between marks
                        }
                        break;
                    case 2:
                        if (str[i] != '*')
                        {
                            retVal[i] = Char.ToUpper(str[i]);
                        }
                        else
                        {
                            state = 3; //Inside an end-mark
                        }
                        break;
                    case 3:
                        if (str[i] != '*')
                        {
                            state = 0; //Back to the initial state
                        }
                        break;
                }
            }

            return retVal.ToString();
        }
 

Comments

Popular posts from this blog

Advent of Code - Day 6, 2024: BFS and FSM

Advent of Code - Day 7, 2024: Backtracking and Eval

Golang vs. C#: performance characteristics (simple case study)