Reducing a quadratic problem to linear using Dynamic Programming

I've first seen this problem posted by HackerRank, which I thought was super interesting:

You're given a string with characters from the alphabet {0,1} (only zeroes and ones allowed), such as "0010". The task is to count the number of sub-strings S that satisfy the following pattern:


  1. Length of S must be greater than 0 and even
  2. Assume S = S1 + S2 where S1 (and S2) length is equal to half of S length:
    1. Either S1 contains only zeros and S2 contains only ones
    2. Or S1 contains only ones and S2 contains only zeroes
An example makes thinks easier: suppose the input string is "00110". We see the following strings matching S's pattern:

  1. S1 = "00110"
  2. S2 = "00110"
  3. S3 = "00110"
OK, so the return value here should be 3.

Quadratic Solution:

An initial quadratic (O(n^2)-time, O(1)-space) can be accomplished in a relatively simple manner:
  • Traverse the string from iterator i = 1..n-1
  • For each i:
    • Check whether i-1 and i are the middle elements of an S string. While this is true, keep navigating left and right (2-pointers approach)
This approach is N^2-time in the worst case. Here is the code:


static int QuadraticSolution(string str)
{
    int count = 0;

    for (int i = 1; i < str.Length; i++)
    {
        if (str[i] != str[i - 1])
        {
            int leftPointer = i - 1;
            int rightPointer = i;
            while (leftPointer >= 0 &&
                str[leftPointer] == str[i - 1] &&
                rightPointer < str.Length &&
                str[rightPointer] == str[i])
            {
                count++;
                leftPointer--;
                rightPointer++;
            }
        }
    }

    return count;
}

Dynamic Programming Solution:

I think I still struggle to reach a dynamic programming solution naturally, but there are ways to train your brain to zero-in into an DP mindset approach. Here are some fundamentals that you can use to force your brain into DP territory:
  • DP is a trade off space/time. You will no longer be in O(1)-space territory. You'll likely be jumping into either O(n)-space or O(large constant)-space. Keep this in mind: you will use extra space (memory).
  • Most of the time don't think about generating all the solutions: DP is usually applied when you want to generate the count of solutions.
  • I don't think it is a good idea to think about DP as a recursive problem at all.
  • Instead, think about the following: suppose that you're at index "i" right now. Suppose that you already solved the problem for all the other indexes 0..i-1. What's the information that you need to store for all these previous indexes 0..i-1 in order to solve the problem for index "i"? This is the BEST way IMHO to think about DP
OK, back to the problem in question. When at index "i", we know then that we have already solved the problem for all 0..i-1. We'll look at position i-1. Two cases:
  1. input[i-1] != input[i]. That's easy. We found a new string that belongs to S.
  2. input[i-1] == input[i]. Well.... what we need to know here is the following: how many of the same character input[i-1] are there in a row? Let's assume that we have this answer, and that the answer is 3, meaning that behind index i there is a sequence of 3 consecutive characters that are the same as input[i]. Which means we now know that there are four consecutive (same element) chars ending at index i. What we can do then is take a look at the solution for the position i-4, because we know that if that position exists, it must be a different character than input[i]. Then we look at how many consecutive elements are there up to position i-4. If that number if greater than or equal to 4, then.... we found another string in S, ending at position i!
So what's the information we need to store at each position i? The number of consecutive, same character, ending at position i
The solution then becomes an O(n)-time, one traverse, O(n)-space, because now we're storing an array of integers containing the number of consecutive, same characters ending at position i.
Here is the code:

static int DynamicProgrammingSolution(string str)
{
    if (String.IsNullOrEmpty(str)) return 0;

    int[] dpCountOfSameSymbolUpTo = new int[str.Length];
    int count = 0;

    dpCountOfSameSymbolUpTo[0] = 1; //First symbol is equal to itself
    for (int i = 1; i < str.Length; i++)
    {
        if (str[i] != str[i - 1])
        {
            count++;
            dpCountOfSameSymbolUpTo[i] = 1;
        }
        else
        {
            if (i - 1 - dpCountOfSameSymbolUpTo[i - 1] >= 0 &&
            dpCountOfSameSymbolUpTo[i - 1 - dpCountOfSameSymbolUpTo[i - 1]] >= dpCountOfSameSymbolUpTo[i - 1] + 1)
            {
                count++;
            }
            dpCountOfSameSymbolUpTo[i] = dpCountOfSameSymbolUpTo[i - 1] + 1;
        }
    }

    return count;
}

Performance Analysis

To evaluate the performance of both algorithms, I created a method that returns a random (valid) string up to a certain length:

static string GenerateRandomSequence(int len){    Random rd = new Random();    string str = "";    for (int i = 0; i < len; i++)    {        str += (rd.Next(0, 2) == 0) ? "0" : "1";    }    return str;}

The input size I used for testing was 300,000 characters long. Here is the execution time:

Quadratic Solution: 199969 (in 0.0115309 secs)
Linear (DP) Solution: 199969 (in 0.0059835 secs)

The DP solution was then around 2x faster than the quadratic solution. Now twice as fast can mean life or death when building a true feature for millions of users to use. Something to keep in mind. Cheers, Marcelo.

Comments

  1. Very nice Marcelo! I have to admit that back in the days I solved this problem using naive quadratic algorithm, because it was enough and I wasn't challenged to solve it more efficiently. But your post title inspired me to give this problem another thought and I immediately thought about following - for every character at position i all we need to know in order to decide whether it can be an end of the sequence which satisfies problem requirements is number of occurrences of this element in a row (current_count) so far and count of another element in a row which immediately a sequence of current element (prev_count). In order for the 2) to hold, current_count must be >= prev_count. To make this more concrete, when we scan "0", we have prev_count = 0 and current_count = 1. If we now scan "1", current_count becomes 1 and prev_count becomes previous current_count, which was 1, so we found a substring matching a pattern.

    In other words, all state which we need to store is current_character, previous_count and current_count in order to solve this problem in O(n) time and O(1) space. This is actually somewhat common in DP solutions to find ways to reduce storage requirements by realizing that only a very limited number of previous state is required in order to move forward. A famous example of constant space DP solution is Fibonacci number calculation.

    My final solution is:

    int substringCount(const string &str) {
    if (str.empty()) return 0;
    int prev_count = 0;
    int total_count = 0;
    char current_char = str[0];
    int current_count = 0;
    for (char ch : str) {
    if (ch != current_char) {
    current_char = ch;
    prev_count = current_count;
    current_count = 0;
    }
    current_count += 1;
    total_count += current_count <= prev_count;
    }
    return total_count;
    }

    http://ideone.com/ksmMv7

    Thanks for keeping me on my toes :)

    ReplyDelete
  2. Supreme!!!! As always, brilliant insight!!!

    ReplyDelete
    Replies
    1. well, I think your insight is brilliant because it's very general - I just made a tiny next step to optimize it. I remember the days, when you were saying that you're not very good at DP... Based on your recent posts it seems like those days are long gone :) I expect that in no time, you'll solve TSP problem in polynomial time using DP :)

      Delete
  3. Thank you for sharing valuable informationNice post,I enjoyed reading this post.

    ซอมบี้

    ReplyDelete

Post a Comment

Popular posts from this blog

Advent of Code - Day 6, 2024: BFS and FSM

Advent of Code - Day 7, 2024: Backtracking and Eval

Golang vs. C#: performance characteristics (simple case study)