Longest Repeating Substring: Sliding Window + Induction

It took me a long time to figure this one out... partially because the hints were a little misleading IMO.

Here it is the problem: https://leetcode.com/problems/longest-repeating-substring/

1062. Longest Repeating Substring
Medium

Given a string S, find out the length of the longest repeating substring(s). Return 0 if no repeating substring exists.

 

Example 1:

Input: "abcd"
Output: 0
Explanation: There is no repeating substring.

Example 2:

Input: "abbaba"
Output: 2
Explanation: The longest repeating substrings are "ab" and "ba", each of which occurs twice.

Example 3:

Input: "aabcaabdaab"
Output: 3
Explanation: The longest repeating substring is "aab", which occurs 3 times.

Example 4:

Input: "aaaaa"
Output: 4
Explanation: The longest repeating substring is "aaaa", which occurs twice.

 

Constraints:

  • The string S consists of only lowercase English letters from 'a' - 'z'.
  • 1 <= S.length <= 1500

The first constraint does not really matter. The second one does. It tells me that something slightly less than O(n^2) might be acceptable. The hints just tell you to go straight to O(n^2): find all the substrings, put in a hash, return the longest that repeats. I tried that approach so many times, as you can see below:

After reading some of the comments, finally the winning strategy relied on two insights:

1) Sliding Window: suppose that you know the length of the repeating string, say N. Can you find a linear-time algorithm to find that repeating string? The answer is yes, using sliding window approach. You would have a sliding window of length N, keeping the result in a hash (notice that the space complexity is O(n), and the moment that you find the first repeating one, return). That's O(n)-time, but if there is a repeat string of length N, you'll find it faster than O(n) (average). Keep this function handy.

2) (Induction) If a string of length N repeats, then a string of length N-1 must also repeat. This seems too intuitive and too naïve to help out with this problem, but it does, and a lot. Basically you can have a function that keeps calling the sliding window (1) above with N = 1, 2, 3, ..., and you so on. The moment that the call returns 0 for an input N, then the result that you want is the result for N-1. Because if you have not found a repeating string of length N, you won't find a return string of N+1 due to the induction rule. Hence you can finish this step prematurely. This step is in the worst case O(n)-time too, but on average it will be finished prematurely hence the execution time will be less than O(n)*O(n), which suffices to, at last, pass:

Code below, and btw, be sure to check out our newly release Microsoft Esports HUB: https://aka.ms/esportshub

Code is below, cheers, ACC.

public int LongestRepeatingSubstring(string S)
{
    if (String.IsNullOrEmpty(S)) return 0;

    int previous = 0;

    for (int i = 1; i < S.Length; i++)
    {
        int current = LongestRepeatingSubstringSlidingWindow(S, i);
        if (current == 0) break;
        previous = current;
    }

    return previous;
}

private int LongestRepeatingSubstringSlidingWindow(string str, int rightMark)
{
    Hashtable cache = new Hashtable();

    string window = "";
    for (int i = 0; i < str.Length; i++)
    {
        if (i < rightMark)
        {
            window += str[i].ToString();
        }
        else
        {
            if (cache.ContainsKey(window)) return window.Length;
            cache.Add(window, true);
            window = window.Remove(0, 1);
            window += str[i].ToString();
        }
    }
    if (cache.ContainsKey(window)) return window.Length;

    return 0;
}

Comments

Popular posts from this blog

Advent of Code - Day 6, 2024: BFS and FSM

Advent of Code - Day 7, 2024: Backtracking and Eval

Golang vs. C#: performance characteristics (simple case study)