Repeated DNA Sequences, by LeetCode

Here is today's problem: https://leetcode.com/problems/repeated-dna-sequences/description/:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

My first solution just doing string manipulation timed out.
A better solution (not the best though) is to use extra memory and do the following:

  • Have two hashtables: one storing every dna of length 10, and one storing the solution
  • As you add the dnas to be first hashtable, check if it has been there before
  • If so, add to the solution
It is still a slow solution, O(10*Len(s)), some optimization can be done to get it down to O(Len(s)) (reduce the constant), the space used here is quite a bit though. A better solution (still O(Len(s)-time) would be to use a Trie Data Structure. Hope you enjoy, thanks!



public class Solution
{
 public IList FindRepeatedDnaSequences(string s)
 {
  Hashtable dnaMap = new Hashtable();
  Hashtable solution = new Hashtable();

  int n = 10;
  for (int i = 0; i <= s.Length - n; i++)
  {
   string dna = s.Substring(i, n);
   if (!dnaMap.ContainsKey(dna)) dnaMap.Add(dna, 0);
   dnaMap[dna] = (int)dnaMap[dna] + 1;

   if ((int)dnaMap[dna] > 1 && !solution.ContainsKey(dna))
   {
    solution.Add(dna, true);
   }
  }

  List retVal = new List();
  foreach (string dna in solution.Keys) retVal.Add(dna);

  return retVal;
 }
}

Comments

  1. I've also decided not to be fancy with Tries and stick to hashtable. The solution is a one-liner:

    from collections import Counter

    class Solution:
    def findRepeatedDnaSequences(self, s):
    """
    :type s: str
    :rtype: List[str]
    """
    return [sequence for sequence, count in Counter(s[i:i+10] for i in range(len(s) - 9)).items() if count > 1]

    ReplyDelete

Post a Comment

Popular posts from this blog

Golang vs. C#: performance characteristics (simple case study)

Claude vs ChatGPT: A Coder's Perspective on LLM Performance

ProjectEuler Problem 719 (some hints, but no spoilers)