Repeated DNA Sequences, by LeetCode
Here is today's problem: https://leetcode.com/problems/repeated-dna-sequences/description/:
My first solution just doing string manipulation timed out.
A better solution (not the best though) is to use extra memory and do the following:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC", "CCCCCAAAAA"]
My first solution just doing string manipulation timed out.
A better solution (not the best though) is to use extra memory and do the following:
- Have two hashtables: one storing every dna of length 10, and one storing the solution
- As you add the dnas to be first hashtable, check if it has been there before
- If so, add to the solution
It is still a slow solution, O(10*Len(s)), some optimization can be done to get it down to O(Len(s)) (reduce the constant), the space used here is quite a bit though. A better solution (still O(Len(s)-time) would be to use a Trie Data Structure. Hope you enjoy, thanks!
public class Solution { public IListFindRepeatedDnaSequences(string s) { Hashtable dnaMap = new Hashtable(); Hashtable solution = new Hashtable(); int n = 10; for (int i = 0; i <= s.Length - n; i++) { string dna = s.Substring(i, n); if (!dnaMap.ContainsKey(dna)) dnaMap.Add(dna, 0); dnaMap[dna] = (int)dnaMap[dna] + 1; if ((int)dnaMap[dna] > 1 && !solution.ContainsKey(dna)) { solution.Add(dna, true); } } List retVal = new List (); foreach (string dna in solution.Keys) retVal.Add(dna); return retVal; } }
I've also decided not to be fancy with Tries and stick to hashtable. The solution is a one-liner:
ReplyDeletefrom collections import Counter
class Solution:
def findRepeatedDnaSequences(self, s):
"""
:type s: str
:rtype: List[str]
"""
return [sequence for sequence, count in Counter(s[i:i+10] for i in range(len(s) - 9)).items() if count > 1]
One liner!!!!! Wonderful:)
ReplyDeletePython perks :)
Delete