Finding a Motif in DNA
This problem asks:
Given: Two DNA strings s and t (each of length at most 1 kbp).
Return: All locations of t as a substring of s.
Restate the problem
Things I read before restating the problem:
I will be given two DNA strings, s and t. I need to return every position in s at which an occurrence of t begins.
Solution steps
First, I wrote an algorithm that steps through s and checks whether each character matches the first character of t.
Then, I looked up the many functions and methods associated with slicing strings in Python.
I spent some time going down an unhelpful path using nested for statements before I realized that instead of just checking if the first character was a match, I could check for the whole substring at once with:
if s[i:i+len(t)] == t:
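Here is a minimal sketch of how that comparison could sit inside a loop. The function name find_motif is hypothetical, and the + 1 assumes that Rosalind wants positions counted from 1, as in the sample output for this problem.

```python
def find_motif(s, t):
    """Return the 1-based start positions of every occurrence of t in s."""
    positions = []
    # Stop at the last index where a full-length window still fits inside s.
    for i in range(len(s) - len(t) + 1):
        # Compare the whole window against t in one step.
        if s[i:i + len(t)] == t:
            positions.append(i + 1)  # convert Python's 0-based index to 1-based
    return positions


# Sample strings from the Rosalind problem statement.
print(*find_motif("GATATATGCATATACTT", "ATAT"))  # prints: 2 4 10
```

Because the window moves one character at a time, overlapping occurrences (positions 2 and 4 in the sample) are both reported.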
Python concepts
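Array slicing as a language feature goes all the way back to 1957, with FORTRAN. It’s important to remember that Python always begins counting at zero, so the first character of s is s[0]. Rosalind numbers positions starting from 1, so each matching index needs 1 added to it before it goes into the answer.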
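A short illustration of how slice boundaries behave, using the sample string from the problem. The variable names window and tail are only for this example.

```python
s = "GATATATGCATATACTT"

# s[start:stop] includes index start and excludes index stop,
# so this takes the four characters at indices 1, 2, 3 and 4.
window = s[1:5]
print(window)  # ATAT

# A slice that runs past the end of the string is shortened, not an error.
tail = s[15:19]
print(tail)  # TT
```

The second case is why a window near the end of s can come back shorter than t and simply fail the comparison instead of raising an exception.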
I needed to strip the inputs because the data from Project Rosalind arrived with trailing spaces, and that extra whitespace kept the substring comparison from ever matching.
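A rough sketch of the clean-up step. It assumes the downloaded text has s on the first line and t on the second, and parse_input is a hypothetical helper name, not something Rosalind provides.

```python
def parse_input(text):
    """Split the downloaded text into s and t, dropping stray whitespace."""
    # strip() removes leading and trailing spaces, tabs and newlines;
    # blank lines are skipped entirely.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    s, t = lines[0], lines[1]
    return s, t


raw = "GATATATGCATATACTT \nATAT \n"
s, t = parse_input(raw)
print(repr(s), repr(t))

# Without stripping, the trailing space makes t a different string.
print("ATAT " == "ATAT")  # False
```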
Bioinformatics concepts
The Biopython library includes methods to find motifs in DNA sequences, and it’s worth looking at the source code to see how they do it differently.
There’s also a clear and concise summary of DNA motifs here.