Edit Distance
This problem asks:
Given: Two protein strings s and t
Return: The edit distance dE(s,t)
References
Restate the problem
Given two protein strings, I need to find the minimum number of insertions, deletions, or substitutions required to transform the first string into the second string.
Solution steps
First, I decided to try to implement the pseudocode definition of the Wagner-Fischer algorithm from Wikipedia.
After struggling to properly create and fill a two-dimensional array using native Python nested lists, I decided to install NumPy and use NumPy arrays which are easier for me to work with.
There were some stubborn indexing issues that took a while to work out with trial and error. The description of the Wagner-Fischer algorithm on Wikipedia has the input strings one-indexed and the matrix zero-indexed, which took me a while to implement correctly in Python, which zero-indexes everything by default.
Once the indexing issues were fixed, I got correct responses for the sample dataset on Project Rosalind as well as the example on Wikipedia.
I got a correct response on the first attempt at real data. The two strings I was given were 863 and 797 characters long. 392 edits were required to transform the first string into the second string.
Python concepts
NumPy describes itself as “The fundamental package for scientific computing with Python”. It’s a massive set of tools that comes with tons of great documentation and examples. I like using their arrays because they support indexing I understand.
Post-solution notes
Challenges solved so far: 52
How many people solved this before me: 1,915
Most recent solve before me: yesterday
Time spent on challenge: 90 minutes
Most time-consuming facet: zero-indexing
Solutions from others: Everyone implemented the Wagner-Fischer algorithm or slight variations on it.
Closing thoughts: