This problem asks:

Given: Two protein strings s and t

Return: The edit distance dE(s,t)

References

  1. Homologous
  2. Edit distance
  3. Edit operations
  4. Levenshtein distance
  5. Wagner-Fischer algorithm

Restate the problem

Given two protein strings, I need to find the minimum number of insertions, deletions, or substitutions required to transform the first string into the second string.

Solution steps

First, I decided to try to implement the pseudocode definition of the Wagner-Fischer algorithm from Wikipedia.

After struggling to properly create and fill a two-dimensional array using native Python nested lists, I decided to install NumPy and use NumPy arrays which are easier for me to work with.

There were some stubborn indexing issues that took a while to work out with trial and error. The description of the Wagner-Fischer algorithm on Wikipedia has the input strings one-indexed and the matrix zero-indexed, which took me a while to implement correctly in Python, which zero-indexes everything by default.

Once the indexing issues were fixed, I got correct responses for the sample dataset on Project Rosalind as well as the example on Wikipedia.

I got a correct response on the first attempt at real data. The two strings I was given were 863 and 797 characters long. 392 edits were required to transform the first string into the second string.

Python concepts

NumPy describes itself as “The fundamental package for scientific computing with Python”. It’s a massive set of tools that comes with tons of great documentation and examples. I like using their arrays because they support indexing I understand.

Post-solution notes

Challenges solved so far: 52

How many people solved this before me: 1,915

Most recent solve before me: yesterday

Time spent on challenge: 90 minutes

Most time-consuming facet: zero-indexing

Solutions from others: Everyone implemented the Wagner-Fischer algorithm or slight variations on it.

Closing thoughts: