This problem asks:

Given: Two protein strings s and t.

Return: The edit distance d_E(s,t) followed by two augmented strings s′ and t′ representing an optimal alignment of s and t.

References

  1. Alignment
  2. More on alignment
  3. Indels
  4. Gap symbols
  5. Augmented strings
  6. Edit alignment score
  7. Optimal alignment
  8. Margaret Oakley Dayhoff
  9. Biopython alignment package

Restate the problem

I’m going to get two protein strings, and I need to return the fewest edits (substitutions, insertions, and deletions) needed to transform the first string into the second, along with one optimal alignment of the two: both strings padded with gap symbols so that they have the same length.
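To make the expected output concrete, here is a minimal sketch that counts the columns in which two augmented strings differ. For an optimal alignment, that count equals the edit distance. The function name `mismatched_columns` and the example strings are only illustrative.

```python
def mismatched_columns(s_aug, t_aug):
    """Count the alignment columns whose two symbols differ.

    A gap symbol facing a letter counts as one mismatch, which is
    how an insertion or deletion shows up in the alignment.
    """
    if len(s_aug) != len(t_aug):
        raise ValueError("augmented strings must have the same length")
    return sum(a != b for a, b in zip(s_aug, t_aug))


# Illustrative alignment of PRETTY and PRTTEIN using "-" as the gap symbol.
print(mismatched_columns("PRETTY--", "PR-TTEIN"))
```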

Solution steps

I read about the alignment package in Biopython, and its PairwiseAligner class looked like a good fit for this challenge. After writing code that used it, my result on Project Rosalind’s sample dataset differed from the sample output, but I suspected that was a mistake on the Project Rosalind site.

My first attempt on a challenge dataset was marked incorrect because the printed output from PairwiseAligner is broken into screen-width sections for readability. I read the documentation but could not find a way to get the aligned strings in raw format.

Then I decided that using a library wasn’t really teaching me much anyway, so I scrapped that approach entirely and wrote my own solution based on the approach from the previous challenge, Edit Distance.
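For readers who want to see the general shape of such a solution, below is a minimal sketch of the standard dynamic-programming approach to edit distance, extended with a traceback that recovers one optimal alignment. It is not the code I submitted. The function name `edit_distance_alignment`, the `gap` parameter, and the order in which ties are broken are all assumptions made for illustration.

```python
def edit_distance_alignment(s, t, gap="-"):
    """Return (distance, s_aug, t_aug) for one optimal alignment of s and t."""
    m, n = len(s), len(t)

    # dist[i][j] holds the edit distance between s[:i] and t[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i  # delete all i symbols of s[:i]
    for j in range(1, n + 1):
        dist[0][j] = j  # insert all j symbols of t[:j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # gap in t
                dist[i][j - 1] + 1,         # gap in s
            )

    # Walk back from the bottom-right cell, rebuilding the columns in reverse.
    s_aug, t_aug = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            s_aug.append(s[i - 1])
            t_aug.append(t[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            s_aug.append(s[i - 1])
            t_aug.append(gap)
            i -= 1
        else:
            s_aug.append(gap)
            t_aug.append(t[j - 1])
            j -= 1

    return dist[m][n], "".join(reversed(s_aug)), "".join(reversed(t_aug))


distance, s_aligned, t_aligned = edit_distance_alignment("PRETTY", "PRTTEIN")
print(distance)
print(s_aligned)
print(t_aligned)
```

Several alignments can share the same optimal score, so the strings this sketch prints depend on how ties are broken in the traceback and may differ from another correct answer. Because the augmented strings are returned as plain Python strings, there is no line wrapping to undo before submitting.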

I submitted a correct response on my second attempt. This was my 58th correct result. By solving this challenge, I unlocked Project Rosalind’s “Alignment” badge level 1. I’ve solved 5 of 19 alignment challenges in the set.

I was the first person to solve this in 4 days. 1,315 people had solved it before me.