This problem asks:

Given: Two protein strings s and t in FASTA format (each of length at most 1000 aa).

Return: The maximum alignment score between s and t. Use: The BLOSUM62 scoring matrix. Linear gap penalty equal to 5 (i.e., a cost of -5 is assessed for each gap symbol).

References

  1. Alignment scores
  2. Sequence alignment scoring functions
  3. Blocks Substitution Matrix (BLOSUM)
  4. Global alignments
  5. BLOSUM62 scoring matrix
  6. Gap penalties

Restating the problem

I’m going to get two protein strings. I need to calculate the maximum global alignment score for them, using BLOSUM62 for substitutions and a penalty of 5 for each gap symbol.

Solution steps

I installed Biopython and read the documentation for its PairwiseAligner class, then set the parameters as shown here:

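from Bio import Align
from Bio.Align import substitution_matrices

aligner = Align.PairwiseAligner()  # mode defaults to "global"
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
# Linear gap penalty: opening and extending a gap cost the same
aligner.open_gap_score = -5
aligner.extend_gap_score = -5
alignments = aligner.align(s, t)
print(alignments[0].score)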

This configuration returned a correct result on the challenge dataset.
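The aligner call takes s and t as plain strings, but the input arrives in FASTA format. Here is a minimal sketch of one way to get from one to the other, assuming the whole dataset is available as a single string. The helper name read_fasta and the example records are made up for illustration and are not from the challenge; Biopython's SeqIO.parse with the "fasta" format would do the same job.

```python
def read_fasta(text: str) -> list[str]:
    """Return the sequences found in FASTA-formatted text, in order."""
    sequences = []  # one list of line fragments per record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            sequences.append([])  # a header line starts a new record
        elif sequences:
            sequences[-1].append(line)  # sequence data may span several lines
    return ["".join(parts) for parts in sequences]


# Made-up records, only to show the call; the second one spans two lines.
example = """>seq1
MKVLA
>seq2
MKV
TA
"""
s, t = read_fasta(example)
print(s, t)
```

With s and t unpacked this way, they can be passed straight to aligner.align(s, t).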

Post-solution notes

Challenges solved so far: 65

How many people solved this before me: 934

Most recent solve before me: 16 days ago

Time spent on challenge: 40 minutes

Most time-consuming facet: reading the Biopython Aligner documentation

Solutions from others: Many solvers wrote their own alignment algorithms.

Closing thoughts: I could modify my existing edit distance algorithm to keep track of the alignment score, but I’m deciding not to do that for now. Maybe I’ll come back later and write that code. For now, I’m satisfied with my solution that uses Biopython.
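For reference, here is a rough sketch of what that hand-written version might look like. It is the standard global alignment recurrence (Needleman-Wunsch) with a linear gap penalty, not code I submitted. The function name global_alignment_score and the toy_score function are made up for illustration; toy_score (+1 match, -1 mismatch) is a stand-in for a BLOSUM62 lookup, which would have to be supplied separately.

```python
from typing import Callable


def global_alignment_score(
    s: str,
    t: str,
    sub: Callable[[str, str], float],
    gap: float = -5,
) -> float:
    """Maximum global alignment score of s and t with a linear gap penalty.

    sub(a, b) is the substitution score for aligning symbols a and b;
    gap is the (negative) score added for each gap symbol.
    """
    # prev[j] holds the best score for aligning the first i-1 symbols of s
    # with the first j symbols of t. Row 0 is all gaps.
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            curr[j] = max(
                prev[j - 1] + sub(s[i - 1], t[j - 1]),  # align the two symbols
                prev[j] + gap,      # s[i-1] aligned against a gap
                curr[j - 1] + gap,  # t[j-1] aligned against a gap
            )
        prev = curr
    return prev[-1]


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for BLOSUM62: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


print(global_alignment_score("ABC", "ADC", toy_score))
```

Only two rows of the table are kept because the problem asks for the score alone; recovering the alignment itself would need the full table or a traceback structure.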