Global Alignment with Scoring Matrix
This problem asks:
Given: Two protein strings s and t in FASTA format (each of length at most 1000 aa).
Return: The maximum alignment score between s and t. Use: The BLOSUM62 scoring matrix. Linear gap penalty equal to 5 (i.e., a cost of -5 is assessed for each gap symbol).
References
- Alignment scores
- Sequence alignment scoring functions
- Blocks Substitution Matrix (BLOSUM)
- Global alignments
- BLOSUM62 scoring matrix
- Gap penalties
Restating the problem
I’m going to get two protein strings. I need to calculate their maximum global alignment score, using BLOSUM62 for substitutions and a penalty of 5 for each gap symbol.
Solution steps
I installed the Biopython Pairwise Aligner and read the documentation to set the parameters as shown here:
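from Bio import Align
from Bio.Align import substitution_matrices

# s and t hold the two protein strings read from the FASTA input
aligner = Align.PairwiseAligner()  # mode defaults to "global"
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
# Equal open and extend scores give a linear gap penalty of 5 per gap symbol
aligner.open_gap_score = -5
aligner.extend_gap_score = -5
alignments = aligner.align(s, t)
print(alignments[0].score)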
This configuration returned a correct result on the challenge dataset.
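The snippet above assumes s and t already hold the two protein strings. As a minimal sketch of one way to get them out of the FASTA input, here is a plain-Python parser. The function name parse_fasta and the example record headers are made up for illustration, and Biopython's SeqIO module could do the same job.

```python
def parse_fasta(text):
    """Return the sequences in a FASTA-formatted string, in order of appearance."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            sequences.append([])  # a header line starts a new record
        elif sequences:
            sequences[-1].append(line)  # a sequence may wrap over several lines
    return ["".join(parts) for parts in sequences]


# Hypothetical two-record input; the headers are placeholders.
example = """>seq_1
PLEASANTLY
>seq_2
MEAN
LY
"""
s, t = parse_fasta(example)
print(s, t)
```

With s and t extracted this way, they can be passed straight to aligner.align(s, t).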
Post-solution notes
Challenges solved so far: 65
How many people solved this before me: 934
Most recent solve before me: 16 days ago
Time spent on challenge: 40 minutes
Most time-consuming facet: reading the Biopython Aligner documentation
Solutions from others: Many solvers wrote their own alignment algorithms.
Closing thoughts: I could modify my existing edit distance algorithm to keep track of the alignment score, but I’m deciding not to do that for now. Maybe I’ll come back later and write that code. For now, I’m satisfied with my solution that uses Biopython.
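For reference, here is a rough sketch of what adapting an edit distance algorithm to track an alignment score might look like. It is not the code I submitted. The dynamic programming table is maximized instead of minimized, each gap symbol costs a fixed penalty, and aligned pairs are scored by a lookup function. The names global_alignment_score and toy_score are hypothetical, and toy_score is a stand-in for a real BLOSUM62 lookup so the example stays self-contained.

```python
def global_alignment_score(s, t, score, gap=5):
    """Maximum global alignment score of s and t with a linear gap penalty.

    score(a, b) must return the substitution score for aligning a with b.
    """
    # prev[j] is the best score for aligning the current prefix of s with t[:j].
    # Row 0: aligning the empty prefix of s with t[:j] costs j gap symbols.
    prev = [-gap * j for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [-gap * i]  # aligning s[:i] with the empty prefix of t
        for j in range(1, len(t) + 1):
            curr.append(max(
                prev[j - 1] + score(s[i - 1], t[j - 1]),  # align s[i-1] with t[j-1]
                prev[j] - gap,                            # s[i-1] against a gap
                curr[j - 1] - gap,                        # t[j-1] against a gap
            ))
        prev = curr
    return prev[-1]


def toy_score(a, b):
    """Placeholder scoring (NOT BLOSUM62): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


print(global_alignment_score("AC", "ABC", toy_score, gap=1))  # → 1
```

Only two rows of the table are kept at a time, which is enough when just the score is needed and no traceback is required. To score with BLOSUM62, the score argument would need to look up each residue pair in the matrix, for example through the matrix object that Biopython loads, assuming it supports pair indexing as its documentation describes.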