Image: tandem mass spectrometry (from Wikipedia)

This problem asks:

Given: A protein string P of length at most 1000 aa.

Return: The total weight of P.

Required reading

  1. Weighted alphabet
  2. Peptide bond
  3. Monoisotopic mass table
  4. Proteomics
  5. Tandem mass spectrometry

Restate the problem

The challenge provides a monoisotopic mass table. For each amino acid in the protein string, I need to look up its mass in the table and add all the masses together.
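As a formula, the total weight is a plain sum. Here m_res(x) stands for the monoisotopic mass of amino acid x from the table, P_i is the i-th amino acid of P, and n is the length of P. The string SKADYEK is only an illustration, and the masses are the standard monoisotopic residue values, which I'm assuming match the challenge's table:

```latex
\begin{aligned}
W(P) &= \sum_{i=1}^{n} m_{\text{res}}(P_i) \\[4pt]
W(\texttt{SKADYEK}) &= 87.03203 + 128.09496 + 71.03711 + 115.02694 \\
&\quad + 163.06333 + 129.04259 + 128.09496 \\
&= 821.39192\ \text{Da}
\end{aligned}
```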

Solution steps

I decided to use Biopython instead of writing my own table look-up code, so the first step was to find the right Biopython method to use.

The ProteinAnalysis class in the Bio.SeqUtils.ProtParam module has a molecular_weight method that fits this challenge almost exactly.

When I set monoisotopic mode to True and subtracted 18.01056 Da from the result, to account for the single water molecule mentioned in the challenge, I got the correct weight for the sample dataset.
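Why subtract exactly one water? A short derivation, assuming (as I understand Biopython's implementation) that molecular_weight sums the masses of the free amino acids m_free and removes one water for each of the n - 1 peptide bonds. Since a residue is a free amino acid minus one water, m_free(x) = m_res(x) + m_H2O, the intact peptide keeps one extra water that the challenge's residue sum does not include:

```latex
\begin{aligned}
M_{\text{peptide}} &= \sum_{i=1}^{n} m_{\text{free}}(P_i) - (n-1)\, m_{\mathrm{H_2O}} \\
&= \sum_{i=1}^{n} \bigl(m_{\text{res}}(P_i) + m_{\mathrm{H_2O}}\bigr) - (n-1)\, m_{\mathrm{H_2O}} \\
&= \sum_{i=1}^{n} m_{\text{res}}(P_i) + m_{\mathrm{H_2O}} \\[4pt]
W(P) &= M_{\text{peptide}} - m_{\mathrm{H_2O}}, \qquad m_{\mathrm{H_2O}} \approx 18.01056\ \text{Da}
\end{aligned}
```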

Python concepts

My solution code for this challenge is here.

There were no new Python concepts used in this solution.

Writing the look-up code myself in Python would use a dictionary to hold the monoisotopic mass table, then a loop like this one to look up each amino acid's mass and add the masses together:

# amino_mass: dict mapping each one-letter amino acid code to its monoisotopic mass
mass = 0.0
for char in protein:
    mass += amino_mass[char]
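For completeness, here is a self-contained sketch of that look-up approach without Biopython. The function name protein_mass is hypothetical, and the mass values are the standard monoisotopic residue masses, which I'm assuming match the table supplied with the challenge. The loop is written as a sum over a generator, which is equivalent to the loop above.

```python
# Monoisotopic residue masses in daltons (Da), keyed by one-letter code.
# Assumption: these standard values match the challenge's mass table.
amino_mass = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}


def protein_mass(protein: str) -> float:
    """Hypothetical helper: sum the residue masses of `protein` (no water added)."""
    # strip() drops a trailing newline left over from reading an input file;
    # any character that is not one of the 20 codes raises a KeyError.
    return sum(amino_mass[char] for char in protein.strip())


if __name__ == "__main__":
    # Print the total rounded to three decimal places.
    print(f"{protein_mass('SKADYEK'):.3f}")
```

Because the challenge's weight is just the residue sum, no water correction is needed here, unlike the Biopython route.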

Bioinformatics concepts

Although adding these amino acid masses together is simple, discovering the masses of the different amino acids in the first place took a lot of work.