This problem asks:

Given: Six non-negative integers, each of which does not exceed 20,000.

Return: The expected number of offspring displaying the dominant phenotype in the next generation, under the assumption that every couple has exactly two offspring.

Restate the problem

I’ll be honest with you, I’ve read this challenge six times, and I’m still not sure what it’s asking.

I read this about expected values, twice.

I read this about uniform random variables, twice.

The Given section of the problem looks a little bit like my solution to a previous problem, so I re-read that.

Here’s the challenge: They’re going to tell me how many couples in a population have each of the six possible genotype combinations. Importantly, they’re giving me the genotype of the couple, not individuals. Then, I need to calculate the expected number of offspring in the next generation that will have the dominant phenotype, keeping in mind that an individual only needs one dominant allele to show the dominant phenotype.

Solution steps

I used the same process to solve this challenge as I used to solve iprb.

First, I read in the six integers from the file, then I build a set that represents all the couples in the population:

couples = a * [("AA", "AA")] + b * [("AA", "Aa")] + c * [("AA", "aa")] + d * [("Aa", "Aa")] + e * [("Aa", "aa")] + f * [("aa", "aa")]

Then, for each couple in the population, I calculated the probability of their offspring getting a capital A from the Punnett Square of their parents.

I did this by stepping through the population and making the following calculation:

for couple in couples:
    if couple in [("AA", "AA"), ("AA", "Aa"), ("AA", "aa")]:
         count += 2 # both children will have the dominant phenotype
    if couple in [("Aa, Aa")]:
         count += 1.5 # On average, 1.5 children will have the dominant phenotype
    if couple in [("Aa, aa")]:
         count += 1 # On average, 1 child will have the dominant phenotype

The full code is here.

When I ran this code on the sample dataset, I got the correct answer, so I was confident in my code.

Unexpectedly, when I uploaded my answer to the dataset I was given, the result came back “Wrong!”

Wrong!

I proofread my code several times and it looked right to me. I got an answer that looked right, but somehow, something is wrong somewhere.

I reviewed my code thoroughly several times, but didn’t find anything that looked wrong.

I downloaded a new dataset, ran my code again, uploaded the new results, still wrong.

Then, as I thought more about what was happening, I realized that I didn’t need to build up the population in a list, then step through the list looking for matches.

In fact, all I needed to do was:

def iev(a, b, c, d, e, f):
    return str(a*2 + b*2 + c*2 + d*1.5 + e)

Instead of building up a list, I could just calculate the expected offspring with a dominant allele directly from the numbers I got from Project Rosalind to define the population.

This felt like a breakthrough, and it was the first time in the process of working these challenges that I got excited about finding a new way to think about a challenge.

As a bonus, when I downloaded a new test dataset, ran my new code and uploaded the results, I got the sought-after “Correct” response.

Correct!

Python concepts

Despite all the time and effort that went into solving this challenge, the resulting Python code is simple. I didn’t use any new techniques.

Bioinformatics concepts

The only new bioinformatics concept introduced in this challenge was the expected value. That is, if we want to calculate the number of events that occur in a collection, we can add the individual probabilities in the collection together.

This topic is covered in detail here, and I get the sense that this is going to come up often in future Project Rosalind challenges.

Problem-solvingconcepts

I still can’t figure out why my first solution didn’t work, while the simpler code does work. They should do the same thing, but I’m glad, in a way, because getting the wrong answer forced me to think through the problem until I came to a simpler solution.

Persistence turns out to be valuable in problem solving.