Independent Alleles
This problem asks:
Given: Two positive integers k (k≤7) and N (N≤2k). In this problem, we begin with Tom, who in the 0th generation has genotype Aa Bb. Tom has two children in the 1st generation, each of whom has two children, and so on. Each organism always mates with an organism having genotype Aa Bb.
Return: The probability that at least N Aa Bb organisms will belong to the k-th generation of Tom’s family tree (don’t count the Aa Bb mates at each level). Assume that Mendel’s second law holds for the factors.
Restate the problem
I admit, this one took me a while to understand what was being asked. It’s clear that A and B are independent variables because the problem makes that clear twice.
Finally, I boil the question down to this: “If I look at Tom’s offspring from his k th generation, what are the chances that at least N of them are Aa Bb?”
Solution steps
First, I read about Punnett squares again. Although I’ve studied these in the past and used them for previous challenges, I needed to understand how they work better.
After reading up and looking at some examples, I learned that since Tom is Aa Bb and all of his mates are Aa Bb, the probability of an Aa Bb offspring in any generation is 0.25.
I spent a long time stuck at this point. I kept re-reading the problem looking for clues, and I couldn’t figure out why they gave the example with the pips on the dominos because that didn’t seem to relate to the problem at all.
I Googled:
Python independent events
which led to several interesting explanations that were essentially repetitions of the one given in the problem. I didn’t feel any closer to a solution.
I was skimming through this pdf, page 57, when I read about Bernoulli random variablies and the binomial random variable equation.
Then, I started looking around online to see if the binomial random variable functions fit with the specifics of this challenge, and they seemed to, so I started implementing.
Python concepts
Biopython doesn’t have an implementation of the binomial random variable functions, so I used one from the SciPy library.
Specifically, I used the https://en.wikipedia.org/wiki/Probability_mass_function. The calculations fit on one line:
return sum(binom.pmf(x,2**k,0.25) for x in range(n,2**k+1))
Bioinformatics concepts
This challenge required more mathmatics than bioinformatics. In fact, the bioinformatics scenario seems, in retrospect, to be contrived just to make use of the binomial cumulative distribution function.
In any case, I found a formula that fit the situation and used a library that implemented that formula.
Problem-solvingconcepts
Reading up on topics mentioned in the challenge is almost always time well spent. In this case, I had to do quite a bit of reading to get to a topic that matched the question.