spidey.png

This problem asks:

Given: A positive integer n ≤ 50.

Return: An array A of length 2n in which A[k] represents the common logarithm of the probability that two diploid siblings share at least k of their 2n chromosomes (we do not consider recombination for now).

References

  1. Polymorphic traits
  2. Diploids
  3. Homologs
  4. Uniform random variables
  5. Binomial random variables
  6. Binomial distributions
  7. Common logarithms

Restating the problem

This is the phrase I struggle to understand:

… probability that two diploid siblings share at least k of their 2n chromosomes.

I’m going to guess that diploid siblings have a 50% chance of inheriting any single chromosome. Given that, I think they have a 25% chance of sharing 2 chromosomes, and so on.

Solution steps

First, I implemented this code to see if my assumptions were correct:

for i in range(1, 2*n + 1):
    print(round(math.log(.5 ** i), 3), end=' ')

This generated sample output similar but not equivalent to the example output.

After reading more about binomial distributions, I installed the Scipy binomial stats package and experimented with the probability mass function.

The stats package got more complicated than I was willing to deal with, so I wrote my own probability mass function using math.factorial and removed the Scipy package.

def binomial(n, k, p):
    return (math.factorial(n) / math.factorial(k) / math.factorial(n-k)) * (p**k * (1-p)**(n-k))

I hacked at this with trial and error for a while, then looked at the solutions from others for help until I arrived at this code which returns the correct result:

for k in range(2*n, 0, -1):
    Pr += factorial(2*n)/(factorial(k)*factorial(2*n-k)) * np.power(p,k)*np.power(1-p, 2*n-k)
    A.append(round(np.log10(Pr), 4))

Post-solution notes

Challenges solved so far: 64

How many people solved this before me: 975

Most recent solve before me: 10 days

Time spent on challenge: 1.5 hours

Most time-consuming facet: properly coding the binomial density equation

Questions from others: Many solvers struggled with the precision required to have their responses marked as correct by the Project Rosalind system in 2013. They must have fixed the site because there have not been any recent questions about precision.

Solutions from others: Some solvers used library functions for the equations while others wrote their own.

Accomplishments and badges: I got the spidey achievement for solving my 64th challenge.

spidey.png

Closing thoughts: I wish I had spent more time trying to solve this myself before I looked at someone else’s code.