This problem asks:

Given: A positive integer n (n≤1,000,000), a DNA string s of even length at most 10, and an array A of length at most 20, containing numbers between 0 and 1.

Return: An array B having the same length as A in which B[i] represents the expected number of times that s will appear as a substring of a random DNA string t of length n, where t is formed with GC-content A[i]

References

  1. Restriction sites
  2. Recognition sequences
  3. Expected value from probability
  4. Introduction to random strings
  5. Indicator random variables

Restate the problem

I need to figure out how likely it is that a random substring, s, is going to appear in a superstring n characters long if the GC-content of the superstring, n, is a value in A.

To figure that out, I’ll need to calculate the chances that a single character, c, in a string with gc-content k matches a character in s?

Solution steps

I started with this function to calculate the probability of a single character appearing randomly with a given gc-content ratio.

def single_prob(c, k):
    if c == 'A' or c == 'T':
        return (1 - k)/2
    else:
        return (k / 2)

I made good progress programming function to combine the individual character values into a full string probability intuitively until I ran into this roadblock:

The round() function in Python rounds a number to the nearest integer or to a specified number of decimal places. When rounding a number that ends in .5, Python uses a method called “round half to even,” which means it rounds to the nearest even number.

My first approach to always getting Python to round up when the terminating value is 5 was to import the decimal module and try:

x = Decimal('0.15')
print x.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)

That didn’t work, but I eventually succeeded in getting the correct result for the sample dataset with:

with localcontext() as ctx:
    ctx.rounding = ROUND_HALF_UP
    y = Decimal(x)
    return y.quantize(Decimal('0.001'), rounding=ROUND_HALF_UP)

Python concepts

Floating point arithmatic

Post-solution notes

Challenges solved so far: 55

How many people solved this before me: 1,551

Most recent solve before me: two days ago

Time spent on challenge: 45 minutes

Most time-consuming facet: floating point arithmetic

Closing thoughts: I looked up one of the people who has solved this challenge recently and reached out to them on LinkedIn. It would be nice to have contact with someone else working these challenges at the same time. I heard back from my fellow solver!