Expected Number of Restriction Sites
This problem asks:
Given: A positive integer n (n≤1,000,000), a DNA string s of even length at most 10, and an array A of length at most 20, containing numbers between 0 and 1.
Return: An array B having the same length as A in which B[i] represents the expected number of times that s will appear as a substring of a random DNA string t of length n, where t is formed with GC-content A[i]
References
- Restriction sites
- Recognition sequences
- Expected value from probability
- Introduction to random strings
- Indicator random variables
Restate the problem
I need to figure out how likely it is that a random substring, s, is going to appear in a superstring n characters long if the GC-content of the superstring, n, is a value in A.
To figure that out, I’ll need to calculate the chances that a single character, c, in a string with gc-content k matches a character in s?
Solution steps
I started with this function to calculate the probability of a single character appearing randomly with a given gc-content ratio.
def single_prob(c, k):
if c == 'A' or c == 'T':
return (1 - k)/2
else:
return (k / 2)
I made good progress programming function to combine the individual character values into a full string probability intuitively until I ran into this roadblock:
The round() function in Python rounds a number to the nearest integer or to a specified number of decimal places. When rounding a number that ends in .5, Python uses a method called “round half to even,” which means it rounds to the nearest even number.
My first approach to always getting Python to round up when the terminating value is 5 was to import the decimal module and try:
x = Decimal('0.15')
print x.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)
That didn’t work, but I eventually succeeded in getting the correct result for the sample dataset with:
with localcontext() as ctx:
ctx.rounding = ROUND_HALF_UP
y = Decimal(x)
return y.quantize(Decimal('0.001'), rounding=ROUND_HALF_UP)
Python concepts
Post-solution notes
Challenges solved so far: 55
How many people solved this before me: 1,551
Most recent solve before me: two days ago
Time spent on challenge: 45 minutes
Most time-consuming facet: floating point arithmetic
Closing thoughts: I looked up one of the people who has solved this challenge recently and reached out to them on LinkedIn. It would be nice to have contact with someone else working these challenges at the same time. I heard back from my fellow solver!