A blog about technical writing, bioinformatics, and Python.
Posts
Speeding Up Motif Finding
This problem asks:
Given: A DNA string s.
Return: The failure array of s.
Counting Subsets
This problem asks:
Given: A positive integer n (n≤1000).
Return: The total number of subsets of {1,2,…,n} modulo 1,000,000.
k-Mer Composition
A 2-mer composition from Project Rosalind
This problem asks:
Given: A DNA string s.
Return: The 4-mer composition of s.
Genome Assembly as Shortest Superstring
This problem asks:
Given: At most 50 DNA strings of approximately equal length.
Return: A shortest superstring containing all the given strings.
Perfect Matchings and RNA Secondary Structures
This problem asks:
Given: An RNA string s of length at most 80 bp having the same number of occurrences of ‘A’ as ‘U’ and the same number of occurrences of ‘C’ as ‘G’.
Return: The total possible number of perfect matchings of basepair edges in the bonding graph of s.
Counting Phylogenetic Ancestors
This problem asks:
Given: A positive integer n (3≤n≤10000).
Return: The number of internal nodes of any unrooted binary tree having n leaves.
Enumerating Oriented Gene Orderings
This problem asks:
Given: A positive integer n≤6
Return: The total number of signed permutations of length n, followed by a list of all such permutations.
Completing a Tree
This problem asks:
Given: A positive integer n (n≤1000) and an adjacency list corresponding to a graph on n nodes that contains no cycles.
Return: The minimum number of edges that can be added to the graph to produce a tree.
Reversal Distance
This problem asks:
Given: A collection of at most 5 pairs of permutations, all of which have length 10.
Return: The reversal distance between each permutation pair.
Ordering Strings of Varying Length Lexicographically
This problem asks:
Given: A permutation of at most 12 symbols defining an ordered alphabet A and a positive integer n (n≤4).
Return: All strings of length at most n formed from A, ordered lexicographically.
Matching Random Motifs
This problem asks:
Given: A positive integer N≤100000, a number x between 0 and 1, and a DNA string s of length at most 10 bp.
Return: The probability that if N random DNA strings having the same length as s are constructed with GC-content x, then at least one of the strings equals s.
Longest Increasing Subsequence
This problem asks:
Given: A positive integer n≤10000 followed by a permutation π of length n.
Return: A longest increasing subsequence of π, followed by a longest decreasing subsequence of π.
Enumerating k-mers Lexicographically
This problem asks:
Given: A collection of at most 10 symbols defining an ordered alphabet, and a positive integer n (n≤10).
Return: All strings of length n that can be formed from the alphabet, ordered lexicographically
Transitions and Transversions
This problem asks:
Given: Two DNA strings s1 and s2 of equal length (at most 1 kbp).
Return: The transition/transversion ratio R(s1,s2).
Finding a Spliced Motif
This problem asks:
Given: Two DNA strings s and t.
Return: One collection of indices of s in which the symbols of t appear as a subsequence of s.
Introduction to Random Strings
This problem asks:
Given: A DNA string s of length at most 100 bp and an array A containing at most 20 numbers between 0 and 1.
Return: An array B having the same length as A in which B[k] represents the common logarithm of the probability that a random string constructed with the GC-content found in A[k] will match s exactly.
Locating Restriction Sites
This problem asks:
Given: A DNA string of length at most 1 kbp in FASTA format.
Return: The position and length of every reverse palindrome in the string having length between 4 and 12.
Enumerating Gene Orders
This problem asks:
Given: A positive integer n≤7
Return: The total number of permutations of length n, followed by a list of all such permutations (in any order).
Finding a Shared Motif
This problem asks:
Given: A collection of k (k≤100) DNA strings of length at most 1 kbp each in FASTA format.
Return: A longest common substring of the collection.
RNA Splicing
This problem asks:
Given: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns.
Return: A protein string resulting from transcribing and translating the exons of s.
Open Reading Frames
image from Project Rosalind.
This problem asks:
Given: A DNA string s of length at most 1 kbp in FASTA format.
Return: Every distinct candidate protein string that can be translated from ORFs of s.
Finding a Protein Motif
Error message seen in the course of solving this challenge.
This problem asks:
Given: At most 15 UniProt Protein Database access IDs.
Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
Overlap Graphs
By Michel Bakni - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=151762031
This problem asks:
Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
Return: The adjacency list corresponding to O3. You may return edges in any order.
Consensus and Profile
Stephen Cole Kleene, inventor of regular expressions
This problem asks:
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection.
Calculating Protein Mass
image of tandem mass spectrometry from Wikipedia
This problem asks:
Given: A protein string P of length at most 1000 aa.
Return: The total weight of P.
Independent Alleles
This problem asks:
Given: Two positive integers k (k≤7) and N (N≤2k). In this problem, we begin with Tom, who in the 0th generation has genotype Aa Bb. Tom has two children in the 1st generation, each of whom has two children, and so on. Each organism always mates with an organism having genotype Aa Bb.
Return: The probability that at least N Aa Bb organisms will belong to the k-th generation of Tom’s family tree (don’t count the Aa Bb mates at each level). Assume that Mendel’s second law holds for the factors.
Translating RNA into Protein
This problem asks:
Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).
Return: The protein string encoded by s.
Inferring mRNA from Protein
The standard RNA codon table organized in a wheel, from Wikipedia.
This problem asks:
Given: A protein string of length at most 1000 aa.
Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000.
Calculating Expected Offspring
This problem asks:
Given: Six non-negative integers, each of which does not exceed 20,000.
Return: The expected number of offspring displaying the dominant phenotype in the next generation, under the assumption that every couple has exactly two offspring.
Mortal Fibonacci Rabbits
This problem asks:
Given: Positive integers n≤100 and m≤20.
Return: The total number of pairs of rabbits that will remain after the n-th month if all rabbits live for m months.
Mendel's First Law
This problem asks:
Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.
Finding a Motif in DNA
This problem asks:
Given: Two DNA strings s and t (each of length at most 1 kbp).
Return: All locations of t as a substring of s.
Counting Point Mutations
This problem asks:
Given: Two DNA strings s and t of equal length (not exceeding 1 kbp).
Return: The Hamming distance dH(s,t).
Rabbits and Recurrence Relations
This problem asks:
Given: Positive integers n≤40 and k≤5.
Return: The total number of rabbit pairs that will be present after n months, if we begin with 1 pair and in each generation, every pair of reproduction-age rabbits produces a litter of k rabbit pairs (instead of only 1 pair)
Computing GC Content
This problem asks:
Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC-content, followed by the GC-content of that string.
Complementing a Strand of DNA
This problem asks:
Given: A DNA string s of length at most 1000 bp.
Return: The reverse complement s’c of s.
Transcribing DNA into RNA
This problem asks:
Given: A DNA string t having length at most 1000 nt.
Return: The transcribed RNA string of t
Counting DNA Nucleotides
This problem asks:
Given: A DNA string s of length at most 1000 nt.
Return: Four integers (separated by spaces) counting the respective number of times that the symbols ‘A’, ‘C’, ‘G’, and ‘T’ occur in s.
Welcome and Introduction
subscribe via RSS