Munch Lab

Applied programming week six

Lectures

This week we will look at classes and objects. We will also look at how to find bugs in your code.

Computer exercises

Complete the Genome Assembly exercise. This is a big and difficult exercise (it is actually a former exam assignment). You are meant to complete as much as you can before the computer exercises. Your TA will then help you along where you got stuck so you can finish it on your own.

Notice that you can still solve the remaining problems in the exercise, even if you get stuck on one problem. With each problem comes an “example usage” specifying the output that corresponds to a certain input. Say you were asked to write a function upperAndReverse(s) that makes a string upper case and reverses it. You are told that with argument “kasper” the function should return “REPSAK”. Then, if you get stuck because you do not know how to reverse a string, you can still make a function to use in the remaining exercise by “hard coding” a return value like this:

def upperAndReverse(s):
    s = s.upper()
    return "REPSAK"

Reading material

Apart from the pages I put up I recommend

of How to Think Like a Computer Scientist. These chapters have more detail than what I
will describe in the lectures so think of this material as an alternative source of
learning that allows you to find more detail and see the topics explained in a different
way.

Weekly assignment

This assignment is a continuation of the exercise from last week. You should notice the resemblance with the code you were meant to read and understand at the end of the Dictionary exercise. As with last week, the constants and data that we will use in this exercise are defined in the module exerciseWeek4.py. Start by downloading and skimming it if you don’t have it handy from last week.

Codon usage bias refers to differences in the frequency of occurrence of synonymous codons
in coding DNA. A codon is a series of three nucleotides (triplets) that encodes a specific
amino acid residue in a polypeptide chain or for the termination of translation (stop
codons). There are 64 different codons (61 codons encoding for amino acids plus 3 stop
codons) but only 20 different translated amino acids. The overabundance in the number of
codons allows many amino acids to be encoded by more than one codon. Because of such
redundancy it is said that the genetic code is degenerate. Different organisms often show
particular preferences for one of the several codons that encode the same amino acid, that
is, a greater frequency of one will be found than expected by chance. How such preferences
arise is a much debated area of molecular evolution.
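The codon arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example and are not part of exerciseWeek4.py. The three stop codons listed are those of the standard genetic code.

```python
from itertools import product

# All possible triplets over the four DNA nucleotides: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

# The three stop codons of the standard genetic code.
stop_codons = {"TAA", "TAG", "TGA"}

# Everything that is not a stop codon encodes an amino acid.
coding_codons = [c for c in codons if c not in stop_codons]

print(len(codons))
print(len(coding_codons))
```

With 61 coding codons and only 20 amino acids, several codons necessarily map to the same amino acid.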

Write a function, findCodonBias(orf). Given an open reading frame like the ones in the
list exerciseWeek4.exactGenes it should return a dictionary that maps each amino acid to
another dictionary mapping each codon, that codes for that amino acid, to its relative
usage. Your returned dictionary should not contain entries for unused amino acids. You
should handle both uppercase and lowercase input.

Example usage:

print findCodonBias(exerciseWeek4.exactGenes[0])

Should print:

{'A': {'GCA': 0.0, 'GCC': 0.0, 'GCT': 1.0, 'GCG': 0.0}, 'C': {'TGC': 0.0, 'TGT': 1.0},
'E': {'GAG': 0.33333333333333331, 'GAA': 0.66666666666666663}, 'D': {'GAT': 1.0,
'GAC': 0.0}, 'G': {'GGT': 0.33333333333333331, 'GGG': 0.0,
'GGA': 0.66666666666666663, 'GGC': 0.0}, 'F': {'TTC': 0.0, 'TTT': 1.0}, 'I': {'ATT': 1.0, 'ATC': 0.0, 'ATA': 0.0}, 'H': {'CAC': 0.0, 'CAT': 1.0}, 'K': {'AAG': 0.20000000000000001, 'AAA': 0.80000000000000004}, '*': {'TAG': 0.0, 'TGA': 1.0,
'TAA': 0.0}, 'M': {'ATG': 1.0}, 'L': {'CTT': 0.0, 'CTG': 0.66666666666666663,
'CTA': 0.0, 'CTC': 0.0, 'TTA': 0.33333333333333331, 'TTG': 0.0}, 'N': {'AAT': 0.5,
'AAC': 0.5}, 'Q': {'CAA': 0.59999999999999998, 'CAG': 0.40000000000000002},
'P': {'CCT': 0.5, 'CCG': 0.0, 'CCA': 0.5, 'CCC': 0.0}, 'S': {'TCT': 0.0, 'AGC': 0.0,
'TCG': 0.0, 'AGT': 0.5, 'TCC': 0.0, 'TCA': 0.5}, 'R': {'CGA': 0.33333333333333331,
'CGC': 0.0, 'AGA': 0.33333333333333331, 'AGG': 0.0, 'CGG': 0.0,
'CGT': 0.33333333333333331}, 'T':{'ACC': 0.0, 'ACA': 0.0, 'ACG': 0.0, 'ACT': 1.0},
'W': {'TGG': 1.0}, 'V': {'GTA': 0.66666666666666663, 'GTC': 0.0,
'GTT': 0.16666666666666666, 'GTG': 0.16666666666666666}, 'Y': {'TAT': 1.0,
'TAC': 0.0}}

Think about what it is you need to do. From the dictionaries exercise you know how to count
letters in a string using a dictionary. What you need to do here is similar, but here you
have an extra layer of dictionaries.

After “uppercasing” orf you can call splitCodons(orf) to get a list of the codons in the
ORF. You then loop over the codons, translate each codon into an amino acid aa using
translateCodon(codon). Then if your top dictionary is D you can count your codons this
way:

D[aa][codon] += 1

but before doing so you need to check each time if the top dictionary has the aa key and
if the nested one has the codon key. If not you need to make them and set the value of
the latter to 0.
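The counting step can be sketched as follows. To keep the example self-contained it does not import exerciseWeek4.py. Instead, toy_map, split_codons and translate_codon are hypothetical stand-ins for the course's codon table, splitCodons and translateCodon, and they only know four codons. The function name count_codons is also made up for the illustration.

```python
# Hypothetical stand-ins for the helpers in exerciseWeek4.py, limited to
# a few codons so the example runs on its own.
toy_map = {"ATG": "M", "GCT": "A", "GCC": "A", "TGA": "*"}

def split_codons(orf):
    # Cut the sequence into consecutive triplets, ignoring leftover bases.
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]

def translate_codon(codon):
    return toy_map[codon]

def count_codons(orf):
    D = {}
    for codon in split_codons(orf.upper()):
        aa = translate_codon(codon)
        # Create the nested dictionary the first time we see this amino acid.
        if aa not in D:
            D[aa] = {}
        # Start the count at 0 the first time we see this codon.
        if codon not in D[aa]:
            D[aa][codon] = 0
        D[aa][codon] += 1
    return D

counts = count_codons("atggctgctgcctga")
print(counts)
```

Here the lowercase input is split into ATG, GCT, GCT, GCC and TGA, so alanine ends up with two GCT and one GCC.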

Your dictionary should not include amino acids that are not in the ORF, but you want all
the possible codons for each amino acid represented in your nested dictionaries. Your
result would not correctly represent codon bias if the codons that were not used were not
included. We include the codons that we did not see in the ORF by looping over the
exerciseWeek4.aminoAcidMap like this:

for codon, aa in exerciseWeek4.aminoAcidMap.items():

and then if aa is in your top dictionary and codon is not in the corresponding nested
dictionary you set D[aa][codon] = 0.0.
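A small sketch of this fill-in step, under the assumption that exerciseWeek4.aminoAcidMap maps codons to amino acids as the loop above suggests. The dictionary toy_map is a hypothetical miniature of that map, and D holds made-up counts in which only alanine was seen.

```python
# Hypothetical miniature of exerciseWeek4.aminoAcidMap (codon -> amino acid).
toy_map = {"GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",
           "TGC": "C", "TGT": "C"}

# Made-up counts as they might look after the counting loop.
D = {"A": {"GCT": 2}}

for codon, aa in toy_map.items():
    # Only touch amino acids that occur in the ORF, and only add
    # codons that were never counted.
    if aa in D and codon not in D[aa]:
        D[aa][codon] = 0.0

print(D)
```

Alanine now has all four of its codons, while cysteine stays out of the dictionary because it never occurred.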

Finally you need to normalize the counts so they become frequencies (i.e. count/total). To
do that you need to loop over the keys in the top dictionary, and use each key to retrieve
all the counts for that amino acid, and sum them: nrCodons = sum(D[aa].values()). Then use
a nested for loop (a for loop within the for loop) to loop over the keys (the codons) of
the corresponding nested dictionary. In that nested for loop you can then divide each count by
the total number of codons for that amino acid:

D[aa][codon] /= float(nrCodons)

All that remains then is to return the finished dictionary.
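The normalization described above can be sketched on a small made-up dictionary of counts (the numbers are invented for the illustration):

```python
# Made-up counts: alanine seen three times, methionine once.
D = {"A": {"GCT": 2, "GCC": 1, "GCA": 0.0, "GCG": 0.0},
     "M": {"ATG": 1}}

for aa in D:
    # Total number of codons observed for this amino acid.
    nrCodons = sum(D[aa].values())
    for codon in D[aa]:
        # float() guards against integer division in Python 2.
        D[aa][codon] /= float(nrCodons)

print(D["A"]["GCT"])
print(D["M"]["ATG"])
```

After the loop, the frequencies for each amino acid sum to one. Every amino acid in the top dictionary was seen at least once, so nrCodons is never zero.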

Write a function printCodonBias(orf) that, given a DNA sequence, prints a pretty
table showing, for each amino acid, the relative usage of each codon coding for that amino
acid.

Example usage:

printCodonBias(exerciseWeek4.exactGenes[0])

Should print:

A:
GCA:   0%
GCC:   0%
GCT: 100%
GCG:   0%
C:
TGC:   0%
TGT: 100%
E:
GAG:  33%
...
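One possible way to produce lines in this layout is the string format "%s: %3d%%", which right-aligns the percentage in three characters. The sketch below works on a made-up frequency dictionary. The helper name format_bias_table is hypothetical, rounding to the nearest whole percent is an assumption, and keys are sorted for a predictable order, which may differ from the order in the example above.

```python
def format_bias_table(bias):
    lines = []
    for aa in sorted(bias):
        lines.append("%s:" % aa)
        for codon in sorted(bias[aa]):
            # Convert the frequency to a whole percentage, padded to width 3.
            percent = round(bias[aa][codon] * 100)
            lines.append("%s: %3d%%" % (codon, percent))
    return "\n".join(lines)

# Made-up frequencies for two amino acids.
bias = {"C": {"TGC": 0.0, "TGT": 1.0},
        "E": {"GAG": 1 / 3.0, "GAA": 2 / 3.0}}

table = format_bias_table(bias)
print(table)
```

Building the table as a string first and printing it afterwards makes the formatting easy to check separately from the printing.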
