Munch Lab

Exercise: Base composition of HIV sequences

In this exercise we want to compute the frequency of the four nucleotides in the gene.

Counting bases

Write a function, countBases(seq), that, given a DNA string seq, returns a dictionary mapping each base to the number of times it occurs in seq.

Example usage:

print countBases('ACTGGCCCT')

{'A': 1, 'C': 4, 'T': 2, 'G': 2}

Then try it out on your HIV sequence and see what you get.
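If you get stuck, here is a minimal sketch of one possible approach. It assumes that the sequence contains only the letters A, C, G and T, and that a base which never occurs should still be reported with a count of zero; the exercise text does not say either way, so treat both as assumptions. The print call is written so that it runs under both Python 2 and Python 3.

```python
def countBases(seq):
    # Start every base at zero so that bases absent from seq still show up
    # in the result (an assumption, not something the exercise requires).
    counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
    for base in seq:
        # Any other character (for example 'N') would raise a KeyError here.
        counts[base] += 1
    return counts

counts = countBases('ATTGC')
print(counts)
```

The order in which the bases are printed may differ between Python versions, since older dictionaries have no guaranteed order.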

Computing frequencies

Write a function, baseComposition(seq), that, given a DNA string seq, returns a dictionary with the frequency of each base. Call countBases(seq) from within baseComposition(seq) to count the bases.

Example usage:

frequencies = baseComposition('ACTG')
print frequencies

This should print:
{'G': 0.25, 'T': 0.25, 'C': 0.25, 'A': 0.25}
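The step from counts to frequencies can be sketched as follows. The helper name frequenciesFromCounts is hypothetical and only used for illustration; it takes a dictionary of counts like the one countBases returns, so you still need to wire the two together yourself inside baseComposition.

```python
def frequenciesFromCounts(counts):
    # Hypothetical helper: turn a dictionary of counts into frequencies.
    total = sum(counts.values())
    frequencies = {}
    for base in counts:
        # float() forces true division; in Python 2, dividing two
        # integers with / would otherwise round down to 0.
        frequencies[base] = counts[base] / float(total)
    return frequencies

frequencies = frequenciesFromCounts({'A': 1, 'C': 1, 'G': 1, 'T': 1})
print(frequencies['A'])
```

A useful sanity check is that the frequencies add up to 1. Note that this sketch fails with a ZeroDivisionError if all counts are zero, as they would be for an empty sequence.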

Printing frequencies

Write a function, printFrequencies(frequencies), that, given a dictionary of frequencies (like the one returned by baseComposition), prints the frequency of each base like this:

A: 0.25
C: 0.25
T: 0.25
G: 0.25
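One way to produce output in this shape is sketched below. The helper frequencyLines is a hypothetical name introduced only to keep the formatting separate from the printing. The sketch sorts the bases alphabetically, so they come out as A, C, G, T rather than in the order shown above; the exercise does not prescribe an order.

```python
def frequencyLines(frequencies):
    # Hypothetical helper: build one 'base: frequency' string per base.
    lines = []
    for base in sorted(frequencies):  # alphabetical order: A, C, G, T
        lines.append(base + ': ' + str(frequencies[base]))
    return lines

def printFrequencies(frequencies):
    for line in frequencyLines(frequencies):
        print(line)

printFrequencies({'G': 0.25, 'T': 0.25, 'C': 0.25, 'A': 0.25})
```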

Now for something more complicated…

The purpose of this last exercise is to see if you can understand a more complicated code example. First download this file:

hivsequences.txt

Once you have downloaded the file, you can open it in Sublime Text (yes, it works for all kinds of text files) and see what it looks like. Each line in the sequence file has a sequence name, followed by a space, followed by a sequence.
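To see how a line in this format can be taken apart, here is a small sketch using the string method split(). The line below is made up for illustration and is not taken from hivsequences.txt.

```python
line = 'exampleName ACGTTGCA\n'  # made-up line, not from the real file

# With no arguments, split() cuts the string at whitespace and
# discards the trailing newline, leaving the name and the sequence.
name, seq = line.split()

print(name)
print(seq)
```

Unpacking into exactly two variables only works when the line has exactly two whitespace-separated parts; an empty line or a name containing a space would raise a ValueError.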

Now work out and explain in detail what the code below does and how it does it. Type it into your editor (no copy/paste) so you can add print statements that show the values of the different variables.

hivFile = open('hivsequences.txt', 'r')
statistics = {}
for l in hivFile:
    name, seq = l.split()
    if name not in statistics:
        statistics[name] = {}
    for b in seq:
        if b not in statistics[name]:
            statistics[name][b] = 0
        statistics[name][b] += 1
hivFile.close()

for name in statistics:
    print name
    total = sum(statistics[name].values())
    for b in statistics[name]:
        print '\t', b, statistics[name][b] / float(total)
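The call to float(total) in the last line of the code above matters in Python 2, where dividing one integer by another with / rounds the result down. The sketch below uses made-up numbers to show the difference; it uses // for the rounded-down version so that it behaves the same under Python 2 and Python 3.

```python
count = 1   # made-up count of one base
total = 4   # made-up total number of bases

# Floor division: this is what count / total gives in Python 2
# when both values are integers.
print(count // total)

# Converting one operand to float gives the actual fraction.
print(count / float(total))
```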

Solutions to exercise

Code can be downloaded from the last table on the course front page.
