# Exercise: Base composition of HIV sequences

In this exercise we compute the frequency of each of the four nucleotides in an HIV sequence.

### Counting bases

Write a function, `countBases(seq)`, that, given a DNA string `seq`, returns a dictionary that maps each base to the number of occurrences of that base in `seq`.

Example usage:

    print countBases('CTGGCCCT')

`{'A': 0, 'C': 4, 'T': 2, 'G': 2}`

Then try it out on your HIV sequence and see what you get.
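One possible shape for `countBases` is sketched below, assuming that `seq` contains only the upper-case letters A, C, G and T. It is a sketch rather than the official course solution, and `print` is written with parentheses only so that the code also runs under Python 3.

```python
def countBases(seq):
    """Count how often each of the four DNA bases occurs in seq."""
    # Start every base at zero so that a base that never occurs
    # still appears in the result.
    counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
    for base in seq:
        # Assumption: seq holds only A, C, G and T (upper case).
        # Any other character raises a KeyError here.
        counts[base] += 1
    return counts


print(countBases('CTGGCCCT'))
```

Starting all four counts at zero is a design choice: the returned dictionary then always has the same four keys, which makes later steps simpler.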

### Computing frequencies

Write a function, `baseComposition(seq)`, that, given a DNA string `seq`, returns a dictionary with the frequency of each base. Call `countBases(seq)` from within `baseComposition(seq)` to count the bases.

Example usage:

    frequencies = baseComposition('ACTG')
    print frequencies

Should print

`{'G': 0.25, 'T': 0.25, 'C': 0.25, 'A': 0.25}`
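A minimal sketch of how the counts could be turned into frequencies follows. The `countBases` shown here is only a compact stand-in so that the block runs on its own, and the sketch assumes a non-empty sequence (an empty one would lead to a division by zero).

```python
def countBases(seq):
    # Compact stand-in so this sketch is self-contained; any function
    # returning a base-to-count dictionary would work here.
    return dict((base, seq.count(base)) for base in 'ACGT')


def baseComposition(seq):
    """Return a dictionary mapping each base to its frequency in seq."""
    counts = countBases(seq)
    total = sum(counts.values())
    frequencies = {}
    for base in counts:
        # float() forces real division; in Python 2, dividing two
        # integers would otherwise truncate the result to 0.
        frequencies[base] = counts[base] / float(total)
    return frequencies


print(baseComposition('ACTG'))
```

Because every count is divided by the same total, the frequencies of a sequence add up to 1, which is a quick sanity check on your own version.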

### Printing frequencies

Write a function, `printFrequencies(frequencies)`, that, given a dictionary with frequencies (like that returned by `baseComposition`), prints the frequency of each base like this:

    A: 0.25
    C: 0.25
    T: 0.25
    G: 0.25
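The printing step could be sketched as follows. The helper `formatFrequencies` is a hypothetical name introduced here only to keep the formatting separate from the printing, and sorting the bases alphabetically is an assumption, since the exercise does not prescribe an order.

```python
def formatFrequencies(frequencies):
    """Return one 'base: frequency' string per base, sorted by base."""
    # Hypothetical helper, not part of the exercise text.
    lines = []
    for base in sorted(frequencies):
        lines.append('%s: %s' % (base, frequencies[base]))
    return lines


def printFrequencies(frequencies):
    """Print each base and its frequency on a separate line."""
    for line in formatFrequencies(frequencies):
        # Parentheses keep this runnable under both Python 2 and Python 3.
        print(line)


printFrequencies({'A': 0.25, 'C': 0.25, 'T': 0.25, 'G': 0.25})
```

Without the call to `sorted`, a plain dictionary in Python 2 gives no guarantee about the order in which the bases come out.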

### Now for something more complicated…

The purpose of this last exercise is to see if you can understand a more complicated code example. First download this file:

Once you have downloaded the file you can open it in Sublime Text (yes, it works for all kinds of text files) and see what it looks like. Each line in the sequence file has a sequence name, followed by a space, followed by a sequence.
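To make the file format concrete, the sketch below splits one such line into its two parts. The line itself is made up for illustration and is not taken from the downloaded file.

```python
# A made-up line in the "name, space, sequence" format described above.
line = 'seqA ACGTTGCA\n'

# split() without arguments splits on any whitespace and also drops
# the trailing newline, leaving exactly two pieces here.
name, seq = line.split()

print(name)
print(seq)
```

The two-name assignment only works when the line splits into exactly two pieces; a blank line or a line with extra spaces would raise a `ValueError`.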

Now work out and explain in detail what the code below does and how. Type it into your editor (no copy/paste) so you can add print statements to inspect the values of different variables.

    hivFile = open('hivsequences.txt', 'r')
    statistics = {}
    for l in hivFile:
        name, seq = l.split()
        if name not in statistics:
            statistics[name] = {}
        for b in seq:
            if b not in statistics[name]:
                statistics[name][b] = 0
            statistics[name][b] += 1
    hivFile.close()

    for name in statistics:
        print name
        total = sum(statistics[name].values())
        for b in statistics[name]:
            print '\t', b, statistics[name][b] / float(total)

### Solutions to exercise

Code can be downloaded from the last table on the course front page.