Munch Lab

Selective sweeps across twenty million years of primate evolution

Our paper in MBE shows how patterns of incomplete lineage sorting measure linked selection on evolutionary timescales. We show that a large portion of linked selection is due to selective sweeps and that the human-chimpanzee ancestor experienced a substantially higher frequency of sweeps than did the human-orangutan ancestor. These ancestral sweeps are enriched for sweeps in modern humans, suggesting that several regions of the genome are repeatedly hit by sweeps.

Link to paper

Applied programming 2015 week seven

Reading

None. Keep revisiting what we have been through at lectures and in the online book.

Lectures

The Thursday lecture is canceled. At the Tuesday lecture we will treat some of the issues you find most difficult, evaluate the course, and talk about the exam.

Exercises  (TØ)

The exercise for this week is about assembling genome sequence from sequencing reads. You can find a link to the exercise in the outline table on the course page.

Mandatory assignment

This is the last week so there is no assignment.

Applied programming 2015 week five

Reading

There is no reading for this week. Spend the extra time looking back on the chapters, slides and exercises we have covered so far.

Lectures

At the Thursday lecture I will talk more about dictionaries.  At the Tuesday lecture we will talk about how to put all the things together that we have learned so far, and a bit about modules.

Exercises  (TØ)

The exercise for this week is about analyzing the base composition of HIV sequences. It is available as a link on the main course page in the table showing the course outline.

Mandatory assignment

Write a function, parse_fasta(filename), that reads the file named filename. This file should contain multiple sequence entries in Fasta format, and the function must parse this data and return a list of tuples of the form (header, sequence). Download this file and use it as input. The first entry in the file is:

>numberOne this is sequence one
AGTTTCCCTCAAATCACTCTTTGGCAACGACCCATCGTCACAGTAAGAAT
AGAGGGACAGCTAAGAGAAGCTCTATTAGATACAGGAGCAGACGATACAG
TATTAGAAGACATAGATTTGCCAGGAAAATGGAAACCAAGAATGATAGGG
GGAATTGGAGGCTTCATCAAGGTAAAACAGTATGATCAGATATCTATAGA
AATTTGTGGAAAAAGAGCTATAGGTACAGTATTAGTAGGACCTACACCTG
TCAACATAATTGGAAGAAACATGATGACGCAGATTGGCTGTACTTTAAAT
TTGGCAATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAAT
GGATGGGCCAAAGGTTAAACAATGGCCACTGACAGCAGAAAAAATAAAAG
ATTGGGCCTGAAAATCCA

so this tuple should be the first one in your list:

("numberOne this is sequence one", "AGTTTCCCTCAAATCACTCTTTGGCAACGACCCATCGTCACAGTAAGAATAGAGGGACAGCTAAGAGAAGCTCTATTAGATACAGGAGCAGACGATACAGTATTAGAAGACATAGATTTGCCAGGAAAATGGAAACCAAGAATGATAGGGGGAATTGGAGGCTTCATCAAGGTAAAACAGTATGATCAGATATCTATAGAAATTTGTGGAAAAAGAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAACATGATGACGCAGATTGGCTGTACTTTAAATTTGGCAATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGGCCAAAGGTTAAACAATGGCCACTGACAGCAGAAAAAATAAAAGATTGGGCCTGAAAATCCA")

This assignment can be solved in a lot of different ways — of varying complexity — so be inventive and after you have handed in your own assignment, have a look at what your friends have done to solve the problem.

Here is one approach: read the entire content of the file into a string using the read()
method of the opened file. Then you can use the split() method to split the string
into a list of individual Fasta entries. Try using '>' as argument to the split method and then print the resulting list. Now you can use a for loop to iterate over the elements of this list (each of which is a string representing a Fasta entry) so you can produce a tuple for each with the header and the sequence. Hint: use the splitlines() method to split each string (Fasta entry) into the individual lines it contains (i.e. the header line and all the sequence lines). Then you can fish out the header line from that list. To produce the sequence you need to join all the sequence lines. Here is some code to get you started:

fasta_file = open('input.fasta', 'r')
file_content = fasta_file.read()
list_of_entries = file_content.split(">")
for entry in list_of_entries:
    print entry # to see what entry is

# (figure out why the first is empty and remove it before the loop)
# here you are on your own but try the splitlines method...
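If you want to see what these string methods return before you use them on the real file, here is a small sketch on a made-up string with two entries. The names toy_content, entries and lines are only for illustration. It shows how the methods behave, not the finished function. The parentheses after print make the code run in both Python 2 and Python 3.

```python
# A made-up, two-entry Fasta string standing in for the content of a file.
toy_content = ">first entry\nAAAA\nCCCC\n>second entry\nGGGG\n"

# split(">") cuts the string at every '>' and returns a list of the pieces.
entries = toy_content.split(">")
print(entries)

# splitlines() cuts one piece into the lines it contains.
lines = entries[1].splitlines()
print(lines)

# join() glues a list of strings together, here with nothing in between.
print("".join(["AAAA", "CCCC"]))
```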

Still, there are many other, and better, ways to do it. You can think about a way once you are done with the large exercise this week.

 

Send the file with python code to Dan no later than Tuesday (24/11) at 12.00 (not 24.00).

Applied programming 2015 week four

Reading

You should read/complete exercise 39 in Learn Python the Hard Way before the Thursday lecture.

Lectures

At the Thursday lecture we will continue working with lists. At the Tuesday lecture we will talk about a special data structure called dictionaries.

Exercises  (TØ)

The exercise for this week is about comparing HIV sequences and is available as a link on the main course page in the table showing the course outline.

Mandatory assignment

This week's assignment will walk you through some of the most common string manipulations.

A palindrome is a string that is spelled the same way backwards and forwards. Write a function, isPalindrome(s), that returns True if s is a palindrome and False
otherwise.

Example usage:

print isPalindrome('abcba')

Should print True

print isPalindrome('foo')

Should print False

One approach to this is to run through s from the first to the middle character and for
each character check if the character is equal to the character at the same index from the
right rather than the left. Remember that the first character of a string is at index 0
and the last at index -1, the second character is at index 1 and the second last at index
-2 and so forth.

Since you need to run through the string from the first to the middle character you first
need to figure out how many characters that corresponds to. Say your palindrome is
p="ACTGTCA", then the number of indexes you need to loop over with a for loop is
len(p)/2. Figure out how to make range() return indexes you can use to access the
characters in the first half of the sequence. Then make a for loop where you iterate
over the indexes you get from range(). Try to make the for loop print out the first
half of the characters.

Once you get this far you need to compare each character from the first half to the
corresponding ones starting from the other end of the palindrome. Figure out how to change
each index used for the first half to the corresponding index for the other half so you
can compare the relevant pairs. (You need to compare index 0 with -1, 1 with -2 and so on…)

Now try to make the for loop print both the character from the first half and the
corresponding character from the other end. If you got the indexes right you will see that
the A is printed with the A from the other end, the C with the C and so on.
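If you get stuck pairing up the indexes, here is a minimal sketch of one way to do it for the example string. The variable names half, left_character and right_character are my own choices. The // operator is integer division: it gives the same result as len(p)/2 does in Python 2 and also works in Python 3.

```python
p = "ACTGTCA"

# Number of positions in the first half (the middle character is left out).
half = len(p) // 2

for i in range(half):
    # Index i counts from the left; index -1 - i is equally far from the right.
    left_character = p[i]
    right_character = p[-1 - i]
    print(left_character + " " + right_character)
```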

Write an if statement in the for loop that tests if the two corresponding characters are the same. If your string is a palindrome all pairs are the same. So as soon as you see a
pair that is not the same, you know it is not a palindrome and you can let your function
return False like this:

if leftCharacter != rightCharacter:
    return False

On the other hand, if all pairs pass this test then it is a palindrome and the function
should return True when exiting the for loop.

Word count

Write a function, wordCount(s), that counts the number of words in a string.

Example usage:

print wordCount('foo bar')
2

Useful method:

Re-formatting text

Write a function, reformat(s), that takes a string, s, and replaces all white spaces
(spaces, tabs) with a single space.

Example usage:

s = "foo    bar      baz"
print reformat(s)
foo bar baz

Useful methods:

  • split()
  • join()

Go look them up – or just google “python join”.
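As a quick sketch of how the two methods behave (the example string and the "-" separator are made up for illustration): split() without arguments cuts a string at any run of spaces or tabs, and join() glues a list of strings together using the string it is called on as separator.

```python
# split() with no argument cuts at any run of white space (spaces, tabs).
words = "one   two\tthree".split()
print(words)

# join() puts the separator string between the elements of the list.
print("-".join(words))
```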

Send the file with python code to Dan no later than Tuesday (24/11) at 12.00 (not 24.00).

Exercise: Files and Functions

This exercise is split into two parts: one that helps you train how you can make Python read and write files on your computer, and another that serves to build familiarity with functions. Remember that the purpose of the exercises is not to answer the questions in them but to train the chain of thought that allows you to answer them. Play around with the code for each example and see what happens if you change it a bit.

Files

Reading files lets you access data that you can process with your code. Here we will just make a small example file that you can work on. So the first thing you should do is to create a file with text in it that you can read from your python script. Open your Sublime Text editor and type this in it:

This is the first line
This is the second line
This is the third line

Then save it in the same directory (folder) where you put your Python code and name it ‘file_to_read_from.txt’.

Exercise: reading from a file

To read from a file on your computer you need the name of the file. To create a connection between your script and the file you can use the built-in function open(). You call it like this:

f = open('file_to_read_from.txt', 'r')

The first argument to the open function is a string with the name of the file. The second argument is a string with one character in it. If you want to read from the file this must be an 'r'. If you want to write to the file it must be a 'w' (but don’t try this right now). In other words, you need to decide if you want to read from or write to a file before you open it. If you open a file for reading and no file with that name exists, you will get an error. The open function returns a special type of value that points to the beginning of the file you have specified. Here we store it in the variable f.

To read all the content in a file you use the read() method which is associated with the file value. A method is actually a little pre-made function that is packaged together with a Python value. This may seem strange at first but you will soon learn that all Python values (also numbers and text) have methods that associate them with extra functionality. To call the read method on the file value that f points to you write f.read(). Notice the dot that connects f and read. This tells Python that read is a method associated with the value in f. Here f.read() evaluates to the content of the file that f points to.
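To see that other kinds of values have methods too, here is a small sketch using a string and a number (the variable names and values are made up for illustration). The dot works in the same way as in f.read().

```python
text = "acgt"
# upper() is a string method that evaluates to a new, upper-case string.
print(text.upper())
# count() is a string method that counts occurrences of its argument.
print(text.count("a"))

number = 2.0
# is_integer() is a method on decimal numbers (floats).
print(number.is_integer())
```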

file_content = f.read()

Calling read() also makes f point to the end of the file (because you are done reading it). You can think of f as an index finger pointing to the place in the file you are at. When you opened the file, f pointed at the beginning. When you had read it using the read method, you were at the end.

Write the code needed to open the file called 'file_to_read_from.txt', read its content into a variable, print the content, read the content again and print it again.

… open file …
… read content …
… print the content …
… read content again …
… print the content you read this time…

What happens the second time you try to read from the file, and why?

Exercise: reading from a file 2

The position in the file that f points to will change as you read the file (to keep track of how much of the file you have read). If you want it to point to a new place in the file – say if you want to read from the beginning again – you can use the seek method. If you call f.seek(0) you will make f point to position 0, which is the beginning of the file. Try to modify your code by inserting an f.seek(0) statement like this:

… open file …
… read content …
… print the content …
f.seek(0)
… read content again …
… print the content you read this time…

What happens now, and how (and why) is it different from what the code produced before?

Exercise: reading from a file 3

Now try to modify your code so you use the readline method instead of the read method. What happens now?

Exercise: writing to a file

To write results produced by your script to a file on your computer, you need to open it in much the same way as when you wanted to read from a file. Only now you specify a 'w' as the second argument to the open function. Look at the code below and decide what it does before you copy it (no copy/paste), and run it. What is the name of the file that we write to? What is printed to the file? (Open ‘file_to_write_to.txt’ in Sublime Text and see.) What does '\n' represent? What does the close method do?

f = open('file_to_write_to.txt', 'w')
f.write('First line\n')
f.write('Second line\n')
f.close()

Now close ‘file_to_write_to.txt’ in Sublime Text. Try to modify the code to print ‘1. line’ and ‘2. line’ and run that. Then open ‘file_to_write_to.txt’ again and see what happens. I hope that you have now learned that if you open a file to write to that does not exist, then it is created for you. You should also have learned that if you open a file to write to that already exists, then the existing file is replaced with an empty file that you can write to.
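If you want to check both of these lessons from inside Python, here is a minimal sketch. It works in a temporary folder so that none of your own files are overwritten. The tempfile and os modules and the file name demo.txt are my additions for this sketch and are not something the exercise requires.

```python
import os
import tempfile

# Make a fresh temporary folder and build a file name inside it.
file_name = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(file_name, "w")   # the file does not exist yet, so it is created
f.write("old content\n")
f.close()

f = open(file_name, "w")   # the file exists now, so it is emptied first
f.write("new content\n")
f.close()

# Read the file back to see what ended up in it.
f = open(file_name, "r")
file_content = f.read()
f.close()
print(file_content)
```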

Functions

These exercises will train your familiarity with functions.

Exercise: show and tell. Refer to the lecture slides if necessary!

def power(a, b):
    print "This function computes %d**%d" % (a, b)
    return a**b

print power(4, 2)

What does this function do? How many parameters does it have? How many statements does the function have? What does the function print? How many values does it return? What is the difference between return and print?

Try (possibly strange) variations of the code like the ones below to better understand the contribution of each line of code. You can begin with:

def power(a, b):
    print "This function computes %d**%d" % (a, b)
    return a**b

power(4, 2)

and

def power(a, b):
    print "This function computes %d**%d" % (a, b)
    return a**b

result = power(4, 2)
print result

and

def power(a, b):
    "This function computes %d**%d" % (a, b)
    return a**b

print power(4, 2)

and

def power(a, b):
    print "This function computes %d**%d" % (a, b)
    a**b

print power(4, 2)

and

def power(a, b):
    print a**b
    return "This function computes %d**%d" % (a, b)

print power(4, 2)

and

def power(a, b):
    return a**b
    print "This function computes %d**%d" % (a, b)

print power(4, 2)

Exercise

Define a function called “diff” which takes two parameters, x and y, and returns the difference between x and y.

Example:
def diff(x, y):
   ….

diff(8, 2) # should return 6
diff(-1, 2) # should return -3

Exercise

Define a function called “all_equal” that takes five arguments and returns True if all five arguments have the same value and False otherwise. The function should work with any input, for example:

all_equal("Dan", "Dan", "Dan", "Dan", "Dan")
all_equal(0, 0, 0, 0, 0)
all_equal(0.5, 0.5, 0.5, 0.5, 0.5)
all_equal(True, True, True, True, True)

Hint: You test equality with a == b. Now think back to what you learned about logic. Which operator can you use to ensure that a == b and b == c?
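As a small sketch of how comparisons can be combined (with three made-up values rather than five), each comparison evaluates to True or False, and the combined expression is only True when both parts are True:

```python
a = 3
b = 3
c = 4

first_pair = (a == b)           # compares a and b
second_pair = (b == c)          # compares b and c
both = (a == b and b == c)      # True only if both comparisons are True

print(first_pair)
print(second_pair)
print(both)
```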

Exercise

Define a function called “is_even” which takes one argument and returns True if (and only if) this is an even number.

is_even(8) # should return True
is_even(3) # should return False

Exercise

Define a function called “is_odd” which takes one argument and returns True if (and only if) the argument is an odd number.

is_odd(8) # should return False
is_odd(3) # should return True

Can you use the function you just defined, is_even, to complete this exercise? How? Why is that a good idea?

Exercise

Define a function called “is_nucleotide_symbol” which takes one argument and returns True if this is either A, C, G, T, a, c, g or t, and False in any other case.

Name your parameters something sensible, e.g. symbol.

is_nucleotide_symbol("A") # should return True
is_nucleotide_symbol("B") # should return False
is_nucleotide_symbol("Dan sucks") # should return False
is_nucleotide_symbol("") # should return False

Exercise

Define a function called “is_complementary_base” which takes two parameters, base1, base2, and returns True if base2 is the complementary of base1, and False otherwise.

is_complementary_base("A", "G") # should return False
is_complementary_base("A", "T") # should return True
is_complementary_base("T", "A") # should return True
is_complementary_base("Dan sucks", "A") # should return False

Can you use the function you defined in the previous exercise to complete this exercise? How? Why is that a good idea?

Exercise

Define a function called “is_series_cool” which takes one parameter, tv_series, and returns True if (and only if) the parameter is either the string “Dr Who”, “Doctor Who” or “Narcos”.

is_series_cool("Days of our Lives") # should return False
is_series_cool("Doctor Who") # should return True
is_series_cool(42) # should return False

Exercise: functions in expressions

Consider this function definition that takes a single number as argument:

def square(n):
    return n**2

What does it do? What does it return? What number does square(2) then represent?

Below I have used it in some expressions that are printed. Make sure you understand what each expression evaluates to. Do the explicit substitutions and replacements on paper before you run it. Remember that we can substitute a function call (like square(2)) for the value it returns, just like we can substitute a variable x for the value it points to.

print square(3)
print square(2 + 1)
print square(2) * 2 + square(3)
print square(square(2))
print square(2 * square(1) + 2)
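As a sketch of what such a substitution on paper can look like, here is one worked example for an expression that is not in the list above. The steps are written as comments, innermost call first.

```python
def square(n):
    return n**2

# Substituting step by step:
#   square(1 + square(2))
#   square(1 + 4)
#   square(5)
#   25
print(square(1 + square(2)))
```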

Exercise: not for the faint-hearted

You already made a function that computes Fahrenheit from Celsius. You could do that because you knew the linear relationship between the two (you knew the slope and the intercept that defined Fahrenheit as a linear function of Celsius).

If you were to make a function that does this computation, it could look like this:

def conversion(celcius):
    slope = 9/5.0
    intercept = 32
    return celcius * slope + intercept

Try to change this function so it takes three arguments, corresponding to celcius, slope and intercept, so you can call it like this to convert 37 degrees Celsius: conversion(37, 9/5.0, 32). Now you have a function that can do any linear conversion, which you can put inside another function like this:

def celcius2fahrenheit(celcius):
    return conversion(celcius, 9/5.0, 32)

Now try to extend this to a different problem: It has been found that the height and weight of a person are related by a linear equation with slope = 0.55 and intercept = -25. Define a function called “predict_weight” which takes just one argument, the height of a person, and returns the estimated weight of the person.
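As a sketch of the same wrapping pattern on an unrelated example (not the weight exercise above), here is a general linear function wrapped in a function that converts kilometers to miles. The names linear and km2miles are made up for this sketch, and the factor 0.621371 miles per kilometer is an approximate value I have filled in.

```python
def linear(x, slope, intercept):
    # Any straight-line relationship: y = x * slope + intercept
    return x * slope + intercept

def km2miles(km):
    # Approximate factor; the intercept is 0 because 0 km is 0 miles.
    return linear(km, 0.621371, 0)

print(km2miles(10))
```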