Munch Lab

Elephants and the mesh of life

The evolution and diversification of life are nicely depicted by a tree – with the original first living organism at the root and the main branches splitting into still smaller branches and twigs that eventually lead to the living species at the leaves. But what about all the species that went extinct, you may ask. The dinosaurs found themselves in dire straits 65 million years ago. In fact, species go extinct all the time. Whereas recently emerged species most likely persist, species that arose hundreds of millions of years ago have most likely gone extinct since then. This continuously removes twigs and branches from the tree of life, which is why it has ended up looking like, well, a tree.


Selective sweeps across twenty million years of primate evolution

Our paper in MBE shows how patterns of incomplete lineage sorting measure linked selection on evolutionary timescales. We show that a large portion of linked selection is due to selective sweeps and that the human-chimpanzee ancestor experienced a substantially higher frequency of sweeps than did the human-orangutan ancestor. These ancestral sweeps are enriched for sweeps in modern humans, suggesting that several regions of the genome are repeatedly hit by sweeps.

Link to paper

Applied programming 2015 week seven

Reading

None. Keep revisiting what we have been through at lectures and in the online book.

Lectures

The Thursday lecture is canceled. At the Tuesday lecture we will go over some of the issues you find most difficult, evaluate the course, and talk about the exam.

Exercises  (TØ)

The exercise for this week is about assembling a genome sequence from sequencing reads. You can find a link to the exercise in the outline table on the course page.

Mandatory assignment

This is the last week so there is no assignment.

Applied programming 2015 week five

Reading

There is no reading for this week. Spend the extra time looking back on the chapters, slides, and exercises we have covered so far.

Lectures

At the Thursday lecture I will talk more about dictionaries. At the Tuesday lecture we will talk about how to put together everything we have learned so far, and a bit about modules.

Exercises  (TØ)

The exercise for this week is about analyzing the base composition of HIV sequences. It is available as a link on the main course page in the table showing the course outline.

Mandatory assignment

Write a function, parse_fasta(filename), that reads the file named filename. This file should contain multiple sequence entries in Fasta format, and the function must parse this data and return a list of tuples of the form (header, sequence). Download this file and use it as input. The first entry in the file is:

>numberOne this is sequence one
AGTTTCCCTCAAATCACTCTTTGGCAACGACCCATCGTCACAGTAAGAAT
AGAGGGACAGCTAAGAGAAGCTCTATTAGATACAGGAGCAGACGATACAG
TATTAGAAGACATAGATTTGCCAGGAAAATGGAAACCAAGAATGATAGGG
GGAATTGGAGGCTTCATCAAGGTAAAACAGTATGATCAGATATCTATAGA
AATTTGTGGAAAAAGAGCTATAGGTACAGTATTAGTAGGACCTACACCTG
TCAACATAATTGGAAGAAACATGATGACGCAGATTGGCTGTACTTTAAAT
TTGGCAATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAAT
GGATGGGCCAAAGGTTAAACAATGGCCACTGACAGCAGAAAAAATAAAAG
ATTGGGCCTGAAAATCCA

so this tuple should be the first one in your list:

("numberOne this is sequence one", "AGTTTCCCTCAAATCACTCTTTGGCAACGACCCATCGTCACAGTAAGAATAGAGGGACAGCTAAGAGAAGCTCTATTAGATACAGGAGCAGACGATACAGTATTAGAAGACATAGATTTGCCAGGAAAATGGAAACCAAGAATGATAGGGGGAATTGGAGGCTTCATCAAGGTAAAACAGTATGATCAGATATCTATAGAAATTTGTGGAAAAAGAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAACATGATGACGCAGATTGGCTGTACTTTAAATTTGGCAATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGGCCAAAGGTTAAACAATGGCCACTGACAGCAGAAAAAATAAAAGATTGGGCCTGAAAATCCA")

This assignment can be solved in a lot of different ways — of varying complexity — so be inventive and after you have handed in your own assignment, have a look at what your friends have done to solve the problem.

Here is one approach: read the entire content of the file into a string using the read() method of the opened file. Then you can use the split() method to split the string into a list of individual Fasta entries. Try using '>' as the argument to the split method and then print the resulting list. Now you can use a for loop to iterate over the elements of this list (each a string representing one Fasta entry) so you can produce a tuple for each with the header and the sequence. Hint: use the splitlines() method to split each string (Fasta entry) into the individual lines it contains (i.e. the header line and all the sequence lines). Then you can fish out the header line from that list. To produce the sequence you need to join all the sequence lines. Here is some code to get you started:

fasta_file = open('input.fasta', 'r')
file_content = fasta_file.read()
fasta_file.close() # close the file once its content has been read
list_of_entries = file_content.split(">")
for entry in list_of_entries:
    print entry # to see what entry is

# (figure out why the first is empty and remove it before the loop)
# here you are on your own but try the splitlines method...
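Once you have tried it yourself, you can compare with the minimal sketch below of the approach described above. It is only one possible solution, not the official one. The helper parse_fasta_text and the short example string are made up for illustration, and the code assumes that '>' only ever appears at the start of a header line. It uses print() with a single argument, so it should run on both Python 2 and Python 3.

```python
def parse_fasta_text(text):
    """Parse Fasta-formatted text into a list of (header, sequence) tuples.

    Hypothetical helper, kept separate so it can be tried on a plain string.
    """
    entries = []
    # Splitting on '>' leaves an empty first element because the text starts
    # with '>', so skip any chunk that is empty or only whitespace.
    for chunk in text.split('>'):
        if not chunk.strip():
            continue
        lines = chunk.splitlines()
        header = lines[0].strip()
        # Join the remaining lines into one sequence string.
        sequence = ''.join(line.strip() for line in lines[1:])
        entries.append((header, sequence))
    return entries


def parse_fasta(filename):
    """Read the file named filename and parse its Fasta entries."""
    with open(filename, 'r') as fasta_file:
        return parse_fasta_text(fasta_file.read())


# Small made-up example (not the course file).
example = ">numberOne this is sequence one\nAGTT\nTCCC\n>numberTwo\nACGT\n"
print(parse_fasta_text(example))
```

Keeping the parsing in its own function makes it easy to test on a short string before pointing parse_fasta at the real input file.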

Still, there are many other, and better, ways to do it. You can think about a way once you are done with the large exercise this week.

 

Send the file with your Python code to Dan no later than Tuesday (24/11) at 12.00 (not 24.00).

Applied programming 2015 week four

Reading

You should read and complete exercise 39 in Learn Python the Hard Way before the Thursday lecture.

Lectures

At the Thursday lecture we will continue working with lists. At the Tuesday lecture we will talk about a special data structure called dictionaries.

Exercises  (TØ)

The exercise for this week is about comparing HIV sequences and is available as a link on the main course page in the table showing the course outline.

Mandatory assignment

This week's assignment will walk you through some of the most common string manipulations.

A palindrome is a string that is spelled the same way backwards and forwards. Write a function, isPalindrome(s), that returns True if s is a palindrome and False
otherwise.

Example usage:

print isPalindrome('abcba')

Should print True

print isPalindrome('foo')

Should print False

One approach to this is to run through s from the first to the middle character and for
each character check if the character is equal to the character at the same index from the
right rather than the left. Remember that the first character of a string is at index 0
and the last at index -1, the second character is at index 1 and the second last at index
-2 and so forth.

Since you need to run through the string from the first to the middle character you first
need to figure out how many characters that corresponds to. Say your palindrome is
p="ACTGTCA", then the number of indexes you need to loop over with a for loop is
len(p)/2. Figure out how to make range() return indexes you can use to access the
characters in the first half of the sequence. Then make a for loop where you iterate
over the indexes you get from range(). Try to make the for loop print out the first
half of the characters.

Once you get this far you need to compare each character from the first half to the
corresponding ones starting from the other end of the palindrome. Figure out how to change
each index used for the first half to the corresponding index for the other half so you
can compare the relevant pairs. (You need to compare index 0 with -1, 1 with -2 and so on…)

Now try to make the for loop print both the character from the first half and the
corresponding character from the other end. If you got the indexes right you will see that
the A prints with the A from the other end, the C with the C, and so on.
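If you get stuck on the indexes, the sketch below shows one way the pairing could look. The variable names half, left, right and pairs are made up for illustration. It uses // rather than / for the division, so the result is a whole number on both Python 2 and Python 3.

```python
p = "ACTGTCA"
half = len(p) // 2  # number of indexes in the first half (3 here)

pairs = []
for i in range(half):
    left = p[i]          # index 0, 1, 2, ... counted from the left
    right = p[-(i + 1)]  # index -1, -2, -3, ... counted from the right
    pairs.append((left, right))
    print(left + " " + right)
```

The middle character (the G) is never visited, which is fine because it has no partner to be compared with.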

Write an if statement in the for loop that tests if the two corresponding characters are the same. If your string is a palindrome all pairs are the same. So as soon as you see a
pair that is not the same, you know it is not a palindrome and you can let your function
return False like this:

if leftCharacter != rightCharacter:
    return False

On the other hand, if all pairs pass this test then it is a palindrome and the function
should return True after the for loop has finished.
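Put together, the steps above could look roughly like the sketch below. It is one possible solution among many, written so that it should run on both Python 2 and Python 3. Note that it treats the empty string as a palindrome, which is an assumption the assignment text does not settle.

```python
def isPalindrome(s):
    """Return True if s reads the same backwards and forwards."""
    for i in range(len(s) // 2):
        leftCharacter = s[i]
        rightCharacter = s[-(i + 1)]
        if leftCharacter != rightCharacter:
            # One mismatching pair is enough to rule out a palindrome.
            return False
    # Every pair matched (or there were no pairs to compare).
    return True


print(isPalindrome('abcba'))
print(isPalindrome('foo'))
```

The return True sits outside the for loop on purpose: placing it inside would end the function after the first matching pair.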

Word count

Write a function, wordCount(s), that counts the number of words in a string.

Example usage:

print wordCount('foo bar')
2
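A minimal sketch of one way to do this is shown below. It assumes that a "word" is any run of characters separated by whitespace, so punctuation counts as part of a word.

```python
def wordCount(s):
    """Count the whitespace-separated words in s."""
    # split() without arguments splits on any run of spaces, tabs or
    # newlines and ignores leading and trailing whitespace.
    return len(s.split())


print(wordCount('foo bar'))
```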

Useful method:

Re-formatting text

Write a function, reformat(s), that takes a string, s, and replaces every run of white space
(spaces, tabs) with a single space.

Example usage:

s = "foo    bar      baz"
print reformat(s)
foo bar baz

Useful methods:

  • split()
  • join()

Go look them up – or just google “python join”.
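The two methods can be combined as in the sketch below, which is just one possible solution. One thing to be aware of: split() without arguments also drops white space at the very start and end of the string, which may or may not be what you want.

```python
def reformat(s):
    """Replace every run of white space in s with a single space."""
    words = s.split()       # split on any run of spaces, tabs or newlines
    return ' '.join(words)  # glue the words back together with one space


s = "foo    bar      baz"
print(reformat(s))
```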

Send the file with your Python code to Dan no later than Tuesday (24/11) at 12.00 (not 24.00).