Munch Lab

Mix yourself a baboon

Just as a new cocktail can be invented by mixing old ingredients, evolution sometimes makes a brand new species by mixing old ones. One remarkable example is the Kinda baboon, which inhabits large parts of Angola, Zimbabwe, and the Democratic Republic of the Congo. Two ancestral baboon species had followed separate evolutionary paths for about one and a half million years until they met only 150 thousand years ago and fused to create the Kinda baboon. This surprising finding comes from a study we just published in Science Advances, the result of a large international collaboration with important contributions from the Mailund and Schierup labs at BiRC.


Elephants and the mesh of life

The evolution and diversification of life are nicely depicted by a tree, with the first living organism at the root and the main branches splitting into ever smaller branches and twigs that eventually lead to the living species at the leaves. But what about all the species that went extinct, you may ask? The dinosaurs found themselves in dire straits 65 million years ago. In fact, species go extinct all the time. Whereas recently emerged species most likely still persist, species that arose hundreds of millions of years ago have most likely gone extinct since then. This continuously removes twigs and branches from the tree of life, which is why it has ended up looking like, well, a tree.


Selective sweeps across twenty million years of primate evolution

Our paper in MBE shows how patterns of incomplete lineage sorting can be used to measure linked selection on evolutionary timescales. We show that a large portion of linked selection is due to selective sweeps, and that the human-chimpanzee ancestor experienced a substantially higher frequency of sweeps than did the human-orangutan ancestor. These ancestral sweeps are enriched for sweeps in modern humans, suggesting that several regions of the genome are repeatedly hit by sweeps.


Applied programming 2015 week seven

Reading

None. Keep revisiting what we have been through at lectures and in the online book.

Lectures

The Thursday lecture is canceled. At the Tuesday lecture we will go through some of the issues you find most difficult, evaluate the course, and talk about the exam.

Exercises  (TØ)

The exercise for this week is about assembling a genome sequence from sequencing reads. You can find a link to the exercise in the outline table on the course page.
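If you want a feel for the idea before opening the exercise, here is a minimal sketch of one simple technique, greedy overlap merging. It is not necessarily the method the exercise asks for. The sketch repeatedly joins the two reads whose suffix/prefix overlap is longest. The function names overlap and greedy_assemble and the min_length cutoff are hypothetical. The sketch also assumes error-free reads from one strand, and it ignores repeats and reads contained inside other reads, all of which real assemblers must handle.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Only overlaps of at least min_length bases count; returns 0 otherwise.
    """
    start = 0
    while True:
        # Find the next place in a where b's first min_length bases occur
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        # Does the rest of a, from here to its end, match the start of b?
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns the list of contigs left when no pair overlaps any more.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps, so the remaining contigs stay separate
        # Glue b onto a, skipping the bases they share
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ACGTAC", "TACGGA", "GGATTT"]))  # ['ACGTACGGATTT']
```

Greedy merging is easy to write but can join the wrong reads when the genome contains repeats longer than the overlaps, which is one reason practical assemblers use graph-based methods instead.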

Mandatory assignment

This is the last week so there is no assignment.

Applied programming 2015 week five

Reading

There is no reading for this week. Spend the extra time looking back on the chapters, slides, and exercises we have covered so far.

Lectures

At the Thursday lecture I will talk more about dictionaries. At the Tuesday lecture we will talk about how to put together all the things we have learned so far, and a bit about modules.

Exercises  (TØ)

The exercise for this week is about analyzing the base composition of HIV sequences. It is available as a link in the table showing the course outline on the main course page.
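The sketch below shows what "base composition" means in code: the fraction of each of the four bases in a sequence. It is only a starting point and does not solve the exercise. The helper name base_composition is hypothetical, and the choice to ignore ambiguous symbols such as N is an assumption you may want to change. It uses collections.Counter, which ties in nicely with this week's dictionaries.

```python
from collections import Counter


def base_composition(sequence):
    """Return a dict with the fraction of A, C, G and T in sequence.

    Hypothetical helper: symbols other than A/C/G/T (e.g. N) are ignored,
    so the four fractions sum to 1 whenever any of those bases is present.
    """
    counts = Counter(sequence.upper())  # Counter returns 0 for missing keys
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    # float() avoids integer division if this is run under Python 2
    return {base: float(counts[base]) / total for base in "ACGT"}


comp = base_composition("AACG")
print(comp)
```

From such a dictionary, the GC content is simply comp["G"] + comp["C"], which for "AACG" is 0.5.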

Mandatory assignment

Write a function, parse_fasta(filename), that reads the file named filename. This file contains multiple sequence entries in Fasta format, and the function must parse this data and return a list of tuples of the form (header, sequence). Download this file and use it as input. The first entry in the file is:

>numberOne this is sequence one
AGTTTCCCTCAAATCACTCTTTGGCAACGACCCATCGTCACAGTAAGAAT
AGAGGGACAGCTAAGAGAAGCTCTATTAGATACAGGAGCAGACGATACAG
TATTAGAAGACATAGATTTGCCAGGAAAATGGAAACCAAGAATGATAGGG
GGAATTGGAGGCTTCATCAAGGTAAAACAGTATGATCAGATATCTATAGA
AATTTGTGGAAAAAGAGCTATAGGTACAGTATTAGTAGGACCTACACCTG
TCAACATAATTGGAAGAAACATGATGACGCAGATTGGCTGTACTTTAAAT
TTGGCAATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAAT
GGATGGGCCAAAGGTTAAACAATGGCCACTGACAGCAGAAAAAATAAAAG
ATTGGGCCTGAAAATCCA

so this tuple should be the first one in your list:

("numberOne this is sequence one", "AGTTTCCCTCAAATCACTCTTTGGCAACGACCCATCGTCACAGTAAGAATAGAGGGACAGCTAAGAGAAGCTCTATTAGATACAGGAGCAGACGATACAGTATTAGAAGACATAGATTTGCCAGGAAAATGGAAACCAAGAATGATAGGGGGAATTGGAGGCTTCATCAAGGTAAAACAGTATGATCAGATATCTATAGAAATTTGTGGAAAAAGAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAACATGATGACGCAGATTGGCTGTACTTTAAATTTGGCAATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGGCCAAAGGTTAAACAATGGCCACTGACAGCAGAAAAAATAAAAGATTGGGCCTGAAAATCCA")

This assignment can be solved in many different ways, of varying complexity, so be inventive. After you have handed in your own assignment, have a look at how your friends solved the problem.

Here is one approach: read the entire content of the file into a string using the read() method of the opened file. Then use the split() method to split the string into a list of individual Fasta entries: try using '>' as the argument to split() and print the resulting list. You can then use a for loop to iterate over the elements of this list (each a string representing one Fasta entry) and produce a tuple with the header and the sequence for each. Hint: use the splitlines() method to split each string (Fasta entry) into the individual lines it contains (i.e. the header line and all the sequence lines). Then you can fish out the header line from that list. To produce the sequence you need to join all the sequence lines. Here is some code to get you started:

fasta_file = open('input.fasta', 'r')
file_content = fasta_file.read()  # the whole file as one string
fasta_file.close()
list_of_entries = file_content.split(">")
for entry in list_of_entries:
    print(entry)  # to see what entry is

# (figure out why the first is empty and remove it before the loop)
# here you are on your own but try the splitlines method...
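To make the splitlines() hint concrete, here is a small illustration on a single entry string. The header and bases are invented and not taken from the course file. It shows one way to get the header and the joined sequence out of one entry. Putting this inside the loop and the parse_fasta function is still up to you.

```python
# One Fasta entry as it looks after splitting the file content on '>'
# (the '>' itself is gone). The header and bases here are made up.
entry = "exampleEntry a made-up sequence\nACGT\nTTGA\nCC\n"

lines = entry.splitlines()     # ['exampleEntry a made-up sequence', 'ACGT', 'TTGA', 'CC']
header = lines[0]              # the first line is the header
sequence = "".join(lines[1:])  # glue the remaining sequence lines together
pair = (header, sequence)
print(pair)
```

A nice property of splitlines() is that it also copes with Windows-style "\r\n" line endings, whereas a plain split("\n") would leave a stray "\r" at the end of every line.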

Still, there are many other, and better, ways to do it. You can think about other approaches once you are done with this week's large exercise.

 

Send the file with your Python code to Dan no later than Tuesday (24/11) at 12.00 (not 24.00).