CMSC 828H: Computational Gene Finding and Genome Assembly

Syllabus, Fall 2010

Course meeting time: Tuesdays and Thursdays, 2:00-3:45pm, Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at)
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros (buy it from Amazon)

Supplemental texts, free online at the NCBI Bookshelf:
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.

Note: additional links to assignments and supplementary material will appear on the syllabus as the semester progresses.

Week 1: Aug 31-Sept 2
Introduction to the course.  Molecular biology background. 
Biotechnology background on sequencing and assembly.  Basic pairwise sequence alignment.
Lecture notes for lecture 2 (Thursday)

Reading: Chapter 1, The Human Genome, in Genomes, by T.A. Brown, free at the NCBI Bookshelf.

Week 2: Sept 7-9
Whole-genome shotgun sequencing.  Genome sequencing technology.  Basic assembly: shortest common superstring, greedy assembly algorithms.  Problems caused by repetitive DNA.
Lecture notes for lecture 3 (Tuesday)

Get Lab 1 here.

Reading: (1) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.  (2) Gene Myers' 1999 intro paper on whole-genome sequencing.

Week 3: Sept 14-16
The Celera Assembler algorithm.  Error correction with AutoEditor.
Lecture slides for Celera Assembler, AutoEditor, and SNP overview.

Reading: (1) Myers, The Fragment Assembly String Graph, Bioinformatics 21 (2005); (2) The Minimus assembler documentation.

Week 4: Sept 21-23
The Arachne assembler algorithm.  Comparative assembly with AMOScmp.

Lab 1 due Friday, Sept 24. 

Readings: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).
S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler,  Genome Research 12:1 (2002), 177-189.

Week 5: Sept 28-30

Trimming with Figaro.  Multiplex PCR for closing gaps.  Using MUMmer for assembly alignment and comparison.

Get Lab 2 here: lab02.txt and lab02.afg.

Lecture notes on MUMmer.

Readings: A.L. Delcher et al., Alignment of Whole Genomes, Nucleic Acids Research 27:11 (1999), 2369-2376.  Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project, Genomics 62 (1999), 500-507.

Week 6: Oct 5-7
Assembly debugging with Hawkeye.  Short read sequencing using 454 and Illumina technology.
Guest lecture by David Kelley on Oct. 7: error correction with Quake.

Week 7: Oct 12-14
No class Oct. 12.  Student presentations Oct 14.

Lab 2 due Friday, Oct 15.

Week 8: Oct 19-21
Short-read assembly with de Bruijn graphs.  The Velvet assembler.  Introduction to computational gene finding topics.

Lecture notes on de Bruijn assembly (most slides courtesy of Mike Schatz)
Get Lab 3 here.

Pevzner PA, Tang H, Waterman MS, An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA 2001 Aug 14; 98(17):9748-53.
Zerbino, D. and E. Birney.  Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829.
Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries".  See the textbook website for additional PowerPoint slides.

Week 9: Oct 26-28
Bacterial gene finding.  Markov chains.  Case study: the Glimmer gene finder. 

CGP, "Overview of Computational Gene Prediction," Chapter 3.  Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.

Week 10: Nov 2-4
Overlapping genes in bacteria.  Eukaryotic gene finding: introduction to HMMs and the Forward algorithm.
Lab 3 due Friday, Nov 5.

Reading: CGP, "Signal and Content Sensors" chapter 7.
Lecture notes on HMMs: lecture1 and lecture2

Week 11: Nov 9-11
Student presentations on Nov. 9.
Nov 11: Guest lecture by Mihaela Pertea.

Reading: CGP, "Toy Exon Finder" chapter 5.
Lecture notes on GHMMs from Mihaela Pertea.

Week 12: Nov 16-18

Get Lab 4 here. This is the mini-project, due on Dec. 9.

Topics: Explanation of Lab 4.  Signal recognition: splice sites and exon splicing enhancers.  Time permitting: ancient DNA introduction.

Nov. 18: Special lecture in CBG seminar series, 1103 Biosciences Research Building, by M. Thomas P. Gilbert, Centre for Ancient Genetics, University of Copenhagen.  Title: "Palaeogenomics - challenges faced, progress made and future prospects."

Reading: CGP, "Hidden Markov Models" chapter 6; and "Generalized HMMs" chapter 8.

Week 13: Nov 23 (Nov 25 is Thanksgiving)
Topic TBA.

Week 14: Nov 30-Dec 2
Gene finding in humans: the EGASP and NGASP competitions.  Gene finding with conditional random fields (CRFs).

Reading: (1) the JIGSAW paper.  (2) CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.

Week 15: Dec 7-9 (last week)
Pair HMMs.  The status of the human genome: assembly and annotation.

Lab 4 due Dec 9.  Take-home exams distributed Dec 9, due Dec 15.

GRADING: The first three labs count for 15% of the grade each, the fourth lab counts for 25%, the class presentation counts for 5%, and the final exam counts for 25%.
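
As an illustration only, the weighting above can be sketched as a short calculation. The weights come from the grading statement; the component names, the 0-100 score scale, and the function name are assumptions made for this example and are not part of the course policy.

```python
# Weights (percent of final grade) as stated in the grading policy.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "lab1": 15,
    "lab2": 15,
    "lab3": 15,
    "lab4": 25,
    "presentation": 5,
    "final_exam": 25,
}


def course_grade(scores):
    """Return the weighted course grade.

    `scores` maps each component name to a score on a 0-100 scale
    (an assumption; the syllabus does not specify a scale).
    """
    # The stated weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {
    "lab1": 80, "lab2": 80, "lab3": 80,
    "lab4": 90, "presentation": 100, "final_exam": 70,
}
print(course_grade(example))
```

The six weights sum to 100%, so a student with the same score on every component receives that score as the course grade.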