Gene Finding and Genome Assembly
Course meeting time: Tuesdays and Thursdays
Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125
Biomolecular Sciences Building,
salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene
Prediction (CGP) by William H. Majoros.
Supplemental texts, free online at the NCBI
Bookshelf: Molecular Biology of the Cell, by Bruce
Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts,
and Peter Walter. Garland Publishing, 2002.
Note: additional links to assignments and supplementary
material will appear on
the syllabus as the semester progresses.
Week 1: Aug 31-Sept 2
Introduction to the course. Molecular biology
sequencing and assembly. Basic pairwise
Genome, in Genomes,
by T.A. Brown, free at the NCBI Bookshelf.
Week 2: Sept 7-9
sequencing. Genome sequencing technology. Basic assembly: shortest
common superstring, greedy assembly algorithms. Problems caused
by repetitive DNA.
Reading: (1) Chapter
Genomes, in Genomes,
T.A. Brown, free at the NCBI Bookshelf. (2) Gene Myers' 1999
intro paper on whole-genome sequencing.
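
The greedy assembly idea from this week's topics can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by any production assembler: the function names (overlap, greedy_assemble) and the min_len parameter are assumptions made for this example, and the code ignores sequencing errors and reverse complements.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=1):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the result stays in several contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


contigs = greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"])
print(contigs)
```

Because the merge choice is purely local, a repeat longer than the reads can make two unrelated reads look like the best overlap, which is one way repetitive DNA causes misassemblies.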
Week 3: Sept 14-16
The Celera Assembler algorithm.
correction with AutoEditor.
Lecture slides for Celera
Reading: (1) Myers, The Fragment
Assembly String Graph, Bioinformatics
21 (2005); (2) The Minimus assembler documentation, http://sourceforge.net/apps/mediawiki/amos/index.php?title=Minimus.
Week 4: Sept 21-23
The Arachne assembler algorithm.
Comparative assembly with AMOScmp.
Lab 1 due
Readings: Myers et al., A
Whole-Genome Assembly of Drosophila, Science 287 (2000).
Batzoglou et al., ARACHNE: A
whole-genome shotgun assembler, Genome Research 12:1
Week 5: Sept 28-30
Trimming with Figaro. Multiplex PCR for
closing gaps. Using MUMmer
for assembly alignment and comparison.
Get Lab 2 here: lab02.txt
notes on MUMmer.
Readings: A.L. Delcher
et al., Alignment of Whole
Genomes, Nucleic Acids Research,
27:11 (1999), 2369-2376. Tettelin et al., Optimized
Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing
Project, Genomics 62 (1999), 500-507.
Week 6: Oct 5-7
debugging with Hawkeye. Short
read sequencing using 454 and Illumina technology.
Guest lecture by David Kelley on Oct. 7: error
Week 7: Oct 12-14
No class Oct. 12. Student presentations Oct 14.
Lab 2 due Friday,
Week 8: Oct 19-21
Short-read assembly with de Bruijn graphs.
The Velvet assembler. Introduction to computational gene
assembly (most slides courtesy of Mike Schatz)
Get Lab 3 here.
Pevzner PA, Tang H, Waterman MS, An
Eulerian path approach to DNA fragment
assembly. Proc. Natl. Acad.
Sci. USA 2001 Aug 14; 98(17):9748-53.
Zerbino, D. and
E. Birney. Velvet: Algorithms for
de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829.
Chapters 1-2 of CGP,
"Introduction" and "Mathematical
preliminaries". See the textbook
website for additional PowerPoint slides.
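
The de Bruijn graph approach to short-read assembly can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Velvet's implementation: the function names (de_bruijn, eulerian_path, spell) are made up for this example, each distinct k-mer becomes exactly one edge, and sequencing errors, coverage, and reverse complements are ignored.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer-style walk that uses every edge once (assumes a path exists)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    edges = {u: list(vs) for u, vs in graph.items()}  # working copy
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if edges.get(u):
            stack.append(edges[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Reconstruct the sequence spelled by a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(spell(eulerian_path(de_bruijn(reads, 4))))
```

Unlike overlap-based assembly, no pairwise read comparison is needed: reads are only used as a source of k-mers, which is what makes the approach practical for very large numbers of short reads.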
Week 9: Oct 26-28
Bacterial gene finding. Markov
chains. Case study: the Glimmer
"Overview of Computational Gene Prediction," Chapter 3. Also:
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene
identification using interpolated Markov models, Nucleic Acids
Research 26:2 (1998), 544-548.
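
As a companion to the Markov chain topic, the sketch below scores a sequence with a fixed-order Markov chain trained on coding sequence against one trained on background sequence. This is deliberately simpler than the interpolated Markov models described in the Glimmer paper: the function names (train_markov, log_odds), the pseudocount, the fixed order of 2, and the tiny training strings are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, alphabet="ACGT", pseudo=1.0):
    """Return a function prob(context, base) estimated from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        c = counts.get(context, {})
        total = sum(c.values()) + pseudo * len(alphabet)
        return (c.get(base, 0.0) + pseudo) / total  # add-pseudocount smoothing

    return prob


def log_odds(seq, coding, background, order=2):
    """Sum of log(P_coding / P_background) over all positions with full context."""
    return sum(
        math.log(coding(seq[i - order:i], seq[i])
                 / background(seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    )


# Toy training data (hypothetical, far too small for real use).
coding = train_markov(["ATGATGATGATG"])
background = train_markov(["ACGTACGTACGT"])
print(log_odds("ATGATG", coding, background) > 0)
```

A positive score means the sequence looks more like the coding training set than the background set; a real gene finder would train on many known genes and typically use a separate model for each codon position.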
Week 10: Nov 2-4
Overlapping genes in bacteria. Eukaryotic gene
finding: introduction to HMMs and the
Lab 3 due Friday, Nov 5.
Reading: CGP, "Signal and
Lecture notes on HMMs: lecture1 and lecture2
Week 11: Nov 9-11
Student presentations on Nov. 9.
Reading: CGP, "Toy Exon
Lecture notes on GHMMs
from Mihaela Pertea.
Week 12: Nov 16-18
Get Lab 4 here.
mini-project, due on Dec. 9.
Topics: Explanation of lab4. Signal recognition: splice sites and exon
splicing enhancers. Time permitting:
ancient DNA introduction.
Nov. 18: Special lecture in CBG
seminar series, 1103 Biosciences Research Building, by
M. Thomas P. Gilbert, Centre
for Ancient Genetics, University of Copenhagen. Title:
"Palaeogenomics - challenges faced, progress made and future prospects."
Reading: CGP, "Hidden Markov Models" chapter 6; and "Generalized HMMs" chapter 8.
Week 13: Nov 23 (Nov 25 is Thanksgiving)
Week 14: Nov 30-Dec 2
finding in humans: the EGASP and NGASP
competitions. Gene finding with
conditional random fields (CRFs).
"Signal and Content
Sensors", chapter 7, section 7.3 to end of chapter.
Week 15: Dec 7-9 (last week)
Pair HMMs. The status of
the human genome: assembly and annotation.
Lab 4 due Dec 9.
GRADING: The first three labs
count for 15% of the grade each, the fourth lab counts for 25%, the
class presentation counts for 5%, and the final exam counts for 25%.