SYLLABUS
CMSC828N: Computational Gene Finding and Genome Assembly
Tuesdays and Thursdays, 3:30-4:45pm, Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building,
salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros
Supplemental texts, free online at the NCBI Bookshelf (click title to view):
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown. BIOS Scientific Publishers, 2002.
Note: additional links to lecture notes and assignments will appear on the syllabus as the semester progresses.
Day 1: Thursday, Aug 30
Introduction to the course. Molecular biology background.
Reading: Chapter 1, The Human Genome, in Genomes, by T.A. Brown, free at the NCBI Bookshelf.
Lecture slides from Aug 30.
Week 1: Sept 4-6
Biotechnology background on sequencing, assembly. Whole-genome shotgun sequencing. Pairwise sequence alignment. Basic assembly: shortest common superstring, greedy assembly algorithms.
Reading: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf. (b) Gene Myers' 1999 intro paper on whole-genome sequencing.
Lecture slides from Sept 4 and 6
Alignment slides from Sept 6 (slides by Art Delcher)
Week 2: Sept 11-13
The Celera Assembler algorithm. Hash indexing for overlap computation. Screening repeats.
Reading: (1) Myers, The Fragment Assembly String Graph, Bioinformatics 21 (2005); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
Get Lab 1 here, due on Sept 27.
Slides from Sept 11, assembly intro
Week 3: Sept 18-20
NOTE: this week the 15th Annual Microbial Genomics Conference will be in College Park.
Topics: The Celera Assembler and the Arachne assembler algorithms.
Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).
Celera assembler slides
Arachne lecture notes
Week 4: Sept 25-27
Lab 1 due Sept 27.
Arachne continued. Using MUMmer for assembly alignment and comparison.
Readings:
S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler, Genome Research 12, Issue 1, 177-189, January 2002.
A.L. Delcher et al., Alignment of Whole Genomes, Nucleic Acids Research 27:11 (1999), 2369-2376. Note that Figure 6 is supposed to be in color, and was mistakenly printed as black and white.
Readings for class presentations: choose from this list or use Wentian Li's bibliography page for more choices.
Lecture notes on genome closure and finishing
Lecture notes on the Figaro trimming algorithm (by James White)
Lecture notes on AutoEditor
Week 5: Oct 2-4
Multiple genome alignment with MUMmer. Comparative assembly.
Get Lab 2 (the Project) here
Slides from Adam Phillippy's MUMmer lecture
Slides from Oct 4.
Week 6: Oct 9-11
Class presentations on selected readings.
Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project. Genomics 62 (1999), 500-507.
Week 7: Oct 16-18
Additional assembly topics: debugging assemblies with Hawkeye, the assembly viewing tool. Scaffolding with Bambus. Introduction to computational gene finding topics.
Reading: Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries".
Slides on Hawkeye
Slides on Bambus scaffolder
Introduction to gene finding slides
Week 8: Oct 23-25
Bacterial gene finding. Markov chains. Case study: the Glimmer gene finder.
Reading: CGP, "Overview of Computational Gene Prediction," Chapter 3. Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
Lecture slides on Markov chains
Lecture slides on Glimmer and bacterial gene finding.
Week 9: Oct 30-Nov 1
Transcription terminator prediction in bacterial genomes. Introduction to HMMs for eukaryotic gene finding and the Forward algorithm.
Reading: CGP, "Signal and Content Sensors", chapter 7.
Lab 2 due Nov. 1.
Lecture notes on Transterm
Lecture on the Forward algorithm for HMMs.
Lecture on Backward and E-M algorithms for HMMs.
Week 10: Nov 6-8
Nov 6: Class presentations on selected readings. Nov 8: Designing HMMs for gene finding.
Bill Majoros' slides on HMM design for gene finding.
Get Lab 3 here, due Nov. 20.
Reading: CGP, "Toy Exon Finder", chapter 5.
Week 11: Nov 13-15
Case study: GlimmerHMM. Generalized HMM algorithms.
Ela Pertea's notes on GlimmerHMM.
Reading: CGP, "Hidden Markov Models", chapter 6; and "Generalized HMMs", chapter 8.
Week 12: Nov 20 (Nov 22 is Thanksgiving)
The Toyscan algorithm from the textbook. Gene finding in humans: the EGASP competition.
Lab 3 due today!
Get Lab 4 instructions here. The test data for Lab 4 is here. The main files for Lab 4 are here.
EGASP slides, part 1 (M. Reese) and part 2 (P. Flicek).
Week 13: Nov 27-29
Combining multiple gene finders with JIGSAW. Exon splicing enhancers, alternative splicing.
Reading: (1) the JIGSAW paper. (2) CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.
Lecture notes on GeneSplicer and Combiner.
Lecture notes on JIGSAW
Lecture notes on exon splicing enhancers
Week 14: Dec 4-6
Gene finding with conditional random fields. The status of the human genome: assembly and annotation. Next-generation sequencing technology.
Last class: Dec 11.
Lab 4 due Dec 11. Take-home exams distributed Dec 11, due Dec 18.