SYLLABUS

CMSC828N: Computational Gene Finding and Genome Assembly


Tuesdays and Thursdays, 3:30-4:45pm, Room 3118 Biomolecular Sciences Building

Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros


Supplemental texts, free online at the NCBI Bookshelf:
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.

Note: additional links to lecture notes and assignments will appear on the syllabus as the semester progresses.

Day 1: Thursday, Aug 30
Introduction to the course.  Molecular biology background.

Reading: Chapter 1, "The Human Genome," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.
Lecture slides from Aug 30.

Week 1: Sept 4-6
Biotechnology background on sequencing and assembly.  Whole-genome shotgun sequencing.  Pairwise sequence alignment. Basic assembly: shortest common superstring, greedy assembly algorithms.

Reading: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.  (b) Gene Myers' 1999 intro paper on whole-genome sequencing.

Lecture slides from Sept 4 and 6
Alignment slides from Sept 6 (slides by Art Delcher)
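To illustrate the greedy approach to assembly, here is a minimal Python sketch of greedy superstring construction: repeatedly merge the pair of fragments with the longest suffix-prefix overlap. It assumes error-free reads from a single strand and uses exact string matching; the function names are hypothetical and not taken from any assembler covered in the course. Greedy merging is a heuristic and does not guarantee a shortest common superstring.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Greedy heuristic for the shortest common superstring (not guaranteed optimal)."""
    # Remove duplicates and reads contained in another read; they add no sequence.
    unique = list(dict.fromkeys(reads))
    frags = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(frags) > 1:
        # Find the ordered pair (i, j) with the longest suffix(i)/prefix(j) overlap.
        best_len, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        # Merge the best pair (an overlap of 0 simply concatenates the two fragments).
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


# Hypothetical toy reads.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_superstring(reads))  # ATTAGACCTGCCGGAATAC
```

The all-pairs overlap search makes each round quadratic in the number of fragments, which is one reason real assemblers index reads before looking for overlaps.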

Week 2: Sept 11-13
The Celera Assembler algorithm.  Hash indexing for overlap computation.  Screening repeats.

Reading: (1) Myers, "The Fragment Assembly String Graph," Bioinformatics 21 (2005); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.

Get Lab 1 here, due Sept 27.

Slides from Sept 11, assembly intro
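A minimal sketch of hash-based overlap detection, assuming exact k-mer matches and forward-strand reads only: every k-mer of every read goes into a hash table, and shared k-mers are then counted per read pair and implied offset. Pairs that share many k-mers on the same offset are candidates for a full alignment. This is a generic illustration, not the Celera Assembler's actual overlapper; all names are hypothetical.

```python
from collections import defaultdict


def kmer_index(reads: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Hash table mapping each k-mer to every (read name, position) where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, seq in reads.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def candidate_overlaps(reads: dict[str, str], k: int,
                       min_shared: int = 2) -> dict[tuple[str, str, int], int]:
    """Count shared k-mers per (read_a, read_b, offset); offset = start of b in a's coordinates."""
    counts: dict[tuple[str, str, int], int] = defaultdict(int)
    for hits in kmer_index(reads, k).values():
        for a, pos_a in hits:
            for b, pos_b in hits:
                if a < b:  # each unordered pair once, no self-hits
                    counts[(a, b, pos_a - pos_b)] += 1
    # Keep only pairs supported by enough shared k-mers on a consistent offset.
    return {key: n for key, n in counts.items() if n >= min_shared}


# Hypothetical toy reads: r2 starts at position 4 of r1.
reads = {"r1": "ACGTACGGT", "r2": "ACGGTTTA"}
print(candidate_overlaps(reads, k=4))  # {('r1', 'r2', 4): 2}
```

K-mers that occur in very many reads (repeats) make the inner loop quadratic, so practical overlappers typically mask or skip such k-mers; this is closely tied to the repeat-screening topic for this week.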


Week 3: Sept 18-20
NOTE: this week the 15th Annual Microbial Genomics Conference will be in College Park.
Topics: The Celera Assembler and the Arachne assembler algorithms.


Reading: Myers et al., "A Whole-Genome Assembly of Drosophila," Science 287 (2000).

Celera assembler slides

Arachne lecture notes

Week 4: Sept 25-27
Lab 1 due Sept 27.
Arachne continued.  Using MUMmer for assembly alignment and comparison.

Readings:
    S. Batzoglou et al., "ARACHNE: A whole-genome shotgun assembler," Genome Research 12, Issue 1, 177-189, January 2002.
    A.L. Delcher et al., "Alignment of Whole Genomes," Nucleic Acids Research, 27:11 (1999), 2369-2376.  Note that Figure 6 is supposed to be in color but was mistakenly printed in black and white.

Readings for class presentations: choose from this list or use Wentian Li's bibliography page for more choices.

Lecture notes on genome closure and finishing
Lecture notes on the Figaro trimming algorithm (by James White)
Lecture notes on AutoEditor

Week 5: Oct 2-4
Multiple genome alignment with MUMmer.  Comparative assembly. 

Get Lab 2 (the Project) here

Slides from Adam Phillippy's MUMmer lecture
Slides from Oct 4.

Week 6: Oct 9-11
Class presentations on selected readings.

Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project.  Genomics 62 (1999), 500-507.

Week 7: Oct 16-18

Additional assembly topics: debugging assemblies with Hawkeye, the assembly viewing tool.  Scaffolding with Bambus. Introduction to computational gene finding topics.

Reading: Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries".

Slides on Hawkeye
Slides on Bambus scaffolder
Introduction to gene finding slides

Week 8: Oct 23-25
Bacterial gene finding.  Markov chains.  Case study: the Glimmer gene finder.

Reading:
CGP, "Overview of Computational Gene Prediction," Chapter 3.  Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.

Lecture slides on Markov chains

Lecture slides on Glimmer and bacterial gene finding.
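A minimal Python sketch of scoring a DNA sequence with a fixed-order Markov chain. Glimmer itself uses interpolated Markov models, which combine several orders; this simpler fixed-order version, with add-one pseudocounts (an assumption), only shows the core idea: estimate P(base | preceding k bases) from training sequences, then sum log-probabilities. The difference between the scores under a coding model and a noncoding model is a log-odds score. The training sequences below are hypothetical toy data, and input is assumed to contain only A, C, G, T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs: list[str], order: int) -> dict[str, dict[str, float]]:
    """Estimate P(base | previous `order` bases) with add-one pseudocounts."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        model[context] = {b: (nexts[b] + 1) / (total + len(ALPHABET)) for b in ALPHABET}
    return model


def log_likelihood(seq: str, model: dict[str, dict[str, float]], order: int) -> float:
    """Sum of log P(seq[i] | seq[i-order:i]); unseen contexts fall back to uniform."""
    score = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        score += math.log(probs[seq[i]] if probs else 1 / len(ALPHABET))
    return score


# Hypothetical toy training data for a "coding" and a "noncoding" model.
coding = train_markov(["ACGTACGTACGT"], order=1)
noncoding = train_markov(["AAAAAAAA"], order=1)
s = "ACGTACGT"
log_odds = log_likelihood(s, coding, 1) - log_likelihood(s, noncoding, 1)
print(log_odds > 0)  # True
```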

Week 9: Oct 30-Nov 1
Transcription terminator prediction in bacterial genomes.  Introduction to HMMs for eukaryotic gene finding and the Forward algorithm.

Reading: CGP, "Signal and Content Sensors", chapter 7.
Lab 2 due Nov. 1.

Lecture notes on Transterm
Lecture on the Forward algorithm for HMMs.
Lecture on Backward and E-M algorithms for HMMs.
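The Forward algorithm computes P(sequence) for an HMM by summing over all state paths with dynamic programming. Below is a minimal Python sketch that works in log space to avoid underflow on long sequences. The two-state exon/noncoding model and its parameters are made up for illustration and are not taken from any real gene finder; the sketch also assumes a non-empty observation sequence and no explicit end state.

```python
import math


def log_sum_exp(xs: list[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def forward(obs, states, start_p, trans_p, emit_p) -> float:
    """Return log P(obs | HMM), summing over all state paths (Forward algorithm)."""
    # f[s] = log P(obs[0..i], state at position i is s)
    f = {s: safe_log(start_p[s]) + safe_log(emit_p[s][obs[0]]) for s in states}
    for x in obs[1:]:
        f = {
            t: log_sum_exp([f[s] + safe_log(trans_p[s][t]) for s in states])
            + safe_log(emit_p[t][x])
            for t in states
        }
    # Terminate by summing over the state at the last position.
    return log_sum_exp(list(f.values()))


# Hypothetical two-state model: E = exon-like (GC-rich), N = noncoding (uniform).
states = ["E", "N"]
start_p = {"E": 0.5, "N": 0.5}
trans_p = {"E": {"E": 0.9, "N": 0.1}, "N": {"E": 0.2, "N": 0.8}}
emit_p = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

print(math.exp(forward("GC", states, start_p, trans_p, emit_p)))  # about 0.112
```

The Backward algorithm is the mirror image, filling the table from the end of the sequence; together, Forward and Backward values give the posterior state probabilities used in E-M (Baum-Welch) training.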


Week 10: Nov 6-8
Nov 6: Class presentations on selected readings.  Nov 8: designing HMMs for gene finding.
Bill Majoros' slides on HMM design for gene finding.

Get Lab 3 here, due Nov. 20.

Reading: CGP, "Toy Exon Finder", chapter 5.

Week 11: Nov 13-15
Case study: GlimmerHMM.  Generalized HMM algorithms. 

Ela Pertea's notes on GlimmerHMM.

Reading: CGP, "Hidden Markov Models", chapter 6; and "Generalized HMMs", chapter 8.

Week 12: Nov 20 (Nov 22 is Thanksgiving)
The Toyscan algorithm from the textbook. Gene finding in humans: the EGASP competition.

Lab 3 due today!  Get Lab 4 instructions here.  The test data for Lab 4 is here.  The main files for Lab 4 are here.

EGASP slides, part 1 (M. Reese) and part 2 (P. Flicek).

Week 13: Nov 27-29
Combining multiple gene finders with JIGSAW.  Exon splicing enhancers, alternative splicing.

Reading: (1) the JIGSAW paper.  (2) CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.

Lecture notes on GeneSplicer and Combiner.
Lecture notes on JIGSAW
Lecture notes on exon splicing enhancers

Week 14: Dec 4-6
Gene finding with conditional random fields. The status of the human genome: assembly and annotation.  Next-generation sequencing technology.

Last class: Dec 11.
Lab 4 due Dec 11.  Take-home exams distributed Dec 11, due Dec 18.