Course Information

Course Calendar

Date Lecture Name Readings (recommended) Work Due
1/28 Course introduction and administrivia
1/30 The R data analysis environment John Cook's intro
Baltimore Analysis
The source
2/4 Molecular biology for computer scientists and statisticians Hunter. Molecular Biology for Computer Scientists
2/6 The R/Bioconductor genomics analysis environment Bioconductor paper
The anatomy of successful computational biology software
The setup script: setup.R
The script we're using in class: bioconductor.R
Microarray analysis example
2/11 Genome architecture
Notes on HMMs for CpG Islands
Wu, et al., 2010
GenomicRanges Paper
2/13 CAMPUS CLOSED
2/18 Overview of second generation sequencing technology
Bowtie
2/20 Gene expression analysis: RNA sequencing analysis DESeq
RNAseq review article
Myrna
2/25 RNA sequencing analysis (II)
Lecture Notes
DESeq
RNAseq review article
Myrna
2/27 Isoform expression quantification and transcriptome assembly Jiang and Wong
IsoLasso
Cufflinks
Salzman, Jiang and Wong
3/4 Isoform expression quantification and transcriptome assembly Jiang and Wong
IsoLasso
Cufflinks
Salzman, Jiang and Wong
3/6 Isoform expression quantification and transcriptome assembly
Lessons learned from RNA-seq
Jiang and Wong
IsoLasso
Cufflinks
Salzman, Jiang and Wong
3/11 Data Analyst Bag of Tricks I: Empirical Bayes Methods
Lecture Notes
limma
[1] Ch. 11 and Ch. 14
3/13 Data Analyst Bag of Tricks II: Multiple Testing
Lecture Notes
q-value
Noble, "How does multiple testing correction work"
SAM

HW1

3/18 No class: Spring Break
3/20 No class: Spring Break
3/25 Unsupervised methods
Notes on EM
[1] Chs. 12 and 13
3/27 Unsupervised methods (II) [2] Ch. 14.3 and Ch. 14.5
SVA
Leek, et al., batch effects
4/1 Unsupervised methods (II) [2] Ch. 14.3 and Ch. 14.5
SVA
Leek, et al., batch effects

HW2

4/3 Classification and prediction methods [2] Ch. 4
4/4 THIS IS NOT A LECTURE DATE

Project Proposal

4/8 Recap
4/10 Genetics: Brief genotyping intro
Genotype/phenotype association discovery and analysis
SOAP
RAPID
Lirnet
4/14 THIS IS NOT A LECTURE DATE

Midterm

4/15 Group presentations
4/17 Group presentations
4/22 Genetics: Genotype/phenotype association discovery and analysis RAPID
4/24 Regulatory network discovery Segal, et al., 2003
4/29 Analysis of differential methylation with sequencing Hansen et al., 2012
BSmooth
5/1 Approaching the promise of individualized medicine
5/2 THIS IS NOT A LECTURE DATE

Project progress report

5/3 THIS IS NOT A LECTURE DATE
5/6 Project presentations (1)
5/8 Project presentations (2)

HW3

5/13 Project presentations (3)
5/15 THIS IS NOT A LECTURE DATE Final project

Linked lectures are from last semester and are very likely to change close to lecture time.

Legend
Under construction
Not updated yet

[1] Gentleman, R., Carey, V.J., et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005.
[2] Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer 2009.

Many slides are borrowed from a number of sources (hopefully cited in slides). A lot of them are borrowed from Rafael A. Irizarry.

Homeworks

Homework Date posted Due date Hints and solutions
Homework 1 Feb 15 Mar 13
Homework 2 Mar 25 Apr 1
Homework 3 May 1 May 9

Resources

R Resources

Other Resources

Syllabus

The official syllabus, detailing class policies, the calendar and other details, can be found here [pdf].

Description

Major advances in technology for genomic studies are bringing the prospect of personalized and individualized medicine closer to reality. Many of these advances are predicated on the ability to generate data at an unprecedented rate, posing a significant need for computational data analysis that is clinically and biologically useful and robust.

This course will concentrate on the fundamental computational and statistical methods required to meet this need. It will cover topics in functional genomics, population genetics and epigenetics. Computational methods studied for this type of analysis include: supervised, unsupervised and semi-supervised learning, data visualization, statistical modeling and inference, probabilistic graphical models, sparse methods, and numerical optimization. Machine learning methods will be a core component of this class. No prior knowledge of biology is required.

Topics to be covered (not an exhaustive list)