metagenomeSeq

Statistical analysis for sparse high-throughput sequencing

Overview

metagenomeSeq is designed to determine features (Operational Taxonomic Units (OTUs), species, etc.) that are differentially abundant between two or more groups of multiple samples. It implements both our novel normalization and a statistical model that accounts for under-sampling of microbial communities, and it may be applicable to other data types. The package also includes useful visualization tools. metagenomeSeq has been available through Bioconductor since release 2.12.

Recent news

New interactiveDisplay method (interactive exploratory plots) available for metagenomeSeq objects in interactiveDisplay 1.3.19 and above!

New time series method for longitudinal data available in metagenomeSeq version 1.7.18 (see fitTimeSeries)

Check out some of the visualizations available through metagenomeSeq in action for a recent study: http://epiviz.cbcb.umd.edu/shiny/MSD1000/

If interested in similar pages for your own site, please let us know so we can develop tools to easily integrate your own data dynamically.

Make use of the developer version for the latest features!

Version 1.7.xx

+ Added function plotBubble

+ Added parallel (multi-core) options to fitPA, fitDO

+ Fixed bug for fitMeta when useCSSoffset=FALSE and model matrix ncol==2

+ (1.7.10) Updated default quantile estimate (.5) for low estimates

+ (1.7.10) Added short description on how to do multiple group comparisons

+ (1.7.15) Output of fitZig (eb) is now a result of limma::eBayes instead of limma::ebayes

+ (1.7.16) plotMRheatmap allows for sorting by any stat (not just sd)

+ (1.7.18) Added fitTimeSeries, a time series method for finding differentially abundant time intervals

Obtaining metagenomeSeq

To download and install the latest version of metagenomeSeq, please visit the Bioconductor page http://www.bioconductor.org/packages/devel/bioc/html/metagenomeSeq.html or run:

source("http://bioconductor.org/biocLite.R")
biocLite("metagenomeSeq")

Thanks to recent developments at Bioconductor, we maintain a GitHub repository as the official development branch for metagenomeSeq, and we update it constantly. The development version of metagenomeSeq can be installed with the following code:

source("http://bioconductor.org/biocLite.R")
useDevel()
biocLite("metagenomeSeq")
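Recent Bioconductor releases install packages through the BiocManager package rather than biocLite. A minimal sketch of the equivalent installation follows, assuming a current R installation with access to CRAN and Bioconductor. It is not part of the original instructions above.

```r
# Install BiocManager from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install the release version of metagenomeSeq from Bioconductor
BiocManager::install("metagenomeSeq")
```

To track the Bioconductor development branch instead, BiocManager::install(version = "devel") switches the installation to devel packages. This generally requires the R version that the devel branch targets.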

Documentation

Detailed documentation is available in the vignette following installation. For instructions on using R, please see the R introduction.

After installing the package, calling vignette("metagenomeSeq") opens a manual that gives an overview of a typical metagenomic analysis.

library("metagenomeSeq")
vignette("metagenomeSeq")

This should produce a document similar to the devel vignette or release vignette.

For a list of the functions available in the package, or for more information about the inputs to a particular function:

help(package = "metagenomeSeq")
?fitZig  # replace fitZig with the name of any function in the package

Alternatively, take a look at the devel user manual or the release user manual.
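As a rough illustration of the kind of analysis the vignette walks through, the sketch below normalizes the bundled lung dataset and tests for features that differ by smoking status. It assumes a reasonably recent release in which fitFeatureModel is available. The filtering thresholds (present = 30, depth = 1) and the normalization percentile (p = 0.5) are illustrative choices, not recommendations.

```r
library(metagenomeSeq)

# Load the lung microbiome example data shipped with the package
data(lungData)

# Keep only samples with a recorded smoking status
keep <- which(!is.na(pData(lungData)$SmokingStatus))
lungData <- lungData[, keep]

# Drop rarely observed features (illustrative thresholds)
lungData <- filterData(lungData, present = 30, depth = 1)

# Cumulative sum scaling normalization (illustrative percentile)
lungData <- cumNorm(lungData, p = 0.5)

# Two-group design: intercept plus smoking status
mod <- model.matrix(~ 1 + SmokingStatus, data = pData(lungData))

# Fit the per-feature model and list the top-ranked features
fit <- fitFeatureModel(lungData, mod)
head(MRcoefs(fit))
```

cumNormStatFast can be used to estimate the percentile from the data instead of fixing it by hand.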

Datasets

Two sample datasets are provided within the package.

  • Lung microbiome. The lung microbiome consisted of respiratory flora sampled from six healthy individuals: three nonsmokers and three smokers. The upper lung tracts were sampled by oral wash and oro-/nasopharyngeal swabs. Up to a patient's glottis, samples were taken using two bronchoscopes, serial bronchoalveolar lavage and lower airway protected brushes. More detailed information about the lung microbiome samples, collection and protocols is available from "Topographical continuity of bacterial populations in the healthy human respiratory tract".

  • Humanized gnotobiotic mouse gut. Twelve germ-free adult male C57BL/6J mice were fed a low-fat, plant polysaccharide-rich diet. Each mouse was gavaged with healthy adult human fecal material. Following the fecal transplant, mice remained on the low-fat, plant polysaccharide-rich diet for four weeks, after which a subset of six were switched to a high-fat, high-sugar diet for eight weeks. Fecal samples from each mouse went through PCR amplification of the bacterial 16S rRNA gene V2 region weekly. Details of the experimental protocols and of the data can be found in Turnbaugh et al. Sequences and further information can be found at: http://gordonlab.wustl.edu/TurnbaughSE_10_09/STM_2009.html.

  • Simulated datasets. Below is a link to the simulated data used within the metagenomeSeq paper for comparison of methods. metagenomeSeq simulations
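The two bundled datasets can be loaded and inspected as sketched below. The object names lungData and mouseData are the ones the package ships with. The particular accessor calls shown are just one convenient way to look at the data.

```r
library(metagenomeSeq)

# Load both example datasets into the workspace
data(lungData)
data(mouseData)

# Printing an object gives a short summary of its features and samples
lungData

# Raw count matrix dimensions: features (rows) by samples (columns)
dim(MRcounts(mouseData))

# Sample-level annotations, e.g. diet for the mouse data
head(pData(mouseData))

# Feature-level annotations, e.g. taxonomy
head(fData(mouseData))
```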
Requests

Any requests for features should be addressed to Joseph Paulson at jpaulson at umiacs.umd.edu

Contact

For questions and comments, write to jpaulson, hcorrada, or mpop, all at umiacs.umd.edu

Funding

  • Bill and Melinda Gates Foundation (42917)
  • US National Science Foundation Graduate Research Fellowship (DGE0750616)
  • US National Institutes of Health (5R01HG005220)

Citation

H Talukder, JN Paulson, HC Bravo. Finding regions of interest in high throughput genomics data using smoothing splines. Submitted.

JN Paulson, M Pop, HC Bravo. metagenomeSeq: Statistical analysis for sparse high-throughput sequencing. Bioconductor package: XX. http://cbcb.umd.edu/software/metagenomeSeq

JN Paulson, OC Stine, HC Bravo, M Pop. Differential abundance analysis for microbial marker-gene surveys. Nature Methods (2013). doi:10.1038/nmeth.2658