Statistical analysis for sparse high-throughput sequencing


metagenomeSeq is designed to determine which features (Operational Taxonomic Units (OTUs), species, etc.) are differentially abundant between two or more groups of multiple samples. It implements both our novel normalization and a statistical model that accounts for under-sampling of microbial communities, and it may be applicable to other data types. The package also includes useful visualization tools. metagenomeSeq has been available through Bioconductor since release 2.12.

Recent news

Release news

Developer news

A new time series method for longitudinal data, with an accompanying vignette, is available in the developer's version. See ?fitTimeSeries.

To visualize your dataset in an interactive display:

library(metagenomeSeq) # Recommended version > 1.9.9

# obj = your MRexperiment dataset

Publish a visualization of your dataset through shinyapps for collaborators after modifying this script:

library(metagenomeSeq) # Version > 1.9.9
library(shinyapps) # requires shinyapps account

# If you are having trouble, the lines below install the developer's version of metagenomeSeq
# library(devtools)
# install_github("nosson/metagenomeSeq")

# After modifying the Rmd file and placing it in a local folder

A visualization of a recent metagenomeSeq study of the gut microbiome is available here:

Obtaining metagenomeSeq

To download and install the latest version of metagenomeSeq, please visit the Bioconductor page.


Thanks to recent developments at Bioconductor, we maintain a GitHub repository as the official development branch for metagenomeSeq, and we update it constantly. The development branch of metagenomeSeq can be installed with the following code:
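A minimal sketch of a development install, assuming the repository path `nosson/metagenomeSeq` that appears in the commented snippet earlier on this page and that the devtools package is acceptable for the job. The helper name `install_metagenomeSeq_devel` is hypothetical and is not part of metagenomeSeq. Bioconductor dependencies may need to be installed separately.

```r
# Hypothetical helper (not part of metagenomeSeq): installs the development
# version from GitHub, installing devtools first if it is missing.
install_metagenomeSeq_devel <- function(repo = "nosson/metagenomeSeq") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(repo)
}

# Run once in an interactive session (needs network access):
# install_metagenomeSeq_devel()
```

The call is left commented out so that sourcing the snippet does not trigger a network install by accident.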



Detailed documentation is available in the vignette following installation. For instructions on using R, please see the R introduction.

After installing the package, calling vignette("metagenomeSeq") opens a manual that gives an overview of a typical metagenomic analysis.

This should produce a document similar to the devel vignette or release vignette.

For a list of functions available in the package and more information about parameter inputs for a particular function call:
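One way to do this with R's standard help system is sketched below. It assumes the package is installed; `fitZig` is used only as an example of a function to look up, and the `requireNamespace` guard is an addition so the snippet still runs where the package is missing.

```r
# Check whether the package is installed before asking for its documentation.
has_pkg <- requireNamespace("metagenomeSeq", quietly = TRUE)

if (has_pkg) {
  # Index of every documented function and dataset in the package.
  print(help(package = "metagenomeSeq"))
  # Help page for a single function; fitZig is just one example.
  print(help("fitZig", package = "metagenomeSeq"))
} else {
  message("metagenomeSeq is not installed; see the installation section.")
}
```

In an interactive session, the shorter forms `help(package = "metagenomeSeq")` and `?fitZig` (after `library(metagenomeSeq)`) do the same thing.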

Or take a look at the devel user manual or the release user manual.


Two sample datasets are provided within the package.

  • Lung microbiome. The lung microbiome consisted of respiratory flora sampled from six healthy individuals: three nonsmokers and three smokers. The upper lung tracts were sampled by oral wash and oro-/nasopharyngeal swabs. Up to a patient's glottis, samples were taken using two bronchoscopes, serial bronchoalveolar lavage and lower airway protected brushes. More detailed information about the lung microbiome samples, collection and protocols is available from "Topographical continuity of bacterial populations in the healthy human respiratory tract".

  • Humanized gnotobiotic mouse gut. Twelve germ-free adult male C57BL/6J mice were fed a low-fat, plant polysaccharide-rich diet. Each mouse was gavaged with healthy adult human fecal material. Following the fecal transplant, mice remained on the low-fat, plant polysaccharide-rich diet for four weeks, after which a subset of six were switched to a high-fat, high-sugar diet for eight weeks. Fecal samples from each mouse were collected weekly and underwent PCR amplification of the V2 region of the bacterial 16S rRNA gene. Details of the experimental protocols and of the data can be found in Turnbaugh et al. Sequences and further information can be found at:

  • Simulated datasets. Below is a link to the simulated data used within the metagenomeSeq paper for comparison of methods. metagenomeSeq simulations
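The two packaged datasets can be loaded and inspected roughly as follows. This is a sketch that assumes the dataset names `lungData` and `mouseData` used in the package vignette; the `requireNamespace` guard is an addition so the snippet still runs where the package is missing.

```r
# Check whether the package is installed before loading its datasets.
has_pkg <- requireNamespace("metagenomeSeq", quietly = TRUE)

if (has_pkg) {
  library(metagenomeSeq)

  data(lungData)   # lung microbiome
  data(mouseData)  # humanized gnotobiotic mouse gut

  # Both objects are MRexperiment instances; printing shows a summary.
  print(lungData)

  # Raw count matrix: features in rows, samples in columns.
  print(dim(MRcounts(mouseData)))

  # Sample-level (phenotype) data, e.g. diet for the mouse samples.
  print(head(pData(mouseData)))
} else {
  message("metagenomeSeq is not installed; see the installation section.")
}
```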
Requests

Any requests for features should be addressed to Joseph Paulson at jpaulson at


For questions and comments write to jpaulson or hcorrada or mpop all at


  • Bill and Melinda Gates Foundation (42917)
  • US National Science Foundation Graduate Research Fellowship (DGE0750616)
  • US National Institutes of Health (5R01HG005220)
Citation

H Talukder, JN Paulson, HC Bravo. Finding regions of interest in high throughput genomics data using smoothing splines. Submitted.

JN Paulson, M Pop, HC Bravo. metagenomeSeq: Statistical analysis for sparse high-throughput sequencing. Bioconductor package: XX.

JN Paulson, OC Stine, HC Bravo, M Pop. Differential abundance analysis for microbial marker-gene surveys. Nature Methods (2013). doi:10.1038/nmeth.2658