Statistical analysis for sparse high-throughput sequencing


metagenomeSeq is designed to determine features (Operational Taxonomic Units (OTUs), species, etc.) that are differentially abundant between two or more groups of multiple samples. It implements both our novel normalization and a statistical model that accounts for under-sampling of microbial communities, and it may be applicable to other data types. The package also includes visualization tools. metagenomeSeq has been available through Bioconductor since release 2.12.

Recent news:

Test the developer version (>= 1.5.5x) for the latest features!

March 17, 2014 - added uniqueFeatures, plotFeature, with slight vignette updates.

March 10, 2014 - we have added functionality to aggTax/aggregateByTaxonomy and added correlationTest

March 4, 2014 - we now include biom-format support to write, read and convert between MRexperiment objects and biom-format

We will be including more information in the vignette on building and testing contrasts for more than two conditions. For now, see Section 9.3 (page 42) of the limma user guide, which gives step-by-step instructions on how to make contrasts for various group tests.

Obtaining metagenomeSeq

To download and install the latest version of metagenomeSeq, please visit the Bioconductor page. Please also see How to use the developers version. Note: the Bioconductor version you get depends on your version of R, unless you install from the GitHub repo.


Thanks to recent developments at Bioconductor, we will begin maintaining the GitHub repository as the official development branch for metagenomeSeq. We are constantly updating metagenomeSeq on GitHub, and it can be installed from there using devtools.
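As a rough sketch of the two installation routes described above: the release version can be installed with BiocManager (the current Bioconductor installer, which may postdate this page), and the development version with devtools. The repository path "OWNER/metagenomeSeq" is a placeholder, not the real path; substitute the one given on the project's GitHub page.

```r
# Release version from Bioconductor (the version you get depends on your R version).
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("metagenomeSeq")

# Development version from GitHub.
# "OWNER/metagenomeSeq" is a placeholder: replace it with the repository
# path from the project page, then uncomment the lines below.
# install.packages("devtools")
# devtools::install_github("OWNER/metagenomeSeq")
```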



Detailed documentation is available in the vignette after installing the package. For instructions on using R, please see this R introduction.

After installing the package, invoking vignette("metagenomeSeq") opens a manual that gives an overview of a typical metagenomic analysis.

This should produce a document similar to this.

A list of the functions available in the package, and more information about the parameters of a particular function, is available through R's built-in help system.
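A minimal sketch of how this might look in an R session, assuming the package is already installed. fitZig is used here only as an example of a function name; any exported function works the same way.

```r
library(metagenomeSeq)

# Index of all documented functions and datasets in the package.
help(package = "metagenomeSeq")

# Help page for a single function, including its parameters.
?fitZig
```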

Or take a look at the user manual.


Two sample datasets are provided within the package.

  • Lung microbiome. The lung microbiome consisted of respiratory flora sampled from six healthy individuals: three nonsmokers and three smokers. The upper lung tracts were sampled by oral wash and oro-/nasopharyngeal swabs. Up to a patient's glottis, samples were taken using two bronchoscopes, serial bronchoalveolar lavage and lower airway protected brushes. More detailed information about the lung microbiome samples, collection and protocols is available in "Topographical continuity of bacterial populations in the healthy human respiratory tract".

  • Humanized gnotobiotic mouse gut. Twelve germ-free adult male C57BL/6J mice were fed a low-fat, plant polysaccharide-rich diet. Each mouse was gavaged with healthy adult human fecal material. Following the fecal transplant, the mice remained on the low-fat, plant polysaccharide-rich diet for four weeks, after which a subset of six were switched to a high-fat, high-sugar diet for eight weeks. Each week, fecal samples from each mouse went through PCR amplification of the bacterial 16S rRNA gene V2 region. Details of the experimental protocols and of the data can be found in Turnbaugh et al. Sequences and further information can be found at:

  • Simulated datasets. The simulated data used in the metagenomeSeq paper for the comparison of methods is available here: metagenomeSeq simulations
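
The two bundled sample datasets can be loaded directly once the package is attached. The sketch below assumes they are exposed under the names lungData and mouseData, as listed in the package's data index; check help(package = "metagenomeSeq") if your version differs.

```r
library(metagenomeSeq)

# Lung microbiome dataset (an MRexperiment object).
data(lungData)
lungData

# Humanized gnotobiotic mouse gut dataset (an MRexperiment object).
data(mouseData)
mouseData

# Dimensions (features x samples) of the raw count matrix.
dim(MRcounts(mouseData))
```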

Requests

    For feature requests, please email Joseph Paulson at jpaulson at


    For questions and comments write to jpaulson or hcorrada or mpop all at


  • Bill and Melinda Gates Foundation (42917)
  • US National Science Foundation Graduate Research Fellowship (DGE0750616)
  • US National Institutes of Health (5R01HG005220)

Citation

    Joseph N Paulson, O Colin Stine, Héctor Corrada Bravo, and Mihai Pop. Differential abundance analysis for microbial marker-gene surveys. Nature Methods - (2013). doi:10.1038/nmeth.2658