A major contribution of researchers in the CBCB is the set of open-source software packages made freely available to the scientific community. The software described below is actively being developed and maintained. For software that is no longer maintained in the Center (many of these packages are now maintained by our alumni), please see the Inactive Software page.
AMOS is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, The Karolinska Institutet, and the Marine Biological Laboratory.

is a comparative genome assembler, which uses one genome as a reference on which to assemble another, closely related species. See the journal paper here.

(New in early 2009) Antibiotic Resistance Genes Database

Bambus 2.0 is the second-generation Bambus scaffolder, available as an open-source package. While most other scaffolders are closely tied to a specific assembly program, Bambus accepts the output from most current assemblers and gives the user great flexibility in choosing the scaffolding parameters. In particular, Bambus can accept contig linking data other than that provided by mate pairs. Such sources of information include alignment to a reference genome (Bambus can directly use the output of MUMmer), physical mapping data, and information about gene synteny.

An ultrafast, memory-efficient short read aligner that aligns short DNA sequences to the human genome at a rate of about 25 million reads per hour on a typical workstation with 2 GB of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.1 GB for the human genome.
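
To illustrate the idea behind a Burrows-Wheeler index, here is a minimal Python sketch of exact matching by backward search. It is a toy under stated assumptions, not Bowtie's implementation: the function names `bwt_index` and `find` are hypothetical, the suffix sort is naive, and the full occurrence table is kept uncompressed, so it does not reproduce the small memory footprint described above.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Naive suffix sort: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # first[c] = number of characters in the text strictly smaller than c
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def find(pattern, index):
    """Return sorted start positions of exact matches of pattern."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = bwt_index("GATTACAGATTACA")
print(find("ATTA", index))
```

Each pattern character narrows the range of candidate rows with two table lookups, so the search time depends on the read length rather than on the genome length.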

Steven Salzberg has been nominated for the 2013 Benjamin Franklin Award in the Life Sciences. This is a humanitarian/bioethics award presented to an individual who has, in his or her practice, promoted free and open access to the materials and methods used in the life sciences. More information on the award can be found at

A whole genome assembler originally developed at Celera Genomics for the assembly of the human genome. CeleraAssembler is now an open-source project at SourceForge. The code is actively maintained by researchers at CBCB and the Venter Institute (formerly known as TIGR, The Institute for Genomic Research).

(New in July 2010) DNACLUST is a tool for clustering millions of short DNA sequences. DNACLUST is free software.

Epiviz is an interactive visualization tool for functional genomics data. It supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations. It also includes data from the Gene Expression Barcode project for transcriptome visualization. It has a flexible plugin framework so users can add d3 visualizations. You can find more information about Epiviz at and see a video tour here.

The Epivizr Bioconductor package implements two-way communication between the R/Bioconductor computational genomics environment and Epiviz. Objects in an R session can be displayed as tracks or plots on Epiviz. Epivizr uses WebSockets for communication between the browser JavaScript client and the R environment.

Bayesian tool to integrate genetic and epigenetic data to find causal expression regulatory polymorphisms


GOAL is an R implementation of the eQTeL model, which integrates genetic and epigenetic data to identify SNPs that regulate gene expression. More specifically, it leverages epigenetic data to estimate (a) regulatory potential and (b) interaction potential, in order to identify SNPs that regulate genes and are causal for expression variability.

Recent news

Make use of the developer version for the latest features!

Version XX

+ Initial version without documentation.

Obtaining GOAL

The latest source code with documentation of the R-package can be downloaded from It will be soon available through CRAN along with documentation.

(New in 2014) Harvest is a suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial genomes. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together, they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees.

iMetAMOS is an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. iMetAMOS is available as a workflow within the metAMOS package starting with version 1.5.
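
The selection step of an ensemble pipeline can be sketched in a few lines of Python. This is only an illustration under loud assumptions: iMetAMOS scores assemblies on multiple validation metrics, whereas this sketch uses a single common one (N50) with contig count as a tie-breaker, and the names `n50` and `pick_assembly` as well as the input format (assembler name mapped to a list of contig lengths) are hypothetical.

```python
def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold at least half the bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def pick_assembly(assemblies):
    """Return the name of the candidate with the highest N50 (ties: fewer contigs)."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), -len(assemblies[name])),
    )


# Hypothetical contig lengths produced by two assemblers on the same reads.
candidates = {
    "assembler_a": [100, 50, 50],
    "assembler_b": [150, 50],
}
print(pick_assembly(candidates))
```

A realistic scorer would combine several reference-free metrics, since contiguity alone rewards misjoined contigs.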

A comprehensive system for finding unique DNA sequences that can be used to identify any bacterial or viral species or strain. Its database currently holds over 13,000 species and strains.

A fast, multithreaded k-mer counter.
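
For readers unfamiliar with the task, the following Python sketch shows what k-mer counting computes. It is a single-threaded illustration with a hypothetical function name, `count_kmers`, and does not reflect the multithreaded design of the tool described above.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring across an iterable of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


counts = count_kmers(["ACGTACGT"], 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Sequences shorter than k contribute nothing, because the window range is empty.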

R package to estimate differential abundance of marker gene survey data and visualize results.

Metagenomic datasets are challenging to assemble using traditional assembly pipelines designed for individual genomes. Using AMOS as a foundation, we have created a robust, easy-to-use metagenomic assembly pipeline that takes reads (FASTA, FASTQ, SFF), assembles them into unitigs (CABOG, NEWBLER, Minimus, SOAPdenovo) and then into contigs and scaffolds (Bambus2), predicts ORFs (Glimmer MG, MetaGeneMark), and annotates the results using Metaphyler and a graph-based propagation method. MetAMOS was designed with efficiency in mind and can process tens of millions of reads in a few hours on a multi-core workstation with ample RAM.

(New in 2010) Taxonomic Profiling for Metagenomic Sequences.

A correction pipeline that enables the use of long-read sequences (such as those produced by the PacBio RS instrument) for assembly or other analysis.

Our study (linked below) was initiated before the emergence of software pipelines for 16S rRNA analysis. The analytical pipeline used in our project is provided here to enable reproducibility, and in the hope that it will be useful as an educational tool or for people who want to develop their own pipeline. Users (particularly novices) who simply want a dataset analyzed should instead use Qiime or Mothur.

Scaffolding using Optical Restriction Mapping

(New in 2012) Spanki is a toolkit for the analysis of alternative splicing from RNA-Seq data.

(New in February 2009) A short read aligner for RNA-Seq experiments. TopHat discovers novel exon-exon splice junctions and can align millions of RNA-Seq reads to a mammalian genome per hour.