Software

A major contribution of researchers in the CBCB is open-source software made freely available to the scientific community. The software described below is actively being developed and maintained. For software that is no longer being maintained in the Center (many of the packages are currently maintained by our alumni), please see the Inactive Software page.

AMOS is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, the Karolinska Institutet, and the Marine Biological Laboratory.

A comparative genome assembler that uses one genome as a reference on which to assemble another, closely related species. See the journal paper here.

(New in early 2009) Antibiotic Resistance Genes Database

Bambus 2.0 is the second-generation Bambus scaffolder, available as an open-source package. While most other scaffolders are closely tied to a specific assembly program, Bambus accepts the output from most current assemblers and gives the user great flexibility in choosing the scaffolding parameters. In particular, Bambus can accept contig linking data other than that specified by mate pairs. Such sources of information include alignment to a reference genome (Bambus can directly use the output of MUMmer), physical mapping data, and information about gene synteny.

An ultrafast, memory-efficient short read aligner that aligns short DNA sequences to the human genome at a rate of about 25 million reads per hour on a typical workstation with 2 GB of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.1 GB for the human genome.
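As a rough illustration of how a Burrows-Wheeler index supports exact matching, the sketch below builds the transform naively from sorted rotations and counts pattern occurrences with backward search. It is a toy under stated assumptions, not Bowtie's implementation: the function names `bwt` and `count_occurrences` are hypothetical, the construction is quadratic, and the occurrence counts are recomputed by scanning rather than stored in the compact sampled tables a production index would use.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform built from sorted rotations."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    s = text + "$"  # the sentinel sorts before A, C, G and T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact matches of `pattern` by backward search over the BWT."""
    # first_row[c] = number of characters in the text strictly smaller than c
    counts = Counter(bwt_str)
    first_row, total = {}, 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    def occ(c: str, i: int) -> int:
        # Number of copies of c in bwt_str[:i]; a real index precomputes this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGT")
hits = count_occurrences(index, "ACG")
```

The appeal of this design is that matching walks the pattern one character at a time against the transformed text, so the index can stay close to the size of the genome itself.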

Steven Salzberg has been nominated for the 2013 Benjamin Franklin Award in the Life Sciences. This is a humanitarian/bioethics award presented to an individual who has, in his or her practice, promoted free and open access to the materials and methods used in the life sciences. More information on the award can be found at http://www.bioinformatics.org/franklin/.

A whole genome assembler originally developed at Celera Genomics for the assembly of the human genome. CeleraAssembler is now an open-source project at SourceForge. The code is actively maintained by researchers at CBCB and the Venter Institute (formerly known as TIGR, The Institute for Genomic Research).

(New in July 2010) DNACLUST is a tool for clustering millions of short DNA sequences. DNACLUST is free software.

Epiviz is an interactive visualization tool for functional genomics data. It supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations. It also includes data from the Gene Expression Barcode project for transcriptome visualization. It has a flexible plugin framework so users can add d3 visualizations. You can find more information about Epiviz at http://epiviz.github.io and see a video tour here.

The Epivizr Bioconductor package implements two-way communication between the R/Bioconductor computational genomics environment and Epiviz. Objects in an R session can be displayed as tracks or plots on Epiviz. Epivizr uses WebSockets for communication between the browser JavaScript client and the R environment.

Bayesian tool to integrate genetic and epigenetic data to find causal expression regulatory polymorphisms

Overview

GOAL is an R implementation of the eQTeL model, which integrates genetic and epigenetic data to identify SNPs that regulate gene expression. More specifically, it leverages epigenetic data to estimate (a) regulatory potential and (b) interaction potential, in order to identify SNPs that regulate genes and are causal for expression variability.

Recent news

Make use of the developer version for the latest features!

Version XX

+ Initial version without documentation.

Obtaining GOAL

The latest source code of the R package, with documentation, can be downloaded from https://github.com/vinash85/GOAL. It will soon be available through CRAN along with documentation.

(New in 2014) Harvest is a suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial genomes. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Combined, they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees.

iMetAMOS is an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. iMetAMOS is available as a workflow within the metAMOS package starting with version 1.5.

A comprehensive system for finding unique DNA sequences that can be used to identify any bacterial or viral species or strain. Its database currently contains over 13,000 species and strains.

A fast, multithreaded k-mer counter.

The analysis of these vast amounts of data is complicated by the fact that reconstructing large genomic segments from metagenomic reads is a formidable computational challenge. Even for single organisms, the assembly of genome sequences from sequencing reads is a complex task, primarily due to ambiguities in the reconstruction that are caused by genomic repeats. In metagenomic data, additional challenges arise from the non-uniform representation of genomes in a sample as well as from the genomic variants between the sequences of closely related organisms. Despite advances in metagenomic assembly algorithms over the past years, the computational difficulty of the assembly process remains high and the quality of the resulting data fairly low.

As a result, many analyses of metagenomic data are performed directly on unassembled reads; however, the much shorter genomic context leads to lower accuracy.

Reference-guided, comparative assembly approaches have previously been used to assist the assembly of short reads when a closely related reference genome was available. Comparative assembly works as follows: short sequencing reads are aligned to a reference genome of a closely related species, then their reconstruction into contigs is inferred from their relative locations in the reference genome. This process overcomes, in part, the challenge posed by repeats, as the entire read (not just the segment that overlaps adjacent reads) provides information about its location in the genome.
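The layout step described above can be sketched as interval merging: once each read has a position on the reference, reads whose aligned spans overlap are chained into the same contig. This is a minimal sketch under stated assumptions, not the method of any particular assembler. The `Alignment` tuple, the `layout_contigs` function, and the `min_overlap` parameter are hypothetical names, the alignments are assumed to have been computed by a separate aligner, and consensus calling is omitted.

```python
from typing import Dict, List, Tuple

# (read id, start, end) on the reference, with end exclusive. Hypothetical format.
Alignment = Tuple[str, int, int]


def layout_contigs(alignments: List[Alignment], min_overlap: int = 1) -> List[Dict]:
    """Group reads into contigs from their positions on the reference."""
    contigs: List[Dict] = []
    # Process reads left to right along the reference.
    for read_id, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if contigs and start <= contigs[-1]["end"] - min_overlap:
            # The read overlaps the current contig by at least min_overlap bases.
            contigs[-1]["end"] = max(contigs[-1]["end"], end)
            contigs[-1]["reads"].append(read_id)
        else:
            # A gap in reference coverage starts a new contig.
            contigs.append({"start": start, "end": end, "reads": [read_id]})
    return contigs


example = [("r1", 0, 50), ("r2", 30, 80), ("r3", 100, 150), ("r4", 140, 190)]
for contig in layout_contigs(example):
    print(contig["start"], contig["end"], contig["reads"])
```

Because placement depends only on where each whole read aligns, two reads that share a repeat but map to different reference locations are never joined, which is the advantage over overlap-based layout noted above.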

Currently, tens of thousands of bacterial genomes have been sequenced, and the number is expected to grow rapidly in the near future. These sequenced genomes provide a great resource for performing comparative assembly of metagenomic sequences; however, they have yet to be used for this purpose, in no small part due to the tremendous computational cost of aligning the reads from a metagenomic project to the entire reference collection of bacterial genomes.

MetaCompass is the first assembly software package for the reference-assisted assembly of metagenomic data. We rely on an indexing strategy to quickly construct sample-specific reference collections, and show that this approach effectively complements de novo assembly methods.

R package to estimate differential abundance of marker gene survey data and visualize results.

Metagenomic datasets are challenging to assemble using traditional assembly pipelines designed for individual genomes. Using AMOS as a foundation, we have created a robust and easy-to-use metagenomic assembly pipeline that takes reads (FASTA, FASTQ, SFF) and assembles them into unitigs (CABOG, Newbler, Minimus, SOAPdenovo), contigs and scaffolds (Bambus2), and ORFs (Glimmer MG, MetaGeneMark), and annotates results using Metaphyler and a graph-based propagation method. metAMOS was designed with efficiency in mind and can run through tens of millions of reads in a few hours on a multi-core workstation with ample RAM.

assembly and analysis toolkit for metagenomics

metAMOS is an integrated assembly and analysis pipeline for metagenomic data. It is built around the Bambus2 metagenomic scaffolder and includes many current tools for assembly, gene finding, and taxonomic classification. metAMOS is under active development and changes quite frequently.

Obtaining metAMOS

You can get metAMOS from the metAMOS GitHub page. While there, please check out other software from MARBL (The MARyland Bioinformatics Labs), a loose consortium of bioinformatics developers started by current and former members of the University of Maryland Center for Bioinformatics and Computational Biology.

Documentation

metAMOS documentation at Read the Docs - provides all the information you need to get started.

(New in 2010) Taxonomic Profiling for Metagenomic Sequences.

A correction pipeline that enables the use of long-read sequences (such as those produced by the PacBio RS instrument) for assembly or other analysis.

Our study (linked below) was initiated before the emergence of software pipelines for 16S rRNA analysis. The analytical pipeline used in our project is provided here to enable reproducibility, and in the hope that it will be useful as an educational tool or for people who want to develop their own pipeline. Users (particularly novice ones) who simply want a dataset analyzed should instead use Qiime or Mothur.

Scaffolding using Optical Restriction Mapping

(New in 2012) Spanki is a toolkit for analysis of alternative splicing from RNA-Seq data.

(New in February 2009) A short read aligner for RNA-Seq experiments. TopHat discovers novel exon-exon splice junctions and can align millions of RNA-Seq reads to a mammalian genome per hour.

VALET is a pipeline for performing de novo validation of metagenomic assemblies. VALET checks a number of properties that should hold true for a correct assembly (e.g., mate pairs are aligned at the correct distance from each other in the assembly, the depth of coverage is fairly uniform along contigs, etc.). Violations of these invariants are reported, allowing one to pinpoint areas that were potentially mis-assembled or to compare the quality of different assemblies. For comparing multiple assemblies of the same datasets, VALET also reports an overall estimate of the likelihood that a particular assembly is correct.
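The two invariants named above (mate-pair distance and uniform depth of coverage) can be illustrated with a short sketch. This is not VALET's code or interface: the function names, input formats, and thresholds (`k`, `low_frac`, `high_frac`) are hypothetical choices made only to show what checking such invariants might look like.

```python
from typing import List, Tuple


def flag_mate_pairs(pairs: List[Tuple[str, int, int]],
                    mean: float, sd: float, k: float = 3.0) -> List[str]:
    """Return ids of mate pairs whose spacing is more than k*sd from the mean.

    Each pair is (pair id, left position, right position) on one contig.
    """
    return [pid for pid, left, right in pairs
            if abs((right - left) - mean) > k * sd]


def depth_of_coverage(length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth along a contig from (start, end) read spans, end exclusive."""
    delta = [0] * (length + 1)
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1
    depth, running = [], 0
    for d in delta[:length]:
        running += d
        depth.append(running)
    return depth


def flag_coverage(depth: List[int], low_frac: float = 0.5,
                  high_frac: float = 2.0) -> List[int]:
    """Return positions whose depth falls outside [low_frac, high_frac] * mean."""
    mean = sum(depth) / len(depth)
    return [i for i, d in enumerate(depth)
            if d < low_frac * mean or d > high_frac * mean]


suspect_pairs = flag_mate_pairs(
    [("p1", 100, 3100), ("p2", 500, 3450), ("p3", 200, 9200)],
    mean=3000, sd=150)
depth = depth_of_coverage(10, [(0, 10)] * 4 + [(4, 6)] * 8)
suspect_positions = flag_coverage(depth)
```

Positions flagged by both checks would be the natural first candidates for inspection, since a collapsed repeat tends to raise coverage and distort mate-pair spacing at the same location.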