Computational Gene Finding
|
|
|
|
a system that uses interpolated Markov models to find genes
in microbial DNA. March 2003: New release, version 2.1, automatically
optimizes ORF length for training. |
| TWAIN |
|
a Generalized Pair HMM to predict genes simultaneously in two
closely related eukaryotic organisms. |
GlimmerHMM
|
 |
a Generalized Hidden Markov Model gene-finder which makes
use of the
techniques implemented previously by GlimmerM |
GeneZilla
|

|
a generalized HMM for eukaryotic gene finding,
with a design similar to Genscan. Written and maintained by Bill
Majoros, now at Duke University.
|
ExAlt
|
 |
a Phylogenetic Generalized Hidden Markov Model for finding
alternatively spliced exons. |
JIGSAW
|
 |
(previously called Combiner), a program that predicts gene
models using the output from other annotation software. It uses a
statistical algorithm to identify patterns of evidence corresponding to
gene models. |
GeneSplicer
|
 |
a fast system for detecting splice sites in genomic DNA of
various eukaryotes. |
PIRATE
|

|
a website collecting many links to our gene finders and
others. |
Genome assembly and large-scale genome alignment
|
|
|

|
a system for aligning whole genomes, chromosomes, and other
very long DNA sequences. New (May 2008): see how to
use MUMmer to align Solexa
reads to the human genome. |
|
|
|
High throughput sequence alignment using Graphics Processing Units
(GPUs). Uses a technique called general-purpose GPU
programming (GPGPU programming) to harness the extreme parallelism of
GPUs for non-graphics tasks. In this application, hundreds of query
sequences are simultaneously aligned to a reference sequence, creating
an order-of-magnitude speedup over the same alignment on the CPU.
|
| AMOS Assembler
project |
 |
AMOS is a set of tools, libraries, and freestanding genome
assemblers, all open source. AMOS is also an open consortium that
includes TIGR, the University of Maryland, The Karolinska Institutet,
and the Marine Biological Laboratory. |
ABBA
|
 |
|
AMOScmp
|
 |
is a comparative genome assembler, which uses
one genome as a reference on which to assemble another, closely related
species. See the journal paper
here.
|
MINIMUS
|
 |
A small, lightweight assembler for small jobs such as assembling a
viral genome, assembling a set of reads that match a single gene, or
other tasks that don't require the complex infrastructure of a
large-genome assembler. |
Bowtie
|
| (New in August 2008)
An ultrafast, memory-efficient short read aligner that aligns short DNA sequences to the human genome at a rate of about 25 million reads per hour on a typical workstation with 2 GB of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.1 GB for the human genome. |
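To illustrate the idea behind a Burrows-Wheeler index, here is a minimal Python sketch of exact matching by backward search. It is not Bowtie's implementation: the function names `bwt_index` and `find_exact` are hypothetical, the suffix array is built by naive sorting (impractical for a real genome), and only exact matches are handled.

```python
from collections import Counter

def bwt_index(text):
    """Build a toy Burrows-Wheeler index for `text` (naive suffix sort)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character for each sorted suffix is the character preceding it
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in text strictly smaller than c
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ

def find_exact(read, index):
    """Return sorted start positions of exact occurrences of `read`."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffixes
    for c in reversed(read):  # extend the match one character to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

index = bwt_index("GATTACAGATTACA")
hits = find_exact("ATTA", index)
```

Each character of the read narrows the suffix range with two table lookups, so query time depends on read length rather than genome length. A production index would sample the occurrence table and suffix array to save memory, which this sketch does not attempt.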
BAMBUS
|
 |
the first publicly available, standalone
genome sequence scaffolding program. It orders and orients contigs into
scaffolds based on various types of linking information.
|
CloudBurst
|
| (New in Nov 2008)
Highly Sensitive Short Read mapping with MapReduce. CloudBurst uses
Hadoop - an open source version of Google's parallel computing software
MapReduce - to efficiently parallelize the short read mapping problem to dozens or
hundreds of computers. This enables CloudBurst to execute highly
sensitive read mappings with any number of mutations or indels. |
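As a rough illustration of how read mapping can be phrased as a map, shuffle, and reduce computation, the sketch below runs all three phases in plain Python on one machine. It does not use Hadoop and is not CloudBurst's code: the function names, the seed length `K`, and the mismatch-only verification are assumptions made for the example, and indels are not handled.

```python
from collections import defaultdict

K = 4  # seed length (hypothetical value for this toy example)

def map_phase(reference, reads):
    """Emit (seed, record) pairs for every reference k-mer and read seed."""
    for pos in range(len(reference) - K + 1):
        yield reference[pos:pos + K], ("ref", pos)
    for name, read in reads.items():
        # non-overlapping seeds taken from each read
        for off in range(0, len(read) - K + 1, K):
            yield read[off:off + K], ("read", name, off)

def shuffle(pairs):
    """Group records by seed, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reference, reads, max_mismatches):
    """For each shared seed, verify the full read against the reference."""
    hits = set()
    for records in groups.values():
        ref_positions = [r[1] for r in records if r[0] == "ref"]
        read_seeds = [r[1:] for r in records if r[0] == "read"]
        for name, off in read_seeds:
            read = reads[name]
            for pos in ref_positions:
                start = pos - off  # implied alignment start of the read
                if start < 0 or start + len(read) > len(reference):
                    continue
                window = reference[start:start + len(read)]
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= max_mismatches:
                    hits.add((name, start, mismatches))
    return sorted(hits)

def map_reads(reference, reads, max_mismatches=1):
    groups = shuffle(map_phase(reference, reads))
    return reduce_phase(groups, reference, reads, max_mismatches)

alignments = map_reads("ACGTACGTTAGCCATG",
                       {"r1": "ACGTTAGC", "r2": "TAGCGATG"})
```

Because each seed group is processed independently in the reduce step, the groups can be spread across many machines, which is the property a MapReduce framework exploits.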
Hawkeye
|
 |
A visual analytics tool for genome assembly analysis
and validation, designed to aid in identifying and correcting assembly
errors. All levels of the assembly data hierarchy are made accessible to
users, along with summary statistics and common assembly metrics. A
ranking component guides investigation towards likely mis-assemblies or
interesting features to support the task at hand. Can be used to
interactively analyze assemblies from many popular assemblers on your
desktop computer.
See the journal paper here.
|
AutoEditor
|

|
a tool for correcting sequencing and basecaller errors using
sequence assembly and chromatogram data. On average AutoEditor corrects
80% of erroneous base calls, with an accuracy of 99.99%. |
Figaro
|
 |
A vector trimmer capable of accurately trimming vector
from shotgun reads without prior knowledge of the vector sequence. Figaro statistically
models short oligonucleotide frequencies in order to infer which oligos are associated
with vector sequence.
|
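One way to picture frequency-based vector detection is to compare how often each short oligo occurs at the start of reads versus in the rest of the read. The Python sketch below does exactly that and nothing more. It is an assumption-laden toy, not Figaro's statistical model: the function names, the fixed `prefix_len`, the add-one smoothing, and the `min_ratio` threshold are all invented for illustration.

```python
from collections import Counter

def kmers(seq, k):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def vector_like_oligos(reads, k=4, prefix_len=8, min_ratio=2.0):
    """Flag k-mers that are over-represented at the start of reads."""
    prefix_counts, body_counts = Counter(), Counter()
    for read in reads:
        prefix_counts.update(kmers(read[:prefix_len], k))
        body_counts.update(kmers(read[prefix_len:], k))
    prefix_total = sum(prefix_counts.values()) or 1
    body_total = sum(body_counts.values()) or 1
    flagged = set()
    for oligo, n in prefix_counts.items():
        prefix_freq = n / prefix_total
        # add-one smoothing so unseen body k-mers do not divide by zero
        body_freq = (body_counts[oligo] + 1) / (body_total + 1)
        if prefix_freq / body_freq >= min_ratio:
            flagged.add(oligo)
    return flagged

reads = ["GGCCTTAA" + body
         for body in ("ACGTACGTAC", "TTGACAGTCA", "CATGCATGCA")]
flagged = vector_like_oligos(reads)
```

In this toy input every read begins with the same eight bases, so the oligos from that shared prefix are flagged while oligos seen only in the read bodies are not.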
Celera Assembler
|
|
whole genome assembler
originally developed at Celera Genomics for the assembly of the human
genome. Currently Celera Assembler is an open-source project at
SourceForge. The code is actively maintained by researchers at
the Venter Institute, the CBCB, and TIGR.
|
Other sequence analysis tools
|
| ELPH |
 |
a motif finder that can find ribosome binding sites, exon
splicing enhancers, or regulatory sites. |
| RepeatFinder |
 |
RepeatFinder, software for finding and characterizing repetitive sequences in complete and partial genomes. |
| SEE ESE |
 |
an online tool for identifying exon splicing enhancers
(ESEs) in Arabidopsis and Drosophila. |
|
|
 |
a program that finds rho-independent transcription
terminators in bacterial genomes. |
| OperonDB |
|
results from applying our operon-finding software to a large
number of prokaryotic genomes. (Described in Ermolaeva et al., Prediction of operons in microbial genomes,
listed above.) |
CRAB
|
|
Conserved
Regions in Archaea and Bacteria, a database of conserved intergenic
sites likely to regulate transcription of nearby genes
|
| Skewed oligomers |
|
from bacterial and archaeal genomes (from the paper in Gene,
above). Get the source code
or Linux executable here. Tables of skewed
oligomers for: A. fulgidus, B. burgdorferi, B. subtilis, C. trachomatis, E. coli, H. influenzae, H. pylori, M. genitalium, M. jannaschii, M. pneumoniae, M. thermoautotrophicum, Synechocystis sp. PCC 6803, T. maritima, T. pallidum |
| A collection of
links to external
sequence analysis programs. |