Computational Gene Finding
|
|
|
|
a system that uses interpolated Markov models to find genes
in microbial DNA. March 2003: New release, version 2.1, automatically
optimizes ORF length for training. |
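As a rough illustration of what an interpolated Markov model does, the sketch below blends predictions from contexts of different lengths, trusting longer contexts more when they were seen more often in training. The function names and the count-based weighting (`confidence`) are assumptions made for this example and are simpler than what a production gene finder uses; this is not the tool's actual algorithm.

```python
from collections import Counter, defaultdict
import math

BASES = "ACGT"

def train_counts(sequences, max_order):
    """Count how often each base follows every context of length 0..max_order."""
    counts = [defaultdict(Counter) for _ in range(max_order + 1)]
    for seq in sequences:
        for i, base in enumerate(seq):
            for k in range(min(i, max_order) + 1):
                counts[k][seq[i - k:i]][base] += 1
    return counts

def interpolated_prob(counts, context, base, confidence=10.0):
    """P(base | context), interpolated across context lengths.

    `confidence` is a hypothetical tuning constant: a context seen
    `confidence` times gets half of the weight at its order.
    """
    max_order = len(counts) - 1
    context = context[-max_order:] if max_order else ""
    # Order 0 uses add-one smoothing so no probability is ever zero.
    zero = counts[0].get("", Counter())
    prob = (zero[base] + 1) / (sum(zero.values()) + len(BASES))
    for k in range(1, len(context) + 1):
        seen = counts[k].get(context[-k:], Counter())
        total = sum(seen.values())
        if total == 0:
            break  # longer contexts cannot have been seen either
        weight = total / (total + confidence)
        prob = weight * (seen[base] / total) + (1 - weight) * prob
    return prob

def log_score(counts, seq):
    """Log-probability of a sequence under the interpolated model."""
    return sum(math.log(interpolated_prob(counts, seq[:i], seq[i]))
               for i in range(len(seq)))

model = train_counts(["ATGATGATGATG"], max_order=2)
print(interpolated_prob(model, "AT", "G"))
```

A gene finder built on this idea would train one such model on known coding sequence and compare its score for a candidate ORF against a background model.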
| TWAIN |
|
a Generalized Pair HMM to predict genes simultaneously in two
closely related eukaryotic organisms. |
GlimmerHMM
|
 |
a Generalized Hidden Markov Model gene finder that builds on
techniques previously implemented in GlimmerM |
GeneZilla
|

|
a generalized HMM for eukaryotic gene finding,
with a design similar to Genscan. Written and maintained by Bill
Majoros, now at Duke University.
|
ExAlt
|
 |
a Phylogenetic Generalized Hidden Markov Model for finding
alternatively spliced exons. |
JIGSAW
|
 |
(previously called Combiner), a program that predicts gene
models using the output from other annotation software. It uses a
statistical algorithm to identify patterns of evidence corresponding to
gene models. |
GeneSplicer
|
 |
a fast system for detecting splice sites in genomic DNA of
various eukaryotes. |
PIRATE
|

|
a website collecting many links to our gene finders and
others. |
Genome assembly and large-scale genome alignment
|
|
|

|
a system for aligning whole genomes, chromosomes, and other
very long DNA sequences. Since April 2003:
MUMmer 3.0 and later releases are open source. |
|
|
|
High-throughput sequence alignment using Graphics Processing Units
(GPUs). Uses a technique called general-purpose GPU
programming (GPGPU programming) to harness the extreme parallelism of
GPUs for non-graphics tasks. In this application, hundreds of query
sequences are aligned simultaneously to a reference sequence, yielding
an order-of-magnitude speedup over the same alignment on the CPU.
|
| AMOS Assembler
project |
 |
AMOS is a set of tools, libraries, and freestanding genome
assemblers, all open source. AMOS is also an open consortium that
includes TIGR, the University of Maryland, The Karolinska Institutet,
and the Marine Biological Laboratory. |
AMOScmp
|
 |
is a comparative genome assembler that uses
one genome as a reference for assembling the genome of another, closely
related species. See the journal paper
here.
|
MINIMUS
|
 |
(new in August 2004)
is a small, lightweight assembler for small jobs such as assembling a
viral genome, assembling a set of reads that match a single gene, or
other tasks that don't require the complex infrastructure of a
large-genome assembler. |
BAMBUS
|
 |
the first publicly available, standalone
genome sequence scaffolding program. It orders and orients contigs into
scaffolds based on various types of linking information.
|
Hawkeye
|
 |
A visual analytics tool for genome assembly analysis
and validation, designed to aid in identifying and correcting assembly
errors. All levels of the assembly data hierarchy are made accessible to
users, along with summary statistics and common assembly metrics. A
ranking component guides investigation towards likely mis-assemblies or
interesting features to support the task at hand. Can be used to
interactively analyze assemblies from many popular assemblers on your
desktop computer.
See the journal paper here.
|
AutoEditor
|

|
a tool for correcting sequencing and basecaller errors using
sequence assembly and chromatogram data. On average AutoEditor corrects
80% of erroneous base calls, with an accuracy of 99.99%. |
Figaro
|
 |
A vector trimmer capable of accurately trimming vector
from shotgun reads without prior knowledge of the vector sequence. Figaro statistically
models short oligonucleotide frequencies in order to infer which oligos are associated
with vector sequence.
|
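The idea of inferring vector-associated oligos from frequencies alone can be sketched as follows: k-mers that pile up at the start of many reads, but are rare in the rest of the reads, are candidates for vector sequence. This is a simplified illustration, not Figaro's statistical model; the function name and every threshold (`k`, `prefix_len`, `min_ratio`, `min_count`) are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-mer of a sequence, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def vector_like_kmers(reads, k=8, prefix_len=30, min_ratio=5.0, min_count=3):
    """Flag k-mers that are enriched in read prefixes relative to read bodies.

    k-mers spanning the prefix/body boundary are ignored in this sketch.
    """
    prefix_counts = Counter()
    body_counts = Counter()
    for read in reads:
        prefix_counts.update(kmers(read[:prefix_len], k))
        body_counts.update(kmers(read[prefix_len:], k))
    flagged = set()
    for kmer, n_prefix in prefix_counts.items():
        # Add one to the body count so unseen k-mers do not divide by zero.
        ratio = n_prefix / (body_counts[kmer] + 1)
        if n_prefix >= min_count and ratio >= min_ratio:
            flagged.add(kmer)
    return flagged

# Toy reads: the same 10-base "vector" prefix followed by unrelated inserts.
toy_reads = ["GGCCTTAAGG" + insert
             for insert in ("ATATATATATAT", "CGCGCGCGCGCG",
                            "AAAAAAAAAAAA", "TTTTTTTTTTTT")]
print(sorted(vector_like_kmers(toy_reads, k=4, prefix_len=10, min_ratio=3.0)))
```

A trimmer could then cut each read after the last flagged k-mer near its start, though where exactly to cut is a separate design decision not covered here.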
Celera Assembler
|
|
a whole-genome assembler
originally developed at Celera Genomics for the assembly of the human
genome. Celera Assembler is currently an open-source project at
SourceForge. The code is actively maintained by researchers at
the Venter Institute, the CBCB, and TIGR.
|
Other sequence analysis tools
|
| ELPH |
 |
a motif finder that can find ribosome binding sites, exon
splicing enhancers, or regulatory sites. |
| RepeatFinder |
 |
software for finding and characterizing repetitive sequences in complete and partial genomes. |
| SEE ESE |
 |
an online tool for identifying exon splicing enhancers
(ESEs) in Arabidopsis and Drosophila. |
|
|
 |
a program that finds rho-independent transcription
terminators in bacterial genomes. |
| OperonDB |
|
results from applying our operon-finding software to a large
number of prokaryotic genomes. (Described in Ermolaeva et al., Prediction of operons in microbial genomes,
listed above.) |
CRAB
|
|
Conserved
Regions in Archaea and Bacteria, a database of conserved intergenic
sites likely to regulate transcription of nearby genes
|
| Skewed oligomers |
|
from bacterial and archaeal genomes (from the paper in Gene,
above). Get the source code
or Linux executable here. Tables of skewed
oligomers for: A. fulgidus, B. burgdorferi, B. subtilis, C. trachomatis, E. coli, H. influenzae, H. pylori, M. genitalium, M. jannaschii, M. pneumoniae, M. thermoautotrophicum, Synechocystis sp. PCC 6803, T. maritima, T. pallidum |
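One way to picture such a table is sketched below, on the assumption that "skewed" describes an oligomer that occurs far more often on one strand than its reverse complement does. The scoring formula, the function names, and the `min_total` cutoff are choices made for this illustration and are not taken from the paper or the distributed source code.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def oligomer_skew(seq, k=8, min_total=5):
    """Map each k-mer to a skew in [-1, 1].

    +1 means the k-mer appears only in the given orientation,
    -1 means only its reverse complement appears, 0 means balanced.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    skews = {}
    for kmer in counts:
        rc = reverse_complement(kmer)
        if kmer == rc:
            continue  # a palindromic oligomer cannot be skewed
        forward, reverse = counts[kmer], counts[rc]
        total = forward + reverse
        if total >= min_total:
            skews[kmer] = (forward - reverse) / total
    return skews

strand = "GATTACA" * 6
print(oligomer_skew(strand, k=4)["GATT"])
```

Running the same function on separate windows of a genome would show whether the sign of the skew changes along the chromosome.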
| A collection of
links to external
sequence analysis programs. |