About Glimmer
Glimmer is a system for finding genes in microbial DNA, especially the
genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and
Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to
identify the coding regions and distinguish them from noncoding DNA.
The IMM approach, described in our Nucleic
Acids Research paper on Glimmer 1.0 and in our subsequent paper
on Glimmer 2.0, uses a combination of Markov models from 1st- through
8th-order, weighting each model according to its predictive
power. Glimmer uses 3-periodic nonhomogeneous Markov models
in its IMMs.
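The core IMM idea is that a high-order prediction is trusted only as far as the training data supports it, and is otherwise blended with the lower-order predictions beneath it. The sketch below shows that recursive blending for a single DNA model. It is a simplification, not Glimmer's code: the class name `SimpleIMM`, the `threshold` weighting rule, the Laplace smoothing, and the 0th-order base term are all assumptions, and it ignores the 3-periodic aspect (Glimmer keeps separate models for each codon position).

```python
import math
from collections import defaultdict
from typing import Dict

ALPHABET = "ACGT"


class SimpleIMM:
    """Toy interpolated Markov model over DNA (hypothetical, not 3-periodic)."""

    def __init__(self, max_order: int = 8, threshold: int = 400) -> None:
        self.max_order = max_order
        # Hypothetical weighting rule: a context seen `threshold` or more times
        # gets full weight; rarer contexts are blended with lower orders.
        self.threshold = threshold
        # counts[context][base] = times `base` followed `context` in training data
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def train(self, seq: str) -> None:
        for i, base in enumerate(seq):
            for k in range(self.max_order + 1):
                if k > i:
                    break  # not enough preceding bases for this order
                self.counts[seq[i - k:i]][base] += 1

    def prob(self, base: str, history: str) -> float:
        """Interpolated P(base | history), blended from order 0 upward."""
        p = 1.0 / len(ALPHABET)  # start from a uniform distribution
        for k in range(min(self.max_order, len(history)) + 1):
            context = history[len(history) - k:]  # last k bases ("" when k == 0)
            table = self.counts.get(context)
            if not table:
                break  # unseen context; every longer context is unseen too
            total = sum(table.values())
            # Laplace-smoothed order-k estimate
            p_k = (table.get(base, 0) + 1) / (total + len(ALPHABET))
            weight = min(1.0, total / self.threshold)
            # IMM recursion: order-k estimate blended with the order-(k-1) result
            p = weight * p_k + (1.0 - weight) * p
        return p

    def log_likelihood(self, seq: str) -> float:
        """Sum of log P(base | preceding bases) over the whole sequence."""
        return sum(
            math.log(self.prob(base, seq[max(0, i - self.max_order):i]))
            for i, base in enumerate(seq)
        )
```

Because every level is a convex combination of proper distributions, `prob` still sums to 1 over the four bases for any history.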
Glimmer is the primary microbial gene finder used at The Institute for
Genomic Research (TIGR), where it was first developed, and has been
used to annotate the complete genomes of over 80 bacterial species from
TIGR and dozens (possibly hundreds) from other labs. Its analyses of
some of these genomes are
available at the Comprehensive Microbial
Resource site. Glimmer3 predictions are also available for all NCBI
RefSeq bacterial genomes at the NCBI FTP site.
For the eukaryotic version of Glimmer, go to the GlimmerHMM site.
Current Version:
Glimmer version 3.02 is the current version of the system.
- Version 3.02 Release Notes
- Download Glimmer v3.02
The previous version of Glimmer, v2.13, can still be
downloaded and is described on its own page.
Running Glimmer:
A Glimmer server is available on the NCBI website. To run Glimmer on
your own sequence, visit the NCBI Glimmer page.
What's Changed from Glimmer2 to Glimmer3
Glimmer3 makes several algorithmic changes to reduce the number of
false positive predictions and to improve the accuracy of start-site
predictions. Changes have also been made to some program parameters and
options, and to output formats. Some specific differences are:
- Glimmer2 used a set of rules to attempt to resolve
overlaps between candidate orfs. When the overlap could not be
resolved, both orfs were included in the prediction list, resulting in
a high false-positive rate.
Glimmer3 uses a dynamic programming algorithm to select the
highest-scoring set of predictions consistent with the maximum
allowed overlap. This reduces the number of false positive predictions
with little or no increase in the number of false negative predictions.
- Glimmer3 scores orfs in the reverse direction, i.e.,
from stop to start. This improves the accuracy of scores near the start
codon because the trailing context of the ICM (interpolated context
model) lies within the coding region.
- The long-orfs program now uses an
amino-acid distribution model to filter the set of candidate orfs
before a set of long, non-overlapping orfs is selected.
- The make system and directory structure
has been revised to separate source, object and executable files.
- Program options are now specified before required
parameters (Unix style), rather than after (DOS style).
- The glimmer3 program produces two
separate output files: a .detail file with
information about all orfs (like the first part of Glimmer2 output);
and a .predict file containing just the final
predictions (like the last part of Glimmer2 output). glimmer3
requires a third parameter, which is used as the prefix for the names
of these files.
- Glimmer3 prediction coordinates now include the stop
codon, so the stop-end coordinate of each gene will differ from the
Glimmer2 value by 3.
- The glimmer3 program will process a
multi-fasta sequence file. The outputs for each sequence are preceded
by the fasta-header line in both the .detail and .predict
files.
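Selecting the highest-scoring set of orfs under an overlap limit resembles weighted interval scheduling. The sketch below illustrates that dynamic program under stated assumptions; it is not Glimmer3's implementation. The function name `select_orfs` is hypothetical, each orf is a `(start, end, score)` triple with start ≤ end (reverse-strand orfs normalized first), and overlap is measured conservatively as `end_j - start_i + 1`, which can overstate it for an orf nested inside another. Glimmer3's own rules about which overlapping pairs are acceptable are not modeled.

```python
from bisect import bisect_right
from typing import List, Tuple

Orf = Tuple[int, int, float]  # hypothetical (start, end, score), 1-based inclusive


def select_orfs(orfs: List[Orf], max_overlap: int) -> Tuple[float, List[Orf]]:
    """Return the best total score and the chosen orfs, left to right."""
    orfs = sorted(orfs, key=lambda o: o[1])  # sort by right end
    ends = [o[1] for o in orfs]
    n = len(orfs)
    best = [0.0] * (n + 1)  # best[i] = best total using only the first i orfs
    prev = [0] * n          # prev[i] = how many earlier orfs are compatible with orf i
    for i, (start, _end, score) in enumerate(orfs):
        # Earlier orf j is compatible if end_j - start + 1 <= max_overlap.
        prev[i] = bisect_right(ends, start + max_overlap - 1, 0, i)
        best[i + 1] = max(best[i], score + best[prev[i]])
    # Trace back through the table to recover which orfs were taken.
    chosen: List[Orf] = []
    i = n
    while i > 0:
        score = orfs[i - 1][2]
        if score + best[prev[i - 1]] > best[i - 1]:
            chosen.append(orfs[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return best[n], chosen[::-1]
```

For example, with orfs `(1, 100, 10.0)`, `(90, 200, 15.0)` and `(150, 300, 20.0)` and a 20-base limit, the first two overlap by 11 bases and are compatible, but keeping the first and third (total 30.0) beats any set containing the middle orf. Unlike a rule-based resolver, the dynamic program never has to keep both members of an unresolved overlap.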
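Scoring from stop to start can be illustrated with a model that conditions each base on the bases that follow it. In the minimal sketch below, `suffix_scores` and the `log_prob(base, context)` callable are hypothetical stand-ins for the ICM, and the codon-position (3-periodic) aspect is ignored. The point it shows is that the cumulative score for a candidate start at position i uses only bases from i to the stop, never the noncoding bases upstream of i.

```python
from typing import Callable, List


def suffix_scores(seq: str, k: int, log_prob: Callable[[str, str], float]) -> List[float]:
    """scores[i] = total log-probability of seq[i:], scored from the 3' end backward.

    Each base is conditioned on up to k bases that FOLLOW it, so scores[i]
    depends only on seq[i:] and never on bases upstream of position i.
    """
    n = len(seq)
    scores = [0.0] * (n + 1)  # scores[n] = 0 for the empty suffix
    for i in range(n - 1, -1, -1):
        context = seq[i + 1:i + 1 + k]  # trailing (downstream) context
        scores[i] = scores[i + 1] + log_prob(seq[i], context)
    return scores
```

Since each `scores[i]` adds one term to `scores[i + 1]`, a single backward pass over an orf yields the score for every candidate start codon on it.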
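Because the results for each sequence in a multi-fasta run are preceded by its fasta-header line, a .predict file can be split back into per-sequence prediction lists. The sketch below assumes each prediction line holds whitespace-separated columns orf id, start, end, reading frame and score, with reverse-strand genes reported as start greater than end. That layout is an assumption, not taken from this page, so check it against the Version 3.02 Release Notes; the names `parse_predict` and `Prediction` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Prediction(NamedTuple):
    orf_id: str
    start: int
    end: int      # assumed to include the stop codon, as in Glimmer3
    frame: int    # assumed signed reading frame, e.g. +3 or -3
    score: float

    @property
    def forward(self) -> bool:
        return self.frame > 0


def parse_predict(text: str) -> Dict[str, List[Prediction]]:
    """Group prediction lines under the fasta-header line that precedes them."""
    result: Dict[str, List[Prediction]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()  # full header text, without the '>'
            result[current] = []
            continue
        if current is None:
            raise ValueError("prediction line before any fasta-header line")
        orf_id, start, end, frame, score = line.split()[:5]
        result[current].append(
            Prediction(orf_id, int(start), int(end), int(frame), float(score))
        )
    return result
```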
For more information on Glimmer3, see the Version
3.02 Release Notes.
Glimmer3 vs. Glimmer2.13 Accuracy
The following tables compare the results of Glimmer3 and
Glimmer2 on 30 microbial genomes from RefSeq at GenBank.
- Table 1. Probability models trained on genes with annotated function.
Predictions compared to the same set.
- Table 2. Probability models trained on genes with annotated function.
Predictions compared to all annotated genes.
- Table 3. Probability models trained on the output of the long-orfs
program. Predictions compared to genes with annotated function.
- Table 4. Probability models trained on the output of the long-orfs
program. Predictions compared to all annotated genes.
- Table 5. Glimmer2.13 long-orfs output and Glimmer3 long-orfs
output compared to all annotated genes.
Obtaining Glimmer
This software is OSI Certified Open Source Software.
Download the complete Glimmer3 system (glimmer302.tar.gz). After
downloading, uncompress the distribution file by typing:
% tar xzf glimmer302.tar.gz
A directory named glimmer3.02 will be created,
containing a file glim302notes.pdf
with instructions on compiling and running the system.
References
For a description of Glimmer 1, 2, and 3 see our papers:
- S.L. Salzberg, A.L. Delcher, S. Kasif, and O. White. Microbial gene
identification using interpolated Markov models. Nucleic Acids
Research 26:2 (1998), 544-548.
- A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
Improved microbial gene identification with GLIMMER. Nucleic Acids
Research 27:23 (1999), 4636-4641.
- A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg.
Identifying bacterial genes and endosymbiont DNA with Glimmer.
Bioinformatics (2007), advance online version; see the journal site
for the final version.
|
Acknowledgements
Glimmer is currently supported by the National Library of Medicine
at NIH under grant R01-LM007938. It was previously supported by the
National Science Foundation under grants IRI-9530462 and IIS-9902923,
and by the National Institutes of Health under grant R01-LM06845.