TY - Generic T1 - Developmental expression of chicken FOXN1 and putative target genes during feather development. Y1 - 2014 A1 - Darnell, Diana K A1 - Zhang, Li S A1 - Hannenhalli, Sridhar A1 - Yaklichkin, Sergey Y KW - Amino Acid Sequence KW - Animals KW - Biological Evolution KW - Blotting, Western KW - Cell Differentiation KW - Cells, Cultured KW - Chick Embryo KW - Chickens KW - Cloning, Molecular KW - Embryo, Nonmammalian KW - Epidermis KW - Feathers KW - Forkhead Transcription Factors KW - Gene Expression Regulation, Developmental KW - In Situ Hybridization KW - Molecular Sequence Data KW - Morphogenesis KW - Phylogeny KW - Real-Time Polymerase Chain Reaction KW - Reverse Transcriptase Polymerase Chain Reaction KW - RNA, Messenger KW - Sequence Homology, Amino Acid AB -

FOXN1 is a member of the forkhead box family of transcription factors. FOXN1 is crucial for hair outgrowth and thymus differentiation in mammals. Unlike the thymus, which is found in all amniotes, hair is an epidermal appendage that arose after the last shared common ancestor between mammals and birds, and hair and feathers differ markedly in their differentiation and gene expression. Here, we show that FOXN1 is expressed in embryonic chicken feathers, nails and thymus, demonstrating an evolutionary conservation that goes beyond obvious homology. At embryonic day (ED) 12, FOXN1 is expressed in some feather buds and at ED13 expression extends along the length of the feather filament. At ED14 FOXN1 mRNA is restricted to the proximal feather filament and is not detectable in distal feather shafts. At the base of the feather, FOXN1 is expressed in the epithelium of the feather sheath and distal barb and marginal plate, whereas in the midsection FOXN1 transcripts are mainly detected in the barb plates of the feather filament. FOXN1 is also expressed in claws; however, no expression was detected in skin or scales. Despite expression of FOXN1 in developing feathers, examination of chick homologs of five putative mammalian FOXN1 target genes shows that, while these genes are expressed in feathers, there is little similarity to the FOXN1 expression pattern, suggesting that some gene regulatory networks may have diverged during evolution of epidermal appendages.

JA - Int J Dev Biol VL - 58 CP - 1 M3 - 10.1387/ijdb.130023sy ER - TY - JOUR T1 - A large-scale, higher-level, molecular phylogenetic study of the insect order Lepidoptera (moths and butterflies) JF - PLoS One Y1 - 2013 A1 - Regier, Jerome C. A1 - Mitter, Charles A1 - Zwick, Andreas A1 - Adam L. Bazinet A1 - Michael P. Cummings A1 - Kawahara, Akito Y. A1 - Sohn, Jae-Cheon A1 - Zwickl, Derrick J. A1 - Cho, Soowon A1 - Davis, Donald R. A1 - Baixeras, Joaquin A1 - Brown, John A1 - Parr, Cynthia A1 - Weller, Susan A1 - Lees, David C. A1 - Mitter, Kim T. KW - Animals KW - Butterflies KW - Moths KW - Phylogeny AB -

BACKGROUND: Higher-level relationships within the Lepidoptera, and particularly within the species-rich subclade Ditrysia, are generally not well understood, although recent studies have yielded progress. We present the most comprehensive molecular analysis of lepidopteran phylogeny to date, focusing on relationships among superfamilies.

METHODOLOGY/PRINCIPAL FINDINGS: 483 taxa spanning 115 of 124 families were sampled for 19 protein-coding nuclear genes, from which maximum likelihood tree estimates and bootstrap percentages were obtained using GARLI. Assessment of heuristic search effectiveness showed that better trees and higher bootstrap percentages probably remain to be discovered even after 1000 or more search replicates, but further search proved impractical even with grid computing. Other analyses explored the effects of sampling nonsynonymous change only versus partitioned and unpartitioned total nucleotide change; deletion of rogue taxa; and compositional heterogeneity. Relationships among the non-ditrysian lineages previously inferred from morphology were largely confirmed, plus some new ones, with strong support. Robust support was also found for divergences among non-apoditrysian lineages of Ditrysia, but only rarely so within Apoditrysia. Paraphyly for Tineoidea is strongly supported by analysis of nonsynonymous-only signal; conflicting, strong support for tineoid monophyly when synonymous signal was added back is shown to result from compositional heterogeneity.

CONCLUSIONS/SIGNIFICANCE: Support for among-superfamily relationships outside the Apoditrysia is now generally strong. Comparable support is mostly lacking within Apoditrysia, but dramatically increased bootstrap percentages for some nodes after rogue taxon removal, and concordance with other evidence, strongly suggest that our picture of apoditrysian phylogeny is approximately correct. This study highlights the challenge of finding optimal topologies when analyzing hundreds of taxa. It also shows that some nodes get strong support only when analysis is restricted to nonsynonymous change, while total change is necessary for strong support of others. Thus, multiple types of analyses will be necessary to fully resolve lepidopteran phylogeny.

VL - 8 ER - TY - JOUR T1 - Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage. JF - ISME J Y1 - 2012 A1 - Dupont, Chris L A1 - Rusch, Douglas B A1 - Yooseph, Shibu A1 - Lombardo, Mary-Jane A1 - Richter, R Alexander A1 - Valas, Ruben A1 - Novotny, Mark A1 - Yee-Greenbaum, Joyclyn A1 - Selengut, Jeremy D A1 - Haft, Dan H A1 - Halpern, Aaron L A1 - Lasken, Roger S A1 - Nealson, Kenneth A1 - Friedman, Robert A1 - Venter, J Craig KW - Computational Biology KW - Gammaproteobacteria KW - Genome, Bacterial KW - Genomic Library KW - metagenomics KW - Oceans and Seas KW - Phylogeny KW - plankton KW - Rhodopsin KW - Rhodopsins, Microbial KW - RNA, Ribosomal, 16S KW - Seawater AB -

Bacteria in the 16S rRNA clade SAR86 are among the most abundant uncultivated constituents of microbial assemblages in the surface ocean for which little genomic information is currently available. Bioinformatic techniques were used to assemble two nearly complete genomes from marine metagenomes and single-cell sequencing provided two more partial genomes. Recruitment of metagenomic data shows that these SAR86 genomes substantially increase our knowledge of non-photosynthetic bacteria in the surface ocean. Phylogenomic analyses establish SAR86 as a basal and divergent lineage of γ-proteobacteria, and the individual genomes display a temperature-dependent distribution. Modestly sized at 1.25-1.7 Mbp, the SAR86 genomes lack several pathways for amino-acid and vitamin synthesis as well as sulfate reduction, trends commonly observed in other abundant marine microbes. SAR86 appears to be an aerobic chemoheterotroph with the potential for proteorhodopsin-based ATP generation, though the apparent lack of a retinal biosynthesis pathway may require it to scavenge exogenously-derived pigments to utilize proteorhodopsin. The genomes contain an expanded capacity for the degradation of lipids and carbohydrates acquired using a wealth of tonB-dependent outer membrane receptors. Like the abundant planktonic marine bacterial clade SAR11, SAR86 exhibits metabolic streamlining, but also a distinct carbon compound specialization, possibly avoiding competition.

VL - 6 CP - 6 M3 - 10.1038/ismej.2011.189 ER - TY - JOUR T1 - Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage JF - The ISME journal Y1 - 2012 A1 - Dupont, Chris L. A1 - Rusch, Douglas B. A1 - Yooseph, Shibu A1 - Lombardo, Mary-Jane A1 - Richter, R. Alexander A1 - Valas, Ruben A1 - Novotny, Mark A1 - Yee-Greenbaum, Joyclyn A1 - J. Selengut A1 - Haft, Dan H. A1 - Halpern, Aaron L. A1 - Lasken, Roger S. A1 - Nealson, Kenneth A1 - Friedman, Robert A1 - Venter, J. Craig KW - Computational Biology KW - Gammaproteobacteria KW - Genome, Bacterial KW - Genomic Library KW - metagenomics KW - Oceans and Seas KW - Phylogeny KW - plankton KW - Rhodopsin KW - RNA, Ribosomal, 16S KW - Seawater AB - Bacteria in the 16S rRNA clade SAR86 are among the most abundant uncultivated constituents of microbial assemblages in the surface ocean for which little genomic information is currently available. Bioinformatic techniques were used to assemble two nearly complete genomes from marine metagenomes and single-cell sequencing provided two more partial genomes. Recruitment of metagenomic data shows that these SAR86 genomes substantially increase our knowledge of non-photosynthetic bacteria in the surface ocean. Phylogenomic analyses establish SAR86 as a basal and divergent lineage of γ-proteobacteria, and the individual genomes display a temperature-dependent distribution. Modestly sized at 1.25-1.7 Mbp, the SAR86 genomes lack several pathways for amino-acid and vitamin synthesis as well as sulfate reduction, trends commonly observed in other abundant marine microbes. SAR86 appears to be an aerobic chemoheterotroph with the potential for proteorhodopsin-based ATP generation, though the apparent lack of a retinal biosynthesis pathway may require it to scavenge exogenously-derived pigments to utilize proteorhodopsin.
The genomes contain an expanded capacity for the degradation of lipids and carbohydrates acquired using a wealth of tonB-dependent outer membrane receptors. Like the abundant planktonic marine bacterial clade SAR11, SAR86 exhibits metabolic streamlining, but also a distinct carbon compound specialization, possibly avoiding competition. VL - 6 N1 - http://www.ncbi.nlm.nih.gov/pubmed/22170421?dopt=Abstract ER - TY - JOUR T1 - ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process. JF - BMC Bioinformatics Y1 - 2011 A1 - Basu, Malay K A1 - Selengut, Jeremy D A1 - Haft, Daniel H KW - algorithms KW - Archaea KW - Archaeal Proteins KW - DNA KW - Methane KW - Phylogeny KW - software AB -

BACKGROUND: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies.

RESULTS: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries.

CONCLUSION: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.

VL - 12 M3 - 10.1186/1471-2105-12-434 ER - TY - JOUR T1 - ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process JF - BMC Bioinformatics Y1 - 2011 A1 - Basu, Malay K. A1 - J. Selengut A1 - Haft, Daniel H. KW - algorithms KW - Archaea KW - Archaeal Proteins KW - DNA KW - Methane KW - Phylogeny KW - software AB - BACKGROUND: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies. RESULTS: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries. CONCLUSION: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction.
The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/. VL - 12 N1 - http://www.ncbi.nlm.nih.gov/pubmed/22070167?dopt=Abstract ER - TY - JOUR T1 - The Alveolate Perkinsus marinus: biological insights from EST gene discovery. JF - BMC Genomics Y1 - 2010 A1 - Joseph, Sandeep J A1 - Fernández-Robledo, José A A1 - Gardner, Malcolm J A1 - El-Sayed, Najib M A1 - Kuo, Chih-Horng A1 - Schott, Eric J A1 - Wang, Haiming A1 - Kissinger, Jessica C A1 - Vasta, Gerardo R KW - Alveolata KW - Animals KW - Expressed Sequence Tags KW - Ostreidae KW - Phylogeny AB -

BACKGROUND: Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date.

RESULTS: To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs) derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value

CONCLUSIONS: Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position. It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence for the presence of a relict plastid. Further, although P. marinus sequences display significant similarity to those from both apicomplexans and dinoflagellates, the presence of trans-spliced transcripts confirms the previously established affinities with the latter. The EST analysis reported herein, together with the recently completed sequence of the P. marinus genome and the development of transfection methodology, should result in improved intervention strategies against dermo disease.

VL - 11 M3 - 10.1186/1471-2164-11-228 ER - TY - Generic T1 - MetaPhyler: Taxonomic profiling for metagenomic sequences T2 - 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Y1 - 2010 A1 - Liu, Bo A1 - Gibbons, T. A1 - Ghodsi, M. A1 - M. Pop KW - Bioinformatics KW - CARMA comparison KW - Databases KW - Genomics KW - Linear regression KW - marker genes KW - matching length KW - Megan comparison KW - metagenomic sequences KW - metagenomics KW - MetaPhyler KW - microbial diversity KW - microorganisms KW - molecular biophysics KW - molecular configurations KW - Pattern classification KW - pattern matching KW - phylogenetic classification KW - Phylogeny KW - PhymmBL comparison KW - reference gene database KW - Sensitivity KW - sequence matching KW - taxonomic classifier KW - taxonomic level KW - taxonomic profiling KW - whole metagenome sequencing data AB - A major goal of metagenomics is to characterize the microbial diversity of an environment. The most popular approach relies on 16S rRNA sequencing, however this approach can generate biased estimates due to differences in the copy number of the 16S rRNA gene between even closely related organisms, and due to PCR artifacts. The taxonomic composition can also be determined from whole-metagenome sequencing data by matching individual sequences against a database of reference genes. One major limitation of prior methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels. We propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level. We present a novel taxonomic profiler MetaPhyler, which uses marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, Megan and PhymmBL). 
We also present interesting results obtained by applying MetaPhyler to a real metagenomic dataset. JA - 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) PB - IEEE SN - 978-1-4244-8306-8 ER - TY - JOUR T1 - Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function JF - BMC Bioinformatics Y1 - 2010 A1 - J. Selengut A1 - Rusch, Douglas B. A1 - Haft, Daniel H. KW - algorithms KW - Amino Acid Sequence KW - Gene Expression Profiling KW - Molecular Sequence Data KW - Phylogeny KW - Proteins KW - Sequence Analysis, Protein KW - Structure-Activity Relationship AB - BACKGROUND: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. RESULTS: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system.
By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. CONCLUSIONS: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. VL - 11 N1 - http://www.ncbi.nlm.nih.gov/pubmed/20102603?dopt=Abstract ER - TY - JOUR T1 - Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function. JF - BMC Bioinformatics Y1 - 2010 A1 - Selengut, Jeremy D A1 - Rusch, Douglas B A1 - Haft, Daniel H KW - algorithms KW - Amino Acid Sequence KW - Gene Expression Profiling KW - Molecular Sequence Data KW - Phylogeny KW - Proteins KW - Sequence Analysis, Protein KW - Structure-Activity Relationship AB -

BACKGROUND: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.

RESULTS: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization.

CONCLUSIONS: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.

VL - 11 M3 - 10.1186/1471-2105-11-52 ER - TY - JOUR T1 - Unexpected abundance of coenzyme F(420)-dependent enzymes in Mycobacterium tuberculosis and other actinobacteria JF - Journal of bacteriology Y1 - 2010 A1 - J. Selengut A1 - Haft, Daniel H. KW - Actinobacteria KW - Amino Acid Sequence KW - Binding Sites KW - Coenzymes KW - Flavonoids KW - Gene Expression Profiling KW - Gene Expression Regulation, Bacterial KW - Genome, Bacterial KW - molecular biology KW - Molecular Sequence Data KW - Molecular Structure KW - Mycobacterium tuberculosis KW - Phylogeny KW - Protein Conformation KW - Riboflavin AB - Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F(420), which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F(420) biosynthesis nominated many actinobacterial proteins as candidate F(420)-dependent enzymes. Three such families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5'-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be limited to F(420)-producing species. The LLM and PPOX families were observed in F(420)-producing species as well as species lacking F(420) but were particularly numerous in many actinobacterial species, including M. tuberculosis. Partitioning the LLM and PPOX families based on an organism's ability to make F(420) allowed the application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to identify F(420)-correlated subsequences.
These regions were found to correspond to flavonoid cofactor binding sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F(420)-dependent enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in these families. While prevalent in mycobacteria, markers of F(420) biosynthesis appeared to be absent from the normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F(420) for its redox reactions. This dependence and the cofactor's rarity may make F(420)-related proteins promising drug targets. VL - 192 N1 - http://www.ncbi.nlm.nih.gov/pubmed/20675471?dopt=Abstract ER - TY - JOUR T1 - Unexpected abundance of coenzyme F(420)-dependent enzymes in Mycobacterium tuberculosis and other actinobacteria. JF - J Bacteriol Y1 - 2010 A1 - Selengut, Jeremy D A1 - Haft, Daniel H KW - Actinobacteria KW - Amino Acid Sequence KW - Binding Sites KW - Coenzymes KW - Flavonoids KW - Gene Expression Profiling KW - Gene Expression Regulation, Bacterial KW - Genome, Bacterial KW - molecular biology KW - Molecular Sequence Data KW - Molecular Structure KW - Mycobacterium tuberculosis KW - Phylogeny KW - Protein Conformation KW - Riboflavin AB -

Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F(420), which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F(420) biosynthesis nominated many actinobacterial proteins as candidate F(420)-dependent enzymes. Three such families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5'-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be limited to F(420)-producing species. The LLM and PPOX families were observed in F(420)-producing species as well as species lacking F(420) but were particularly numerous in many actinobacterial species, including M. tuberculosis. Partitioning the LLM and PPOX families based on an organism's ability to make F(420) allowed the application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to identify F(420)-correlated subsequences. These regions were found to correspond to flavonoid cofactor binding sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F(420)-dependent enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in these families. While prevalent in mycobacteria, markers of F(420) biosynthesis appeared to be absent from the normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F(420) for its redox reactions. This dependence and the cofactor's rarity may make F(420)-related proteins promising drug targets.

VL - 192 CP - 21 M3 - 10.1128/JB.00425-10 ER - TY - JOUR T1 - Validating the systematic position of Plationus Segers, Murugan & Dumont, 1993 (Rotifera: Brachionidae) using sequences of the large subunit of the nuclear ribosomal DNA and of cytochrome C oxidase JF - Hydrobiologia Y1 - 2010 A1 - Reyna-Fabian, M. E. A1 - Laclette, J. P. A1 - Michael P. Cummings A1 - García-Varela, M. KW - Cox1 KW - likelihood KW - LSU KW - maximum KW - Phylogeny KW - Plationus AB - Members of the family Brachionidae are free-living organisms that range in size from 170 to 250 microns. They comprise part of the zooplankton in freshwater and marine systems worldwide. Morphologically, members of the family are characterized by a single piece loricated body without furrows, grooves, sulci or dorsal head shields, and a malleate trophi. Differences in these structures have been traditionally used to recognize 217 species that are classified into seven genera. However, the validity of the species, Plationus patulus, P. patulus macracanthus, P. polyacanthus, and P. felicitas have been confused because they were alternatively assigned in Brachionus or Platyias, when considering only morphological and ecological characters. Based on scanning electron microscope (SEM) images of the trophi, these taxa were assigned in a new genus, Plationus. In this study, we examined the systematic position of P. patulus and P. patulus macracanthus using DNA sequences of two genes: the cytochrome oxidase subunit 1 (cox1) and domains D2 and D3 of the large subunit of the nuclear ribosomal RNA (LSU). In addition, the cox1 and LSU sequences representing five genera of Brachionidae (Anuraeopsis, Brachionus, Keratella, Plationus, and Platyias) plus four species of three families from the order Ploima were used as the outgroup. The maximum likelihood (ML) analyses were conducted for each individual gene as well as for the combined (cox1 + LSU) data set.
The ML tree from the combined data set yielded the family Brachionidae as a monophyletic group with weak bootstrap support (< 50%). Five main clades in this tree had high (> 85%) bootstrap support. The first clade was composed of three populations of P. patulus + P. patulus macracanthus. The second clade was composed of a single species of Platyias. The third clade was composed of six species of Brachionus. The fourth clade included a single species of the genus Anuraeopsis, and the fifth clade was composed of three species of the genus Keratella. The genetic divergence between Plationus and Platyias ranged from 18.4 to 19.2% for cox1, and from 4.5 to 4.9% for LSU, and between Brachionus and Plationus, it ranged from 16.9 to 23.1% (cox1), and from 7.3 to 9.1% (LSU). Morphological evidence, the amount of genetic divergence, the systematic position of Plationus within the family Brachionidae, and the position of Plationus as a sister group of Brachionus and Platyias support the validity of Plationus patulus and P. patulus macracanthus into the genus Plationus. VL - 644 ER - TY - JOUR T1 - Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils JF - Applied and environmental microbiology Y1 - 2009 A1 - Ward, Naomi L. A1 - Challacombe, Jean F. A1 - Janssen, Peter H. A1 - Henrissat, Bernard A1 - Coutinho, Pedro M. A1 - Wu, Martin A1 - Xie, Gary A1 - Haft, Daniel H. A1 - Sait, Michelle A1 - Badger, Jonathan A1 - Barabote, Ravi D. A1 - Bradley, Brent A1 - Brettin, Thomas S. A1 - Brinkac, Lauren M. A1 - Bruce, David A1 - Creasy, Todd A1 - Daugherty, Sean C. A1 - Davidsen, Tanja M. A1 - DeBoy, Robert T. A1 - Detter, J. Chris A1 - Dodson, Robert J. A1 - Durkin, A. Scott A1 - Ganapathy, Anuradha A1 - Gwinn-Giglio, Michelle A1 - Han, Cliff S. A1 - Khouri, Hoda A1 - Kiss, Hajnalka A1 - Kothari, Sagar P. A1 - Madupu, Ramana A1 - Nelson, Karen E. A1 - Nelson, William C.
A1 - Paulsen, Ian A1 - Penn, Kevin A1 - Ren, Qinghu A1 - Rosovitz, M. J. A1 - J. Selengut A1 - Shrivastava, Susmita A1 - Sullivan, Steven A. A1 - Tapia, Roxanne A1 - Thompson, L. Sue A1 - Watkins, Kisha L. A1 - Yang, Qi A1 - Yu, Chunhui A1 - Zafar, Nikhat A1 - Zhou, Liwei A1 - Kuske, Cheryl R. KW - Anti-Bacterial Agents KW - bacteria KW - Biological Transport KW - Carbohydrate Metabolism KW - Cyanobacteria KW - DNA, Bacterial KW - Fungi KW - Genome, Bacterial KW - Macrolides KW - Molecular Sequence Data KW - Nitrogen KW - Phylogeny KW - Proteobacteria KW - Sequence Analysis, DNA KW - Sequence Homology KW - Soil Microbiology AB - The complete genomes of three strains from the phylum Acidobacteria were compared. Phylogenetic analysis placed them as a unique phylum. They share genomic traits with members of the Proteobacteria, the Cyanobacteria, and the Fungi. The three strains appear to be versatile heterotrophs. Genomic and culture traits indicate the use of carbon sources that span simple sugars to more complex substrates such as hemicellulose, cellulose, and chitin. The genomes encode low-specificity major facilitator superfamily transporters and high-affinity ABC transporters for sugars, suggesting that they are best suited to low-nutrient conditions. They appear capable of nitrate and nitrite reduction but not N(2) fixation or denitrification. The genomes contained numerous genes that encode siderophore receptors, but no evidence of siderophore production was found, suggesting that they may obtain iron via interaction with other microorganisms. The presence of cellulose synthesis genes and a large class of novel high-molecular-weight excreted proteins suggests potential traits for desiccation resistance, biofilm formation, and/or contribution to soil structure. Polyketide synthase and macrolide glycosylation genes suggest the production of novel antimicrobial compounds. Genes that encode a variety of novel proteins were also identified. 
The abundance of acidobacteria in soils worldwide and the breadth of potential carbon use by the sequenced strains suggest significant and previously unrecognized contributions to the terrestrial carbon cycle. Combining our genomic evidence with available culture traits, we postulate that cells of these isolates are long-lived, divide slowly, exhibit slow metabolic rates under low-nutrient conditions, and are well equipped to tolerate fluctuations in soil hydration. VL - 75 N1 - http://www.ncbi.nlm.nih.gov/pubmed/19201974?dopt=Abstract ER - TY - JOUR T1 - Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils. JF - Appl Environ Microbiol Y1 - 2009 A1 - Ward, Naomi L A1 - Challacombe, Jean F A1 - Janssen, Peter H A1 - Henrissat, Bernard A1 - Coutinho, Pedro M A1 - Wu, Martin A1 - Xie, Gary A1 - Haft, Daniel H A1 - Sait, Michelle A1 - Badger, Jonathan A1 - Barabote, Ravi D A1 - Bradley, Brent A1 - Brettin, Thomas S A1 - Brinkac, Lauren M A1 - Bruce, David A1 - Creasy, Todd A1 - Daugherty, Sean C A1 - Davidsen, Tanja M A1 - DeBoy, Robert T A1 - Detter, J Chris A1 - Dodson, Robert J A1 - Durkin, A Scott A1 - Ganapathy, Anuradha A1 - Gwinn-Giglio, Michelle A1 - Han, Cliff S A1 - Khouri, Hoda A1 - Kiss, Hajnalka A1 - Kothari, Sagar P A1 - Madupu, Ramana A1 - Nelson, Karen E A1 - Nelson, William C A1 - Paulsen, Ian A1 - Penn, Kevin A1 - Ren, Qinghu A1 - Rosovitz, M J A1 - Selengut, Jeremy D A1 - Shrivastava, Susmita A1 - Sullivan, Steven A A1 - Tapia, Roxanne A1 - Thompson, L Sue A1 - Watkins, Kisha L A1 - Yang, Qi A1 - Yu, Chunhui A1 - Zafar, Nikhat A1 - Zhou, Liwei A1 - Kuske, Cheryl R KW - Anti-Bacterial Agents KW - bacteria KW - Biological Transport KW - Carbohydrate Metabolism KW - Cyanobacteria KW - DNA, Bacterial KW - Fungi KW - Genome, Bacterial KW - Macrolides KW - Molecular Sequence Data KW - Nitrogen KW - Phylogeny KW - Proteobacteria KW - Sequence Analysis, DNA KW - Sequence Homology KW - Soil 
Microbiology AB -

The complete genomes of three strains from the phylum Acidobacteria were compared. Phylogenetic analysis placed them as a unique phylum. They share genomic traits with members of the Proteobacteria, the Cyanobacteria, and the Fungi. The three strains appear to be versatile heterotrophs. Genomic and culture traits indicate the use of carbon sources that span simple sugars to more complex substrates such as hemicellulose, cellulose, and chitin. The genomes encode low-specificity major facilitator superfamily transporters and high-affinity ABC transporters for sugars, suggesting that they are best suited to low-nutrient conditions. They appear capable of nitrate and nitrite reduction but not N(2) fixation or denitrification. The genomes contained numerous genes that encode siderophore receptors, but no evidence of siderophore production was found, suggesting that they may obtain iron via interaction with other microorganisms. The presence of cellulose synthesis genes and a large class of novel high-molecular-weight excreted proteins suggests potential traits for desiccation resistance, biofilm formation, and/or contribution to soil structure. Polyketide synthase and macrolide glycosylation genes suggest the production of novel antimicrobial compounds. Genes that encode a variety of novel proteins were also identified. The abundance of acidobacteria in soils worldwide and the breadth of potential carbon use by the sequenced strains suggest significant and previously unrecognized contributions to the terrestrial carbon cycle. Combining our genomic evidence with available culture traits, we postulate that cells of these isolates are long-lived, divide slowly, exhibit slow metabolic rates under low-nutrient conditions, and are well equipped to tolerate fluctuations in soil hydration.

VL - 75 CP - 7 M3 - 10.1128/AEM.02294-08 ER - TY - JOUR T1 - A GENEALOGICAL APPROACH TO QUANTIFYING LINEAGE DIVERGENCE JF - Evolution Y1 - 2008 A1 - Michael P. Cummings A1 - Neel, Maile C. A1 - Shaw, Kerry L. KW - Ancestral polymorphism KW - congruence KW - exclusivity KW - genealogy KW - lineage sorting KW - monophyly KW - paraphyly KW - Phylogeny KW - polyphyly KW - speciation KW - species AB - We introduce a statistic, the genealogical sorting index (gsi), for quantifying the degree of exclusive ancestry of labeled groups on a rooted genealogy and demonstrate its application. The statistic is simple, intuitive, and easily calculated. It has a normalized range to facilitate comparisons among different groups, trees, or studies and it provides information on individual groups rather than a composite measure for all groups. It naturally handles polytomies and accommodates measures of uncertainty in phylogenetic relationships. We use coalescent simulations to explore the behavior of the gsi across a range of divergence times, with the mean value increasing to 1, the maximum value when exclusivity within a group reached monophyly. Simulations also demonstrate that the power to reject the null hypothesis of mixed genealogical ancestry increased markedly as sample size increased, and that the gsi provides a statistically more powerful measure of divergence than FST. Applications to data from published studies demonstrated that the gsi provides a useful way to detect significant exclusivity even when groups are not monophyletic. Although we describe this statistic in the context of divergence, it is more broadly applicable to quantify and assess the significance of clustering of observations in labeled groups on any tree. VL - 62 SN - 1558-5646 ER - TY - Generic T1 - Uncovering Genomic Reassortments among Influenza Strains by Enumerating Maximal Bicliques T2 - IEEE International Conference on Bioinformatics and Biomedicine, 2008. 
BIBM '08 Y1 - 2008 A1 - Nagarajan, N. A1 - Kingsford, Carl KW - avian hosted influenza genome KW - Bioinformatics KW - Capacitive sensors KW - Delay KW - diseases KW - Event detection KW - general bipartite graphs KW - genomic reassortments KW - Genomics KW - graph theory KW - high probability inconsistencies KW - History KW - human hosted influenza genome KW - incompatibility graph KW - Influenza KW - influenza strain KW - maximal biclique KW - maximal biclique enumeration KW - microorganisms KW - phylogenetic trees KW - Phylogeny KW - Public healthcare KW - quadratic delay algorithm KW - reassortment KW - reassortment event detection KW - Tree graphs KW - viral genome evolutionary history KW - virulence AB - The evolutionary histories of viral genomes have received significant recent attention due to their importance in understanding virulence and the corresponding ramifications to public health. We present a novel framework to detect reassortment events in influenza based on the comparison of two distributions of phylogenetic trees, rather than a pair of, possibly unreliable, consensus trees. We show how to detect all high-probability inconsistencies between two distributions of trees by enumerating maximal bicliques within a defined incompatibility graph. In the process, we give the first quadratic delay algorithm for enumerating maximal bicliques within general bipartite graphs. We demonstrate the utility of our approach by applying it to several sets of influenza genomes (both human- and avian-hosted) and successfully identify all known reassortment events and a few novel candidate reassortments. In addition, on simulated datasets, our approach correctly finds implanted reassortments and rarely detects reassortments where none were introduced. JA - IEEE International Conference on Bioinformatics and Biomedicine, 2008. BIBM '08 PB - IEEE SN - 978-0-7695-3452-7 ER - TY - JOUR T1 - Evolution of genes and genomes on the Drosophila phylogeny. 
JF - Nature Y1 - 2007 A1 - Clark, Andrew G A1 - Eisen, Michael B A1 - Smith, Douglas R A1 - Bergman, Casey M A1 - Oliver, Brian A1 - Markow, Therese A A1 - Kaufman, Thomas C A1 - Kellis, Manolis A1 - Gelbart, William A1 - Iyer, Venky N A1 - Pollard, Daniel A A1 - Sackton, Timothy B A1 - Larracuente, Amanda M A1 - Singh, Nadia D A1 - Abad, Jose P A1 - Abt, Dawn N A1 - Adryan, Boris A1 - Aguade, Montserrat A1 - Akashi, Hiroshi A1 - Anderson, Wyatt W A1 - Aquadro, Charles F A1 - Ardell, David H A1 - Arguello, Roman A1 - Artieri, Carlo G A1 - Barbash, Daniel A A1 - Barker, Daniel A1 - Barsanti, Paolo A1 - Batterham, Phil A1 - Batzoglou, Serafim A1 - Begun, Dave A1 - Bhutkar, Arjun A1 - Blanco, Enrico A1 - Bosak, Stephanie A A1 - Bradley, Robert K A1 - Brand, Adrianne D A1 - Brent, Michael R A1 - Brooks, Angela N A1 - Brown, Randall H A1 - Butlin, Roger K A1 - Caggese, Corrado A1 - Calvi, Brian R A1 - Bernardo de Carvalho, A A1 - Caspi, Anat A1 - Castrezana, Sergio A1 - Celniker, Susan E A1 - Chang, Jean L A1 - Chapple, Charles A1 - Chatterji, Sourav A1 - Chinwalla, Asif A1 - Civetta, Alberto A1 - Clifton, Sandra W A1 - Comeron, Josep M A1 - Costello, James C A1 - Coyne, Jerry A A1 - Daub, Jennifer A1 - David, Robert G A1 - Delcher, Arthur L A1 - Delehaunty, Kim A1 - Do, Chuong B A1 - Ebling, Heather A1 - Edwards, Kevin A1 - Eickbush, Thomas A1 - Evans, Jay D A1 - Filipski, Alan A1 - Findeiss, Sven A1 - Freyhult, Eva A1 - Fulton, Lucinda A1 - Fulton, Robert A1 - Garcia, Ana C L A1 - Gardiner, Anastasia A1 - Garfield, David A A1 - Garvin, Barry E A1 - Gibson, Greg A1 - Gilbert, Don A1 - Gnerre, Sante A1 - Godfrey, Jennifer A1 - Good, Robert A1 - Gotea, Valer A1 - Gravely, Brenton A1 - Greenberg, Anthony J A1 - Griffiths-Jones, Sam A1 - Gross, Samuel A1 - Guigo, Roderic A1 - Gustafson, Erik A A1 - Haerty, Wilfried A1 - Hahn, Matthew W A1 - Halligan, Daniel L A1 - Halpern, Aaron L A1 - Halter, Gillian M A1 - Han, Mira V A1 - Heger, Andreas A1 - Hillier, LaDeana A1 - 
Hinrichs, Angie S A1 - Holmes, Ian A1 - Hoskins, Roger A A1 - Hubisz, Melissa J A1 - Hultmark, Dan A1 - Huntley, Melanie A A1 - Jaffe, David B A1 - Jagadeeshan, Santosh A1 - Jeck, William R A1 - Johnson, Justin A1 - Jones, Corbin D A1 - Jordan, William C A1 - Karpen, Gary H A1 - Kataoka, Eiko A1 - Keightley, Peter D A1 - Kheradpour, Pouya A1 - Kirkness, Ewen F A1 - Koerich, Leonardo B A1 - Kristiansen, Karsten A1 - Kudrna, Dave A1 - Kulathinal, Rob J A1 - Kumar, Sudhir A1 - Kwok, Roberta A1 - Lander, Eric A1 - Langley, Charles H A1 - Lapoint, Richard A1 - Lazzaro, Brian P A1 - Lee, So-Jeong A1 - Levesque, Lisa A1 - Li, Ruiqiang A1 - Lin, Chiao-Feng A1 - Lin, Michael F A1 - Lindblad-Toh, Kerstin A1 - Llopart, Ana A1 - Long, Manyuan A1 - Low, Lloyd A1 - Lozovsky, Elena A1 - Lu, Jian A1 - Luo, Meizhong A1 - Machado, Carlos A A1 - Makalowski, Wojciech A1 - Marzo, Mar A1 - Matsuda, Muneo A1 - Matzkin, Luciano A1 - McAllister, Bryant A1 - McBride, Carolyn S A1 - McKernan, Brendan A1 - McKernan, Kevin A1 - Mendez-Lago, Maria A1 - Minx, Patrick A1 - Mollenhauer, Michael U A1 - Montooth, Kristi A1 - Mount, Stephen M A1 - Mu, Xu A1 - Myers, Eugene A1 - Negre, Barbara A1 - Newfeld, Stuart A1 - Nielsen, Rasmus A1 - Noor, Mohamed A F A1 - O'Grady, Patrick A1 - Pachter, Lior A1 - Papaceit, Montserrat A1 - Parisi, Matthew J A1 - Parisi, Michael A1 - Parts, Leopold A1 - Pedersen, Jakob S A1 - Pesole, Graziano A1 - Phillippy, Adam M A1 - Ponting, Chris P A1 - Pop, Mihai A1 - Porcelli, Damiano A1 - Powell, Jeffrey R A1 - Prohaska, Sonja A1 - Pruitt, Kim A1 - Puig, Marta A1 - Quesneville, Hadi A1 - Ram, Kristipati Ravi A1 - Rand, David A1 - Rasmussen, Matthew D A1 - Reed, Laura K A1 - Reenan, Robert A1 - Reily, Amy A1 - Remington, Karin A A1 - Rieger, Tania T A1 - Ritchie, Michael G A1 - Robin, Charles A1 - Rogers, Yu-Hui A1 - Rohde, Claudia A1 - Rozas, Julio A1 - Rubenfield, Marc J A1 - Ruiz, Alfredo A1 - Russo, Susan A1 - Salzberg, Steven L A1 - Sanchez-Gracia, Alejandro A1 - 
Saranga, David J A1 - Sato, Hajime A1 - Schaeffer, Stephen W A1 - Schatz, Michael C A1 - Schlenke, Todd A1 - Schwartz, Russell A1 - Segarra, Carmen A1 - Singh, Rama S A1 - Sirot, Laura A1 - Sirota, Marina A1 - Sisneros, Nicholas B A1 - Smith, Chris D A1 - Smith, Temple F A1 - Spieth, John A1 - Stage, Deborah E A1 - Stark, Alexander A1 - Stephan, Wolfgang A1 - Strausberg, Robert L A1 - Strempel, Sebastian A1 - Sturgill, David A1 - Sutton, Granger A1 - Sutton, Granger G A1 - Tao, Wei A1 - Teichmann, Sarah A1 - Tobari, Yoshiko N A1 - Tomimura, Yoshihiko A1 - Tsolas, Jason M A1 - Valente, Vera L S A1 - Venter, Eli A1 - Venter, J Craig A1 - Vicario, Saverio A1 - Vieira, Filipe G A1 - Vilella, Albert J A1 - Villasante, Alfredo A1 - Walenz, Brian A1 - Wang, Jun A1 - Wasserman, Marvin A1 - Watts, Thomas A1 - Wilson, Derek A1 - Wilson, Richard K A1 - Wing, Rod A A1 - Wolfner, Mariana F A1 - Wong, Alex A1 - Wong, Gane Ka-Shu A1 - Wu, Chung-I A1 - Wu, Gabriel A1 - Yamamoto, Daisuke A1 - Yang, Hsiao-Pei A1 - Yang, Shiaw-Pyng A1 - Yorke, James A A1 - Yoshida, Kiyohito A1 - Zdobnov, Evgeny A1 - Zhang, Peili A1 - Zhang, Yu A1 - Zimin, Aleksey V A1 - Baldwin, Jennifer A1 - Abdouelleil, Amr A1 - Abdulkadir, Jamal A1 - Abebe, Adal A1 - Abera, Brikti A1 - Abreu, Justin A1 - Acer, St Christophe A1 - Aftuck, Lynne A1 - Alexander, Allen A1 - An, Peter A1 - Anderson, Erica A1 - Anderson, Scott A1 - Arachi, Harindra A1 - Azer, Marc A1 - Bachantsang, Pasang A1 - Barry, Andrew A1 - Bayul, Tashi A1 - Berlin, Aaron A1 - Bessette, Daniel A1 - Bloom, Toby A1 - Blye, Jason A1 - Boguslavskiy, Leonid A1 - Bonnet, Claude A1 - Boukhgalter, Boris A1 - Bourzgui, Imane A1 - Brown, Adam A1 - Cahill, Patrick A1 - Channer, Sheridon A1 - Cheshatsang, Yama A1 - Chuda, Lisa A1 - Citroen, Mieke A1 - Collymore, Alville A1 - Cooke, Patrick A1 - Costello, Maura A1 - D'Aco, Katie A1 - Daza, Riza A1 - De Haan, Georgius A1 - DeGray, Stuart A1 - DeMaso, Christina A1 - Dhargay, Norbu A1 - Dooley, Kimberly A1 - 
Dooley, Erin A1 - Doricent, Missole A1 - Dorje, Passang A1 - Dorjee, Kunsang A1 - Dupes, Alan A1 - Elong, Richard A1 - Falk, Jill A1 - Farina, Abderrahim A1 - Faro, Susan A1 - Ferguson, Diallo A1 - Fisher, Sheila A1 - Foley, Chelsea D A1 - Franke, Alicia A1 - Friedrich, Dennis A1 - Gadbois, Loryn A1 - Gearin, Gary A1 - Gearin, Christina R A1 - Giannoukos, Georgia A1 - Goode, Tina A1 - Graham, Joseph A1 - Grandbois, Edward A1 - Grewal, Sharleen A1 - Gyaltsen, Kunsang A1 - Hafez, Nabil A1 - Hagos, Birhane A1 - Hall, Jennifer A1 - Henson, Charlotte A1 - Hollinger, Andrew A1 - Honan, Tracey A1 - Huard, Monika D A1 - Hughes, Leanne A1 - Hurhula, Brian A1 - Husby, M Erii A1 - Kamat, Asha A1 - Kanga, Ben A1 - Kashin, Seva A1 - Khazanovich, Dmitry A1 - Kisner, Peter A1 - Lance, Krista A1 - Lara, Marcia A1 - Lee, William A1 - Lennon, Niall A1 - Letendre, Frances A1 - LeVine, Rosie A1 - Lipovsky, Alex A1 - Liu, Xiaohong A1 - Liu, Jinlei A1 - Liu, Shangtao A1 - Lokyitsang, Tashi A1 - Lokyitsang, Yeshi A1 - Lubonja, Rakela A1 - Lui, Annie A1 - MacDonald, Pen A1 - Magnisalis, Vasilia A1 - Maru, Kebede A1 - Matthews, Charles A1 - McCusker, William A1 - McDonough, Susan A1 - Mehta, Teena A1 - Meldrim, James A1 - Meneus, Louis A1 - Mihai, Oana A1 - Mihalev, Atanas A1 - Mihova, Tanya A1 - Mittelman, Rachel A1 - Mlenga, Valentine A1 - Montmayeur, Anna A1 - Mulrain, Leonidas A1 - Navidi, Adam A1 - Naylor, Jerome A1 - Negash, Tamrat A1 - Nguyen, Thu A1 - Nguyen, Nga A1 - Nicol, Robert A1 - Norbu, Choe A1 - Norbu, Nyima A1 - Novod, Nathaniel A1 - O'Neill, Barry A1 - Osman, Sahal A1 - Markiewicz, Eva A1 - Oyono, Otero L A1 - Patti, Christopher A1 - Phunkhang, Pema A1 - Pierre, Fritz A1 - Priest, Margaret A1 - Raghuraman, Sujaa A1 - Rege, Filip A1 - Reyes, Rebecca A1 - Rise, Cecil A1 - Rogov, Peter A1 - Ross, Keenan A1 - Ryan, Elizabeth A1 - Settipalli, Sampath A1 - Shea, Terry A1 - Sherpa, Ngawang A1 - Shi, Lu A1 - Shih, Diana A1 - Sparrow, Todd A1 - Spaulding, Jessica A1 - Stalker, 
John A1 - Stange-Thomann, Nicole A1 - Stavropoulos, Sharon A1 - Stone, Catherine A1 - Strader, Christopher A1 - Tesfaye, Senait A1 - Thomson, Talene A1 - Thoulutsang, Yama A1 - Thoulutsang, Dawa A1 - Topham, Kerri A1 - Topping, Ira A1 - Tsamla, Tsamla A1 - Vassiliev, Helen A1 - Vo, Andy A1 - Wangchuk, Tsering A1 - Wangdi, Tsering A1 - Weiand, Michael A1 - Wilkinson, Jane A1 - Wilson, Adam A1 - Yadav, Shailendra A1 - Young, Geneva A1 - Yu, Qing A1 - Zembek, Lisa A1 - Zhong, Danni A1 - Zimmer, Andrew A1 - Zwirko, Zac A1 - Jaffe, David B A1 - Alvarez, Pablo A1 - Brockman, Will A1 - Butler, Jonathan A1 - Chin, CheeWhye A1 - Gnerre, Sante A1 - Grabherr, Manfred A1 - Kleber, Michael A1 - Mauceli, Evan A1 - MacCallum, Iain KW - Animals KW - Codon KW - DNA Transposable Elements KW - Drosophila KW - Drosophila Proteins KW - Evolution, Molecular KW - Gene Order KW - Genes, Insect KW - Genome, Insect KW - Genome, Mitochondrial KW - Genomics KW - Immunity KW - Multigene Family KW - Phylogeny KW - Reproduction KW - RNA, Untranslated KW - sequence alignment KW - Sequence Analysis, DNA KW - Synteny AB -

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

VL - 450 CP - 7167 M3 - 10.1038/nature06341 ER - TY - JOUR T1 - Spliceosomal small nuclear RNA genes in 11 insect genomes. JF - RNA Y1 - 2007 A1 - Mount, Stephen M A1 - Gotea, Valer A1 - Lin, Chiao-Feng A1 - Hernandez, Kristina A1 - Makalowski, Wojciech KW - Animals KW - Base Sequence KW - Bees KW - Computational Biology KW - Diptera KW - Evolution, Molecular KW - Genes, Insect KW - Genome, Insect KW - Molecular Sequence Data KW - Nucleic Acid Conformation KW - Phylogeny KW - Promoter Regions, Genetic KW - RNA Splicing KW - RNA, Small Nuclear KW - Sequence Analysis, RNA KW - Spliceosomes AB -

The removal of introns from the primary transcripts of protein-coding genes is accomplished by the spliceosome, a large macromolecular complex of which small nuclear RNAs (snRNAs) are crucial components. Following the recent sequencing of the honeybee (Apis mellifera) genome, we used various computational methods, ranging from sequence similarity search to RNA secondary structure prediction, to search for putative snRNA genes (including their promoters) and to examine their pattern of conservation among 11 available insect genomes (A. mellifera, Tribolium castaneum, Bombyx mori, Anopheles gambiae, Aedes aegypti, and six Drosophila species). We identified candidates for all nine spliceosomal snRNA genes in all the analyzed genomes. All the species contain a similar number of snRNA genes, with the exception of A. aegypti, whose genome contains more U1, U2, and U5 genes, and A. mellifera, whose genome contains fewer U2 and U5 genes. We found that snRNA genes are generally more closely related to homologs within the same genus than to those in other genera. Promoter regions for all spliceosomal snRNA genes within each insect species share similar sequence motifs that are likely to correspond to the PSEA (proximal sequence element A), the binding site for snRNA activating protein complex, but these promoter elements vary in sequence among the five insect families surveyed here. In contrast to the other insect species investigated, Dipteran genomes are characterized by a rapid evolution (or loss) of components of the U12 spliceosome and a striking loss of U12-type introns.

VL - 13 CP - 1 M3 - 10.1261/rna.259207 ER - TY - JOUR T1 - TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes JF - Nucleic Acids Research Y1 - 2007 A1 - J. Selengut A1 - Haft, Daniel H. A1 - Davidsen, Tanja A1 - Ganapathy, Anurhada A1 - Gwinn-Giglio, Michelle A1 - Nelson, William C. A1 - Richter, R. Alexander A1 - White, Owen KW - Archaeal Proteins KW - Bacterial Proteins KW - Databases, Protein KW - Genome, Bacterial KW - Genomics KW - Internet KW - Phylogeny KW - software KW - User-Computer Interface AB - TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe 'equivalog' families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered. 
The TIGRFAMs and Genome Properties systems can be accessed at http://www.tigr.org/TIGRFAMs and http://www.tigr.org/Genome_Properties. VL - 35 N1 - http://www.ncbi.nlm.nih.gov/pubmed/17151080?dopt=Abstract ER - TY - JOUR T1 - Comparative genomics of emerging human ehrlichiosis agents JF - PLoS genetics Y1 - 2006 A1 - Dunning Hotopp, Julie C. A1 - Lin, Mingqun A1 - Madupu, Ramana A1 - Crabtree, Jonathan A1 - Angiuoli, Samuel V. A1 - Eisen, Jonathan A. A1 - Eisen, Jonathan A1 - Seshadri, Rekha A1 - Ren, Qinghu A1 - Wu, Martin A1 - Utterback, Teresa R. A1 - Smith, Shannon A1 - Lewis, Matthew A1 - Khouri, Hoda A1 - Zhang, Chunbin A1 - Niu, Hua A1 - Lin, Quan A1 - Ohashi, Norio A1 - Zhi, Ning A1 - Nelson, William A1 - Brinkac, Lauren M. A1 - Dodson, Robert J. A1 - Rosovitz, M. J. A1 - Sundaram, Jaideep A1 - Daugherty, Sean C. A1 - Davidsen, Tanja A1 - Durkin, Anthony S. A1 - Gwinn, Michelle A1 - Haft, Daniel H. A1 - J. Selengut A1 - Sullivan, Steven A. A1 - Zafar, Nikhat A1 - Zhou, Liwei A1 - Benahmed, Faiza A1 - Forberger, Heather A1 - Halpin, Rebecca A1 - Mulligan, Stephanie A1 - Robinson, Jeffrey A1 - White, Owen A1 - Rikihisa, Yasuko A1 - Tettelin, Hervé KW - Animals KW - Biotin KW - DNA Repair KW - Ehrlichia KW - Ehrlichiosis KW - Genome KW - Genomics KW - HUMANS KW - Models, Biological KW - Phylogeny KW - Rickettsia KW - Ticks AB - Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. 
Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens. VL - 2 N1 - http://www.ncbi.nlm.nih.gov/pubmed/16482227?dopt=Abstract ER - TY - JOUR T1 - Exopolysaccharide-associated protein sorting in environmental organisms: the PEP-CTERM/EpsH system. Application of a novel phylogenetic profiling heuristic JF - BMC biology Y1 - 2006 A1 - Haft, Daniel H. A1 - Paulsen, Ian T. A1 - Ward, Naomi A1 - J. Selengut KW - Amino Acid Motifs KW - Amino Acid Sequence KW - bacteria KW - Bacterial Proteins KW - Biofilms KW - Genome, Bacterial KW - Markov chains KW - Molecular Sequence Data KW - Phylogeny KW - Polysaccharides, Bacterial KW - Protein Sorting Signals KW - Protein Transport KW - Seawater KW - sequence alignment KW - Soil Microbiology AB - BACKGROUND: Protein translocation to the proper cellular destination may be guided by various classes of sorting signals recognizable in the primary sequence. Detection in some genomes, but not others, may reveal sorting system components by comparison of the phylogenetic profile of the class of sorting signal to that of various protein families. RESULTS: We describe a short C-terminal homology domain, sporadically distributed in bacteria, with several key characteristics of protein sorting signals. The domain includes a near-invariant motif Pro-Glu-Pro (PEP). This possible recognition or processing site is followed by a predicted transmembrane helix and a cluster rich in basic amino acids. 
We designate this domain PEP-CTERM. It tends to occur multiple times in a genome if it occurs at all, with a median count of eight instances; Verrucomicrobium spinosum has sixty-five. PEP-CTERM-containing proteins generally contain an N-terminal signal peptide and exhibit high diversity and little homology to known proteins. All bacteria with PEP-CTERM have both an outer membrane and exopolysaccharide (EPS) production genes. By a simple heuristic for screening phylogenetic profiles in the absence of pre-formed protein families, we discovered that a homolog of the membrane protein EpsH (exopolysaccharide locus protein H) occurs in a species when PEP-CTERM domains are found. The EpsH family contains invariant residues consistent with a transpeptidase function. Most PEP-CTERM proteins are encoded by single-gene operons preceded by large intergenic regions. In the Proteobacteria, most of these upstream regions share a DNA sequence, a probable cis-regulatory site that contains a sigma-54 binding motif. The phylogenetic profile for this DNA sequence exactly matches that of three proteins: a sigma-54-interacting response regulator (PrsR), a transmembrane histidine kinase (PrsK), and a TPR protein (PrsT). CONCLUSION: These findings are consistent with the hypothesis that PEP-CTERM and EpsH form a protein export sorting system, analogous to the LPXTG/sortase system of Gram-positive bacteria, and correlated to EPS expression. It occurs preferentially in bacteria from sediments, soils, and biofilms. The novel method that led to these findings, partial phylogenetic profiling, requires neither global sequence clustering nor arbitrary similarity cutoffs and appears to be a rapid, effective alternative to other profiling methods. 
VL - 4 N1 - http://www.ncbi.nlm.nih.gov/pubmed/16930487?dopt=Abstract ER - TY - JOUR T1 - Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome" JF - Proceedings of the National Academy of Sciences of the United States of America Y1 - 2005 A1 - Tettelin, Hervé A1 - Masignani, Vega A1 - Cieslewicz, Michael J. A1 - Donati, Claudio A1 - Medini, Duccio A1 - Ward, Naomi L. A1 - Angiuoli, Samuel V. A1 - Crabtree, Jonathan A1 - Jones, Amanda L. A1 - Durkin, A. Scott A1 - DeBoy, Robert T. A1 - Davidsen, Tanja M. A1 - Mora, Marirosa A1 - Scarselli, Maria A1 - Margarit y Ros, Immaculada A1 - Peterson, Jeremy D. A1 - Hauser, Christopher R. A1 - Sundaram, Jaideep P. A1 - Nelson, William C. A1 - Madupu, Ramana A1 - Brinkac, Lauren M. A1 - Dodson, Robert J. A1 - Rosovitz, Mary J. A1 - Sullivan, Steven A. A1 - Daugherty, Sean C. A1 - Haft, Daniel H. A1 - J. Selengut A1 - Gwinn, Michelle L. A1 - Zhou, Liwei A1 - Zafar, Nikhat A1 - Khouri, Hoda A1 - Radune, Diana A1 - Dimitrov, George A1 - Watkins, Kisha A1 - O'Connor, Kevin J. B. A1 - Smith, Shannon A1 - Utterback, Teresa R. A1 - White, Owen A1 - Rubens, Craig E. A1 - Grandi, Guido A1 - Madoff, Lawrence C. A1 - Kasper, Dennis L. A1 - Telford, John L. A1 - Wessels, Michael R. A1 - Rappuoli, Rino A1 - Fraser, Claire M. KW - Amino Acid Sequence KW - Bacterial Capsules KW - Base Sequence KW - Gene expression KW - Genes, Bacterial KW - Genetic Variation KW - Genome, Bacterial KW - Molecular Sequence Data KW - Phylogeny KW - sequence alignment KW - Sequence Analysis, DNA KW - Streptococcus agalactiae KW - virulence AB - The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. 
Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. VL - 102 N1 - http://www.ncbi.nlm.nih.gov/pubmed/16172379?dopt=Abstract ER - TY - JOUR T1 - A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes JF - PLOS Computational Biology Y1 - 2005 A1 - Haft, Daniel H. A1 - J. Selengut A1 - Mongodin, Emmanuel F. A1 - Nelson, Karen E. KW - Genes, Archaeal KW - Genes, Bacterial KW - Genes, Fungal KW - Genome KW - Genome, Bacterial KW - Haloarcula marismortui KW - Markov chains KW - Multigene Family KW - Oligonucleotide Array Sequence Analysis KW - Phylogeny KW - Prokaryotic Cells KW - Proteins KW - Repetitive Sequences, Nucleic Acid KW - Yersinia pestis AB - Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21-37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. 
Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer "immunity" against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated. VL - 1 N1 - http://www.ncbi.nlm.nih.gov/pubmed/16292354?dopt=Abstract ER - TY - JOUR T1 - Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment JF - Nature Y1 - 2004 A1 - Moran, Mary Ann A1 - Buchan, Alison A1 - González, José M. A1 - Heidelberg, John F. A1 - Whitman, William B. A1 - Kiene, Ronald P. A1 - Henriksen, James R. A1 - King, Gary M. 
A1 - Belas, Robert A1 - Fuqua, Clay A1 - Brinkac, Lauren A1 - Lewis, Matt A1 - Johri, Shivani A1 - Weaver, Bruce A1 - Pai, Grace A1 - Eisen, Jonathan A. A1 - Rahe, Elisha A1 - Sheldon, Wade M. A1 - Ye, Wenying A1 - Miller, Todd R. A1 - Carlton, Jane A1 - Rasko, David A. A1 - Paulsen, Ian T. A1 - Ren, Qinghu A1 - Daugherty, Sean C. A1 - DeBoy, Robert T. A1 - Dodson, Robert J. A1 - Durkin, A. Scott A1 - Madupu, Ramana A1 - Nelson, William C. A1 - Sullivan, Steven A. A1 - Rosovitz, M. J. A1 - Haft, Daniel H. A1 - J. Selengut A1 - Ward, Naomi KW - Adaptation, Physiological KW - Carrier Proteins KW - Genes, Bacterial KW - Genome, Bacterial KW - marine biology KW - Molecular Sequence Data KW - Oceans and Seas KW - Phylogeny KW - plankton KW - RNA, Ribosomal, 16S KW - Roseobacter KW - Seawater AB - Since the recognition of prokaryotes as essential components of the oceanic food web, bacterioplankton have been acknowledged as catalysts of most major biogeochemical processes in the sea. Studying heterotrophic bacterioplankton has been challenging, however, as most major clades have never been cultured or have only been grown to low densities in sea water. Here we describe the genome sequence of Silicibacter pomeroyi, a member of the marine Roseobacter clade (Fig. 1), the relatives of which comprise approximately 10-20% of coastal and oceanic mixed-layer bacterioplankton. This first genome sequence from any major heterotrophic clade consists of a chromosome (4,109,442 base pairs) and megaplasmid (491,611 base pairs). Genome analysis indicates that this organism relies upon a lithoheterotrophic strategy that uses inorganic compounds (carbon monoxide and sulphide) to supplement heterotrophy. Silicibacter pomeroyi also has genes advantageous for associations with plankton and suspended particles, including genes for uptake of algal-derived compounds, use of metabolites from reducing microzones, rapid growth and cell-density-dependent regulation. 
This bacterium has a physiology distinct from that of marine oligotrophs, adding a new strategy to the recognized repertoire for coping with a nutrient-poor ocean. VL - 432 N1 - http://www.ncbi.nlm.nih.gov/pubmed/15602564?dopt=Abstract ER - TY - JOUR T1 - Genome of Geobacter sulfurreducens: metal reduction in subsurface environments JF - Science (New York, N.Y.) Y1 - 2003 A1 - Methé, B. A. A1 - Nelson, K. E. A1 - Eisen, J. A. A1 - Paulsen, I. T. A1 - Nelson, W. A1 - Heidelberg, J. F. A1 - Wu, D. A1 - Wu, M. A1 - Ward, N. A1 - Beanan, M. J. A1 - Dodson, R. J. A1 - Madupu, R. A1 - Brinkac, L. M. A1 - Daugherty, S. C. A1 - DeBoy, R. T. A1 - Durkin, A. S. A1 - Gwinn, M. A1 - Kolonay, J. F. A1 - Sullivan, S. A. A1 - Haft, D. H. A1 - J. Selengut A1 - Davidsen, T. M. A1 - Zafar, N. A1 - White, O. A1 - Tran, B. A1 - Romero, C. A1 - Forberger, H. A. A1 - Weidman, J. A1 - Khouri, H. A1 - Feldblyum, T. V. A1 - Utterback, T. R. A1 - Van Aken, S. E. A1 - Lovley, D. R. A1 - Fraser, C. M. KW - Acetates KW - Acetyl Coenzyme A KW - Aerobiosis KW - Anaerobiosis KW - Bacterial Proteins KW - Carbon KW - Chemotaxis KW - Chromosomes, Bacterial KW - Cytochromes c KW - Electron Transport KW - Energy Metabolism KW - Genes, Bacterial KW - Genes, Regulator KW - Genome, Bacterial KW - Geobacter KW - Hydrogen KW - Metals KW - Movement KW - Open Reading Frames KW - Oxidation-Reduction KW - Phylogeny AB - The complete genome sequence of Geobacter sulfurreducens, a delta-proteobacterium, reveals unsuspected capabilities, including evidence of aerobic metabolism, one-carbon and complex carbon metabolism, motility, and chemotactic behavior. These characteristics, coupled with the possession of many two-component sensors and many c-type cytochromes, reveal an ability to create alternative, redundant, electron transport networks and offer insights into the process of metal ion reduction in subsurface environments. 
As well as playing roles in the global cycling of metals and carbon, this organism clearly has the potential for use in bioremediation of radioactive metals and in the generation of electricity. VL - 302 N1 - http://www.ncbi.nlm.nih.gov/pubmed/14671304?dopt=Abstract ER - TY - JOUR T1 - The TIGRFAMs database of protein families JF - Nucleic Acids Research Y1 - 2003 A1 - Haft, Daniel H. A1 - J. Selengut A1 - White, Owen KW - Animals KW - Databases, Protein KW - Markov chains KW - Mixed Function Oxygenases KW - Phylogeny KW - Proteins KW - Pyruvate Carboxylase KW - Sequence Homology, Amino Acid AB - TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models. These models are designed to support both automated and manually curated annotation of genomes. TIGRFAMs contains models of full-length proteins and shorter regions at the levels of superfamilies, subfamilies and equivalogs, where equivalogs are sets of homologous proteins conserved with respect to function since their last common ancestor. The scope of each model is set by raising or lowering cutoff scores and choosing members of the seed alignment to group proteins sharing specific function (equivalog) or more general properties. The overall goal is to provide information with maximum utility for the annotation process. TIGRFAMs is thus complementary to Pfam, whose models typically achieve broad coverage across distant homologs but end at the boundaries of conserved structural domains. The database currently contains over 1600 protein families. TIGRFAMs is available for searching or downloading at www.tigr.org/TIGRFAMs. 
VL - 31 N1 - http://www.ncbi.nlm.nih.gov/pubmed/12520025?dopt=Abstract ER - TY - JOUR T1 - The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. JF - Science Y1 - 2002 A1 - Dehal, Paramvir A1 - Satou, Yutaka A1 - Campbell, Robert K A1 - Chapman, Jarrod A1 - Degnan, Bernard A1 - De Tomaso, Anthony A1 - Davidson, Brad A1 - Di Gregorio, Anna A1 - Gelpke, Maarten A1 - Goodstein, David M A1 - Harafuji, Naoe A1 - Hastings, Kenneth E M A1 - Ho, Isaac A1 - Hotta, Kohji A1 - Huang, Wayne A1 - Kawashima, Takeshi A1 - Lemaire, Patrick A1 - Martinez, Diego A1 - Meinertzhagen, Ian A A1 - Necula, Simona A1 - Nonaka, Masaru A1 - Putnam, Nik A1 - Rash, Sam A1 - Saiga, Hidetoshi A1 - Satake, Masanobu A1 - Terry, Astrid A1 - Yamada, Lixy A1 - Wang, Hong-Gang A1 - Awazu, Satoko A1 - Azumi, Kaoru A1 - Boore, Jeffrey A1 - Branno, Margherita A1 - Chin-Bow, Stephen A1 - DeSantis, Rosaria A1 - Doyle, Sharon A1 - Francino, Pilar A1 - Keys, David N A1 - Haga, Shinobu A1 - Hayashi, Hiroko A1 - Hino, Kyosuke A1 - Imai, Kaoru S A1 - Inaba, Kazuo A1 - Kano, Shungo A1 - Kobayashi, Kenji A1 - Kobayashi, Mari A1 - Lee, Byung-In A1 - Makabe, Kazuhiro W A1 - Manohar, Chitra A1 - Matassi, Giorgio A1 - Medina, Monica A1 - Mochizuki, Yasuaki A1 - Mount, Steve A1 - Morishita, Tomomi A1 - Miura, Sachiko A1 - Nakayama, Akie A1 - Nishizaka, Satoko A1 - Nomoto, Hisayo A1 - Ohta, Fumiko A1 - Oishi, Kazuko A1 - Rigoutsos, Isidore A1 - Sano, Masako A1 - Sasaki, Akane A1 - Sasakura, Yasunori A1 - Shoguchi, Eiichi A1 - Shin-i, Tadasu A1 - Spagnuolo, Antoinetta A1 - Stainier, Didier A1 - Suzuki, Miho M A1 - Tassy, Olivier A1 - Takatori, Naohito A1 - Tokuoka, Miki A1 - Yagi, Kasumi A1 - Yoshizaki, Fumiko A1 - Wada, Shuichi A1 - Zhang, Cindy A1 - Hyatt, P Douglas A1 - Larimer, Frank A1 - Detter, Chris A1 - Doggett, Norman A1 - Glavina, Tijana A1 - Hawkins, Trevor A1 - Richardson, Paul A1 - Lucas, Susan A1 - Kohara, Yuji A1 - Levine, Michael A1 - Satoh, Nori A1 - Rokhsar, Daniel S KW 
- Alleles KW - Animals KW - Apoptosis KW - Base Sequence KW - Cellulose KW - Central Nervous System KW - Ciona intestinalis KW - Computational Biology KW - Endocrine System KW - Gene Dosage KW - Gene Duplication KW - genes KW - Genes, Homeobox KW - Genome KW - Heart KW - Immunity KW - Molecular Sequence Data KW - Multigene Family KW - Muscle Proteins KW - Organizers, Embryonic KW - Phylogeny KW - Polymorphism, Genetic KW - Proteins KW - Sequence Analysis, DNA KW - Sequence Homology, Nucleic Acid KW - Species Specificity KW - Thyroid Gland KW - Urochordata KW - Vertebrates AB -

The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains approximately 16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.

VL - 298 CP - 5601 M3 - 10.1126/science.1080049 ER - TY - JOUR T1 - A new, expressed multigene family containing a hot spot for insertion of retroelements is associated with polymorphic subtelomeric regions of Trypanosoma brucei. JF - Eukaryot Cell Y1 - 2002 A1 - Bringaud, Frederic A1 - Biteau, Nicolas A1 - Melville, Sara E A1 - Hez, Stéphanie A1 - El-Sayed, Najib M A1 - Leech, Vanessa A1 - Berriman, Matthew A1 - Hall, Neil A1 - Donelson, John E A1 - Baltz, Théo KW - Amino Acid Sequence KW - Animals KW - Base Sequence KW - Cloning, Molecular KW - DNA Primers KW - DNA, Protozoan KW - Escherichia coli KW - Genes, Protozoan KW - Molecular Sequence Data KW - Multigene Family KW - Mutagenesis, Insertional KW - Phylogeny KW - Polymorphism, Genetic KW - Protozoan Proteins KW - Pseudogenes KW - Retroelements KW - sequence alignment KW - Sequence Homology, Amino Acid KW - Telomere KW - Trypanosoma brucei brucei KW - Trypanosoma cruzi AB -

We describe a novel gene family that forms clusters in subtelomeric regions of Trypanosoma brucei chromosomes and partially accounts for the observed clustering of retrotransposons. The ingi and ribosomal inserted mobile element (RIME) non-LTR retrotransposons share 250 bp at both extremities and are the most abundant putatively mobile elements, with about 500 copies per haploid genome. From cDNA clones and subsequently in the T. brucei genomic DNA databases, we identified 52 homologous gene and pseudogene sequences, 16 of which contain a RIME and/or ingi retrotransposon inserted at exactly the same relative position. Here these genes are called the RHS family, for retrotransposon hot spot. Comparison of the protein sequences encoded by RHS genes (21 copies) and pseudogenes (24 copies) revealed a conserved central region containing an ATP/GTP-binding motif and the RIME/ingi insertion site. The RHS proteins share between 13 and 96% identity, and six subfamilies, RHS1 to RHS6, can be defined on the basis of their divergent C-terminal domains. Immunofluorescence and Western blot analyses using RHS subfamily-specific immune sera show that RHS proteins are constitutively expressed and occur mainly in the nucleus. Analysis of Genome Survey Sequence databases indicated that the Trypanosoma brucei diploid genome contains about 280 RHS (pseudo)genes. Among the 52 identified RHS (pseudo)genes, 48 copies are in three RHS clusters located in subtelomeric regions of chromosomes Ia and II and adjacent to the active bloodstream form expression site in T. brucei strain TREU927/4 GUTat10.1. RHS genes comprise the remaining sequence of the size-polymorphic "repetitive region" described for T. brucei chromosome I, and a homologous gene family is present in the Trypanosoma cruzi genome.

VL - 1 CP - 1 ER -