TY - Generic T1 - Epiviz: interactive visual analytics for functional genomics data. Y1 - 2014 A1 - Chelaru, Florin A1 - Smith, Llewellyn A1 - Goldstein, Naomi A1 - Bravo, Héctor Corrada KW - algorithms KW - Chromosome mapping KW - Data Mining KW - database management systems KW - Databases, Genetic KW - Genomics KW - Internet KW - software KW - User-Computer Interface AB -

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.

JA - Nat Methods VL - 11 CP - 9 M3 - 10.1038/nmeth.3038 ER - TY - Generic T1 - Genomic analysis of sequence-dependent DNA curvature in Leishmania. Y1 - 2013 A1 - Smircich, Pablo A1 - Forteza, Diego A1 - El-Sayed, Najib M A1 - Garat, Beatriz KW - Chromosome mapping KW - Comparative Genomic Hybridization KW - Computational Biology KW - DNA, Protozoan KW - Genome, Protozoan KW - Genomics KW - HUMANS KW - Leishmania KW - Nucleic Acid Conformation AB -

Leishmania major is a flagellated protozoan parasite of medical importance. Like other members of the Trypanosomatidae family, it possesses unique mechanisms of gene expression such as constitutive polycistronic transcription of directional gene clusters, gene amplification, mRNA trans-splicing, and extensive editing of mitochondrial transcripts. The molecular signals underlying most of these processes remain under investigation. In order to investigate the role of DNA secondary structure signals in gene expression, we carried out a genome-wide in silico analysis of the intrinsic DNA curvature. The L. major genome revealed a lower frequency of high intrinsic curvature regions as well as inter- and intra- chromosomal distribution heterogeneity, when compared to prokaryotic and eukaryotic organisms. Using a novel method aimed at detecting region-integrated intrinsic curvature (RIIC), high DNA curvature was found to be associated with regions implicated in transcription initiation. Those include divergent strand-switch regions between directional gene clusters and regions linked to markers of active transcription initiation such as acetylated H3 histone, TRF4 and SNAP50. These findings suggest a role for DNA curvature in transcription initiation in Leishmania supporting the relevance of DNA secondary structures signals.

JA - PLoS One VL - 8 CP - 4 M3 - 10.1371/journal.pone.0063068 ER - TY - JOUR T1 - The minimum information about a genome sequence (MIGS) specification JF - Nature biotechnology Y1 - 2008 A1 - Field, Dawn A1 - Garrity, George A1 - Gray, Tanya A1 - Morrison, Norman A1 - J. Selengut A1 - Sterk, Peter A1 - Tatusova, Tatiana A1 - Thomson, Nicholas A1 - Allen, Michael J. A1 - Angiuoli, Samuel V. A1 - Ashburner, Michael A1 - Axelrod, Nelson A1 - Baldauf, Sandra A1 - Ballard, Stuart A1 - Boore, Jeffrey A1 - Cochrane, Guy A1 - Cole, James A1 - Dawyndt, Peter A1 - De Vos, Paul A1 - DePamphilis, Claude A1 - Edwards, Robert A1 - Faruque, Nadeem A1 - Feldman, Robert A1 - Gilbert, Jack A1 - Gilna, Paul A1 - Glöckner, Frank Oliver A1 - Goldstein, Philip A1 - Guralnick, Robert A1 - Haft, Dan A1 - Hancock, David A1 - Hermjakob, Henning A1 - Hertz-Fowler, Christiane A1 - Hugenholtz, Phil A1 - Joint, Ian A1 - Kagan, Leonid A1 - Kane, Matthew A1 - Kennedy, Jessie A1 - Kowalchuk, George A1 - Kottmann, Renzo A1 - Kolker, Eugene A1 - Kravitz, Saul A1 - Kyrpides, Nikos A1 - Leebens-Mack, Jim A1 - Lewis, Suzanna E. A1 - Li, Kelvin A1 - Lister, Allyson L. 
A1 - Lord, Phillip A1 - Maltsev, Natalia A1 - Markowitz, Victor A1 - Martiny, Jennifer A1 - Methe, Barbara A1 - Mizrachi, Ilene A1 - Moxon, Richard A1 - Nelson, Karen A1 - Parkhill, Julian A1 - Proctor, Lita A1 - White, Owen A1 - Sansone, Susanna-Assunta A1 - Spiers, Andrew A1 - Stevens, Robert A1 - Swift, Paul A1 - Taylor, Chris A1 - Tateno, Yoshio A1 - Tett, Adrian A1 - Turner, Sarah A1 - Ussery, David A1 - Vaughan, Bob A1 - Ward, Naomi A1 - Whetzel, Trish A1 - San Gil, Ingio A1 - Wilson, Gareth A1 - Wipat, Anil KW - Chromosome mapping KW - Databases, Factual KW - information dissemination KW - Information Storage and Retrieval KW - Information Theory KW - Internationality AB - With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases. VL - 26 N1 - http://www.ncbi.nlm.nih.gov/pubmed/18464787?dopt=Abstract ER - TY - JOUR T1 - Genome sequence and identification of candidate vaccine antigens from the animal pathogen Dichelobacter nodosus. 
JF - Nat Biotechnol Y1 - 2007 A1 - Myers, Garry S A A1 - Parker, Dane A1 - Al-Hasani, Keith A1 - Kennan, Ruth M A1 - Seemann, Torsten A1 - Ren, Qinghu A1 - Badger, Jonathan H A1 - Selengut, Jeremy D A1 - DeBoy, Robert T A1 - Tettelin, Hervé A1 - Boyce, John D A1 - McCarl, Victoria P A1 - Han, Xiaoyan A1 - Nelson, William C A1 - Madupu, Ramana A1 - Mohamoud, Yasmin A1 - Holley, Tara A1 - Fedorova, Nadia A1 - Khouri, Hoda A1 - Bottomley, Steven P A1 - Whittington, Richard J A1 - Adler, Ben A1 - Songer, J Glenn A1 - Rood, Julian I A1 - Paulsen, Ian T KW - Animals KW - Antigens KW - Chromosome mapping KW - Dichelobacter nodosus KW - Foot Rot KW - Genome, Bacterial KW - Sequence Analysis, DNA AB -

Dichelobacter nodosus causes ovine footrot, a disease that leads to severe economic losses in the wool and meat industries. We sequenced its 1.4-Mb genome, the smallest known genome of an anaerobe. It differs markedly from small genomes of intracellular bacteria, retaining greater biosynthetic capabilities and lacking any evidence of extensive ongoing genome reduction. Comparative genomic microarray studies and bioinformatic analysis suggested that, despite its small size, almost 20% of the genome is derived from lateral gene transfer. Most of these regions seem to be associated with virulence. Metabolic reconstruction indicated unsuspected capabilities, including carbohydrate utilization, electron transfer and several aerobic pathways. Global transcriptional profiling and bioinformatic analysis enabled the prediction of virulence factors and cell surface proteins. Screening of these proteins against ovine antisera identified eight immunogenic proteins that are candidate antigens for a cross-protective vaccine.

VL - 25 CP - 5 M3 - 10.1038/nbt1302 ER - TY - JOUR T1 - Genome sequence and identification of candidate vaccine antigens from the animal pathogen Dichelobacter nodosus JF - Nature biotechnology Y1 - 2007 A1 - Myers, Garry S. A. A1 - Parker, Dane A1 - Al-Hasani, Keith A1 - Kennan, Ruth M. A1 - Seemann, Torsten A1 - Ren, Qinghu A1 - Badger, Jonathan H. A1 - J. Selengut A1 - DeBoy, Robert T. A1 - Tettelin, Hervé A1 - Boyce, John D. A1 - McCarl, Victoria P. A1 - Han, Xiaoyan A1 - Nelson, William C. A1 - Madupu, Ramana A1 - Mohamoud, Yasmin A1 - Holley, Tara A1 - Fedorova, Nadia A1 - Khouri, Hoda A1 - Bottomley, Steven P. A1 - Whittington, Richard J. A1 - Adler, Ben A1 - Songer, J. Glenn A1 - Rood, Julian I. A1 - Paulsen, Ian T. KW - Animals KW - Antigens KW - Chromosome mapping KW - Dichelobacter nodosus KW - Foot Rot KW - Genome, Bacterial KW - Sequence Analysis, DNA AB - Dichelobacter nodosus causes ovine footrot, a disease that leads to severe economic losses in the wool and meat industries. We sequenced its 1.4-Mb genome, the smallest known genome of an anaerobe. It differs markedly from small genomes of intracellular bacteria, retaining greater biosynthetic capabilities and lacking any evidence of extensive ongoing genome reduction. Comparative genomic microarray studies and bioinformatic analysis suggested that, despite its small size, almost 20% of the genome is derived from lateral gene transfer. Most of these regions seem to be associated with virulence. Metabolic reconstruction indicated unsuspected capabilities, including carbohydrate utilization, electron transfer and several aerobic pathways. Global transcriptional profiling and bioinformatic analysis enabled the prediction of virulence factors and cell surface proteins. Screening of these proteins against ovine antisera identified eight immunogenic proteins that are candidate antigens for a cross-protective vaccine. 
VL - 25 N1 - http://www.ncbi.nlm.nih.gov/pubmed/17468768?dopt=Abstract ER - TY - JOUR T1 - SplicePort--an interactive splice-site analysis tool. JF - Nucleic Acids Res Y1 - 2007 A1 - Dogan, Rezarta Islamaj A1 - Getoor, Lise A1 - Wilbur, W John A1 - Mount, Stephen M KW - Base Sequence KW - Chromosome mapping KW - Computational Biology KW - Computer simulation KW - DNA KW - Genome KW - HUMANS KW - Internet KW - Models, Genetic KW - Molecular Sequence Data KW - Pattern Recognition, Automated KW - RNA Splice Sites KW - sequence alignment KW - Sequence Analysis, DNA KW - User-Computer Interface AB -

SplicePort is a web-based tool for splice-site analysis that allows the user to make splice-site predictions for submitted sequences. In addition, the user can also browse the rich catalog of features that underlies these predictions, and which we have found capable of providing high classification accuracy on human splice sites. Feature selection is optimized for human splice sites, but the selected features are likely to be predictive for other mammals as well. With our interactive feature browsing and visualization tool, the user can view and explore subsets of features used in splice-site prediction (either the features that account for the classification of a specific input sequence or the complete collection of features). Selected feature sets can be searched, ranked or displayed easily. The user can group features into clusters and frequency plot WebLogos can be generated for each cluster. The user can browse the identified clusters and their contributing elements, looking for new interesting signals, or can validate previously observed signals. The SplicePort web server can be accessed at http://www.cs.umd.edu/projects/SplicePort and http://www.spliceport.org.

VL - 35 CP - Web Server issue M3 - 10.1093/nar/gkm407 ER - TY - JOUR T1 - Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics JF - Bioinformatics (Oxford, England) Y1 - 2005 A1 - Haft, Daniel H. A1 - J. Selengut A1 - Brinkac, Lauren M. A1 - Zafar, Nikhat A1 - White, Owen KW - Chromosome mapping KW - database management systems KW - Databases, Genetic KW - documentation KW - Gene Expression Profiling KW - Gene Expression Regulation KW - Genomics KW - Information Storage and Retrieval KW - Microbiological Techniques KW - natural language processing KW - Prokaryotic Cells KW - Proteome KW - signal transduction KW - software KW - User-Computer Interface KW - Vocabulary, Controlled AB - MOTIVATION: The presence or absence of metabolic pathways and structures provide a context that makes protein annotation far more reliable. Compiling such information across microbial genomes improves the functional classification of proteins and provides a valuable resource for comparative genomics. RESULTS: We have created a Genome Properties system to present key aspects of prokaryotic biology using standardized computational methods and controlled vocabularies. Properties reflect gene content, phenotype, phylogeny and computational analyses. The results of searches using hidden Markov models allow many properties to be deduced automatically, especially for families of proteins (equivalogs) conserved in function since their last common ancestor. Additional properties are derived from curation, published reports and other forms of evidence. Genome Properties system was applied to 156 complete prokaryotic genomes, and is easily mined to find differences between species, correlations between metabolic features and families of uncharacterized proteins, or relationships among properties. 
AVAILABILITY: Genome Properties can be found at http://www.tigr.org/Genome_Properties SUPPLEMENTARY INFORMATION: http://www.tigr.org/tigr-scripts/CMR2/genome_properties_references.spl. VL - 21 N1 - http://www.ncbi.nlm.nih.gov/pubmed/15347579?dopt=Abstract ER - TY - JOUR T1 - Schistosoma mansoni genome project: an update JF - Parasitology International Y1 - 2004 A1 - LoVerde, Philip T. A1 - Hirai, Hirohisa A1 - Merrick, Joseph M. A1 - Lee, Norman H. A1 - Najib M. El‐Sayed KW - Chromosome mapping KW - Gene discovery KW - Genomics KW - Schistosoma mansoni AB - A schistosome genome project was initiated by the World Health Organization in 1994 with the notion that the best prospects for identifying new targets for drugs, vaccines, and diagnostic development lie in schistosome gene discovery, development of chromosome maps, whole genome sequencing and genome analysis. Schistosoma mansoni has a haploid genome of 270 Mb contained on 8 pairs of chromosomes. It is estimated that the S. mansoni genome contains between 15 000 and 25 000 genes. There are approximately 16 689 ESTs obtained from diverse libraries representing different developmental stages of S. mansoni, deposited in the NCBI EST database. More than half of the deposited sequences correspond to genes of unknown function. Approximately 40-50% of the sequences form unique clusters, suggesting that approximately 20-25% of the total schistosome genes have been discovered. Efforts to develop low resolution chromosome maps are in progress. There is a genome sequencing program underway that will provide 3X sequence coverage of the S. mansoni genome that will result in approximately 95% gene discovery. The genomics era has provided the resources to usher in the era of functional genomics that will involve microarrays to focus on specific metabolic pathways, proteomics to identify relevant proteins and protein-protein interactions to understand critical parasite pathways. 
Functional genomics is expected to accelerate the development of control and treatment strategies for schistosomiasis. VL - 53 SN - 1383-5769 ER - TY - JOUR T1 - The sequence and analysis of Trypanosoma brucei chromosome II. JF - Nucleic Acids Res Y1 - 2003 A1 - el-Sayed, Najib M A A1 - Ghedin, Elodie A1 - Song, Jinming A1 - MacLeod, Annette A1 - Bringaud, Frederic A1 - Larkin, Christopher A1 - Wanless, David A1 - Peterson, Jeremy A1 - Hou, Lihua A1 - Taylor, Sonya A1 - Tweedie, Alison A1 - Biteau, Nicolas A1 - Khalak, Hanif G A1 - Lin, Xiaoying A1 - Mason, Tanya A1 - Hannick, Linda A1 - Caler, Elisabet A1 - Blandin, Gaëlle A1 - Bartholomeu, Daniella A1 - Simpson, Anjana J A1 - Kaul, Samir A1 - Zhao, Hong A1 - Pai, Grace A1 - Van Aken, Susan A1 - Utterback, Teresa A1 - Haas, Brian A1 - Koo, Hean L A1 - Umayam, Lowell A1 - Suh, Bernard A1 - Gerrard, Caroline A1 - Leech, Vanessa A1 - Qi, Rong A1 - Zhou, Shiguo A1 - Schwartz, David A1 - Feldblyum, Tamara A1 - Salzberg, Steven A1 - Tait, Andrew A1 - Turner, C Michael R A1 - Ullu, Elisabetta A1 - White, Owen A1 - Melville, Sara A1 - Adams, Mark D A1 - Fraser, Claire M A1 - Donelson, John E KW - Animals KW - Antigens, Protozoan KW - Chromosome mapping KW - Chromosomes KW - DNA, Protozoan KW - Gene Duplication KW - Genes, Protozoan KW - Molecular Sequence Data KW - Pseudogenes KW - Recombination, Genetic KW - Sequence Analysis, DNA KW - Trypanosoma brucei brucei AB -

We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-coupled repair on coding versus non-coding strand. A 5-cM genetic map of the chromosome reveals recombinational 'hot' and 'cold' regions, the latter of which is predicted to include the putative centromere. One end of the chromosome consists of a 250-kb region almost exclusively composed of RHS (pseudo)genes that belong to a newly characterised multigene family containing a hot spot of insertion for retroelements. Interspersed with the RHS genes are a few copies of truncated RNA polymerase pseudogenes as well as expression site associated (pseudo)genes (ESAGs) 3 and 4, and 76 bp repeats. These features are reminiscent of a vestigial variant surface glycoprotein (VSG) gene expression site. The other end of the chromosome contains a 30-kb array of VSG genes, the majority of which are pseudogenes, suggesting that this region may be a site for modular de novo construction of VSG gene diversity during transposition/gene conversion events.

VL - 31 CP - 16 ER -