TY - JOUR T1 - Genome-wide analysis reveals novel genes essential for heme homeostasis in Caenorhabditis elegans. JF - PLoS Genet Y1 - 2010 A1 - Severance, Scott A1 - Rajagopal, Abbhirami A1 - Rao, Anita U A1 - Cerqueira, Gustavo C A1 - Mitreva, Makedonka A1 - El-Sayed, Najib M A1 - Krause, Michael A1 - Hamza, Iqbal KW - Animals KW - Caenorhabditis elegans KW - Dose-Response Relationship, Drug KW - Gene Expression Profiling KW - Gene Expression Regulation KW - genes KW - Genome-Wide Association Study KW - Heme KW - Homeostasis KW - HUMANS KW - Leishmania KW - Nematoda KW - Trypanosoma AB -

Heme is a cofactor in proteins that function in almost all sub-cellular compartments and in many diverse biological processes. Heme is produced by a conserved biosynthetic pathway that is highly regulated to prevent the accumulation of heme--a cytotoxic, hydrophobic tetrapyrrole. Caenorhabditis elegans and related parasitic nematodes do not synthesize heme, but instead require environmental heme to grow and develop. Heme homeostasis in these auxotrophs is, therefore, regulated in accordance with available dietary heme. We have capitalized on this auxotrophy in C. elegans to study gene expression changes associated with precisely controlled dietary heme concentrations. RNA was isolated from cultures containing 4, 20, or 500 microM heme; derived cDNA probes were hybridized to Affymetrix C. elegans expression arrays. We identified 288 heme-responsive genes (hrgs) that were differentially expressed under these conditions. Of these genes, 42% had putative homologs in humans, while genomes of medically relevant heme auxotrophs revealed homologs for 12% in both Trypanosoma and Leishmania and 24% in parasitic nematodes. Depletion of each of the 288 hrgs by RNA-mediated interference (RNAi) in a transgenic heme-sensor worm strain identified six genes that regulated heme homeostasis. In addition, seven membrane-spanning transporters involved in heme uptake were identified by RNAi knockdown studies using a toxic heme analog. Comparison of genes that were positive in both of the RNAi screens resulted in the identification of three genes in common that were vital for organismal heme homeostasis in C. elegans. Collectively, our results provide a catalog of genes that are essential for metazoan heme homeostasis and demonstrate the power of C. elegans as a genetic animal model to dissect the regulatory circuits which mediate heme trafficking in both vertebrate hosts and their parasites, which depend on environmental heme for survival.

VL - 6 CP - 7 M3 - 10.1371/journal.pgen.1001044 ER - TY - JOUR T1 - Genome assortment, not serogroup, defines Vibrio cholerae pandemic strains JF - Nature Y1 - 2009 A1 - Brettin, Thomas S. A1 - Bruce, David C. A1 - Challacombe, Jean F. A1 - Detter, John C. A1 - Han, Cliff S. A1 - Munk, A. C. A1 - Chertkov, Olga A1 - Meincke, Linda A1 - Saunders, Elizabeth A1 - Choi, Seon Y. A1 - Haley, Bradd J. A1 - Taviani, Elisa A1 - Jeon, Yoon-Seong A1 - Kim, Dong Wook A1 - Lee, Jae-Hak A1 - Walters, Ronald A. A1 - Huq, Anwar A1 - Rita R. Colwell KW - 59 KW - CHOLERA KW - genes KW - Genetics KW - GENOTYPE KW - ISLANDS KW - ORIGIN KW - PHENOTYPE KW - PUBLIC HEALTH KW - recombination KW - STRAINS KW - Toxins AB - Vibrio cholerae, the causative agent of cholera, is a bacterium autochthonous to the aquatic environment, and a serious public health threat. V. cholerae serogroup O1 is responsible for the previous two cholera pandemics, in which classical and El Tor biotypes were dominant in the 6th and the current 7th pandemics, respectively. Cholera researchers continually face newly emerging and re-emerging pathogenic clones carrying combinations of new serogroups as well as of phenotypic and genotypic properties. These genotype and phenotype changes have hampered control of the disease. Here we compare the complete genome sequences of 23 strains of V. cholerae isolated from a variety of sources and geographical locations over the past 98 years in an effort to elucidate the evolutionary mechanisms governing genetic diversity and genesis of new pathogenic clones. The genome-based phylogeny revealed 12 distinct V. cholerae phyletic lineages, of which one, designated the V. cholerae core genome (CG), comprises both O1 classical and El Tor biotypes. All 7th pandemic clones share nearly identical gene content, i.e., the same genome backbone. 
The transition from 6th to 7th pandemic strains is defined here as a 'shift' between pathogenic clones belonging to the same O1 serogroup, but from significantly different phyletic lineages within the CG clade. In contrast, transition among clones during the present 7th pandemic period can be characterized as a 'drift' between clones, differentiated mainly by varying composition of laterally transferred genomic islands, resulting in emergence of variants, exemplified by V. cholerae serogroup O139 and V. cholerae O1 El Tor hybrid clones that produce cholera toxin of classical biotype. Based on the comprehensive comparative genomics presented in this study it is concluded that V. cholerae undergoes extensive genetic recombination via lateral gene transfer, and, therefore, genome assortment, not serogroup, should be used to define pathogenic V. cholerae clones. ER - TY - Generic T1 - Inexact Local Alignment Search over Suffix Arrays T2 - IEEE International Conference on Bioinformatics and Biomedicine, 2009. BIBM '09 Y1 - 2009 A1 - Ghodsi, M. A1 - M. Pop KW - bacteria KW - Bioinformatics KW - biology computing KW - Computational Biology KW - Costs KW - DNA KW - DNA homology searches KW - DNA sequences KW - Educational institutions KW - generalized heuristic KW - genes KW - Genetics KW - genome alignment KW - Genomics KW - human KW - inexact local alignment search KW - inexact seeds KW - local alignment KW - local alignment tools KW - memory efficient suffix array KW - microorganisms KW - molecular biophysics KW - mouse KW - Organisms KW - Sensitivity and Specificity KW - sequences KW - suffix array KW - USA Councils AB - We describe an algorithm for finding approximate seeds for DNA homology searches. In contrast to previous algorithms that use exact or spaced seeds, our approximate seeds may contain insertions and deletions. We present a generalized heuristic for finding such seeds efficiently and prove that the heuristic does not affect sensitivity. 
We show how to adapt this algorithm to work over the memory-efficient suffix array with provably minimal overhead in running time. We demonstrate the effectiveness of our algorithm on two tasks: whole genome alignment of bacteria and alignment of the DNA sequences of 177 genes that are orthologous in human and mouse. We show our algorithm achieves better sensitivity and uses less memory than other commonly used local alignment tools. JA - IEEE International Conference on Bioinformatics and Biomedicine, 2009. BIBM '09 PB - IEEE SN - 978-0-7695-3885-3 ER - TY - Generic T1 - Dynamic querying for pattern identification in microarray and genomic data T2 - 2003 International Conference on Multimedia and Expo, 2003. ICME '03. Proceedings Y1 - 2003 A1 - Hochheiser, H. A1 - Baehrecke, E. H. A1 - Stephen M. Mount A1 - Shneiderman, Ben KW - Bioinformatics KW - data sets KW - Displays KW - dynamic querying KW - expression profiles KW - Frequency KW - Gene expression KW - genes KW - Genetics KW - genomic data KW - Genomics KW - linear ordered sequences KW - macromolecules KW - medical signal processing KW - Mice KW - Microarray KW - pattern identification KW - pattern recognition KW - pre-mRNA splicing KW - Query processing KW - sequences KW - Signal processing KW - splicing KW - TimeSearcher AB - Data sets involving linear ordered sequences are a recurring theme in bioinformatics. Dynamic query tools that support exploration of these data sets can be useful for identifying patterns of interest. This paper describes the use of one such tool, TimeSearcher, to interactively explore linear sequence data sets taken from two bioinformatics problems. Microarray time course data sets involve expression levels for large numbers of genes over multiple time points. TimeSearcher can be used to interactively search these data sets for genes with expression profiles of interest. 
The occurrence frequencies of short sequences of DNA in aligned exons can be used to identify sequences that play a role in pre-mRNA splicing. TimeSearcher can be used to search these data sets for candidate splicing signals. JA - 2003 International Conference on Multimedia and Expo, 2003. ICME '03. Proceedings PB - IEEE VL - 3 SN - 0-7803-7965-9 ER - TY - JOUR T1 - The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. JF - Science Y1 - 2002 A1 - Dehal, Paramvir A1 - Satou, Yutaka A1 - Campbell, Robert K A1 - Chapman, Jarrod A1 - Degnan, Bernard A1 - De Tomaso, Anthony A1 - Davidson, Brad A1 - Di Gregorio, Anna A1 - Gelpke, Maarten A1 - Goodstein, David M A1 - Harafuji, Naoe A1 - Hastings, Kenneth E M A1 - Ho, Isaac A1 - Hotta, Kohji A1 - Huang, Wayne A1 - Kawashima, Takeshi A1 - Lemaire, Patrick A1 - Martinez, Diego A1 - Meinertzhagen, Ian A A1 - Necula, Simona A1 - Nonaka, Masaru A1 - Putnam, Nik A1 - Rash, Sam A1 - Saiga, Hidetoshi A1 - Satake, Masanobu A1 - Terry, Astrid A1 - Yamada, Lixy A1 - Wang, Hong-Gang A1 - Awazu, Satoko A1 - Azumi, Kaoru A1 - Boore, Jeffrey A1 - Branno, Margherita A1 - Chin-Bow, Stephen A1 - DeSantis, Rosaria A1 - Doyle, Sharon A1 - Francino, Pilar A1 - Keys, David N A1 - Haga, Shinobu A1 - Hayashi, Hiroko A1 - Hino, Kyosuke A1 - Imai, Kaoru S A1 - Inaba, Kazuo A1 - Kano, Shungo A1 - Kobayashi, Kenji A1 - Kobayashi, Mari A1 - Lee, Byung-In A1 - Makabe, Kazuhiro W A1 - Manohar, Chitra A1 - Matassi, Giorgio A1 - Medina, Monica A1 - Mochizuki, Yasuaki A1 - Mount, Steve A1 - Morishita, Tomomi A1 - Miura, Sachiko A1 - Nakayama, Akie A1 - Nishizaka, Satoko A1 - Nomoto, Hisayo A1 - Ohta, Fumiko A1 - Oishi, Kazuko A1 - Rigoutsos, Isidore A1 - Sano, Masako A1 - Sasaki, Akane A1 - Sasakura, Yasunori A1 - Shoguchi, Eiichi A1 - Shin-i, Tadasu A1 - Spagnuolo, Antoinetta A1 - Stainier, Didier A1 - Suzuki, Miho M A1 - Tassy, Olivier A1 - Takatori, Naohito A1 - Tokuoka, Miki A1 - Yagi, Kasumi A1 - Yoshizaki, Fumiko 
A1 - Wada, Shuichi A1 - Zhang, Cindy A1 - Hyatt, P Douglas A1 - Larimer, Frank A1 - Detter, Chris A1 - Doggett, Norman A1 - Glavina, Tijana A1 - Hawkins, Trevor A1 - Richardson, Paul A1 - Lucas, Susan A1 - Kohara, Yuji A1 - Levine, Michael A1 - Satoh, Nori A1 - Rokhsar, Daniel S KW - Alleles KW - Animals KW - Apoptosis KW - Base Sequence KW - Cellulose KW - Central Nervous System KW - Ciona intestinalis KW - Computational Biology KW - Endocrine System KW - Gene Dosage KW - Gene Duplication KW - genes KW - Genes, Homeobox KW - Genome KW - Heart KW - Immunity KW - Molecular Sequence Data KW - Multigene Family KW - Muscle Proteins KW - Organizers, Embryonic KW - Phylogeny KW - Polymorphism, Genetic KW - Proteins KW - Sequence Analysis, DNA KW - Sequence Homology, Nucleic Acid KW - Species Specificity KW - Thyroid Gland KW - Urochordata KW - Vertebrates AB -

The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains approximately 16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.

VL - 298 CP - 5601 M3 - 10.1126/science.1080049 ER - TY - JOUR T1 - Genomic sequence, splicing, and gene annotation. JF - Am J Hum Genet Y1 - 2000 A1 - Mount, S M KW - Animals KW - Consensus Sequence KW - Exons KW - genes KW - Genome KW - Genomics KW - HUMANS KW - Nucleotides KW - Regulatory Sequences, Nucleic Acid KW - RNA Splice Sites KW - RNA Splicing KW - Untranslated Regions VL - 67 CP - 4 M3 - 10.1086/303098 ER - TY - JOUR T1 - Drosophila melanogaster genes for U1 snRNA variants and their expression during development. JF - Nucleic Acids Res Y1 - 1990 A1 - Lo, P C A1 - Mount, S M KW - Animals KW - Base Sequence KW - Blotting, Southern KW - Cloning, Molecular KW - Drosophila melanogaster KW - Gene Expression Regulation KW - genes KW - Genetic Variation KW - Molecular Sequence Data KW - Nucleic Acid Conformation KW - Pseudogenes KW - Restriction Mapping KW - RNA, Small Nuclear AB -

We have cloned and characterized a complete set of seven U1-related sequences from Drosophila melanogaster. These sequences are located at the three cytogenetic loci 21D, 82E, and 95C. Three of these sequences have been previously studied: one U1 gene at 21D which encodes the prototype U1 sequence (U1a), one U1 gene at 82E which encodes a U1 variant with a single nucleotide substitution (U1b), and a pseudogene at 82E. The four previously uncharacterized genes are another U1b gene at 82E, two additional U1a genes at 95C, and a U1 gene at 95C which encodes a new variant (U1c) with a distinct single nucleotide change relative to U1a. Three blocks of 5' flanking sequence similarity are common to all six full-length genes. Using specific primer extension assays, we have observed that the U1b RNA is expressed in Drosophila Kc cells and is associated with snRNP proteins, suggesting that the U1b-containing snRNP particles are able to participate in the process of pre-mRNA splicing. We have also examined the expression throughout Drosophila development of the two U1 variants relative to the prototype sequence. The U1c variant is undetectable by our methods, while the U1b variant exhibits a primarily embryonic pattern reminiscent of the expression of certain U1 variants in sea urchin, Xenopus, and mouse.

VL - 18 CP - 23 ER - TY - JOUR T1 - Sequence of a cDNA from the Drosophila melanogaster white gene. JF - Nucleic Acids Res Y1 - 1990 A1 - Pepling, M A1 - Mount, S M KW - Amino Acid Sequence KW - Animals KW - Base Sequence KW - DNA KW - Drosophila melanogaster KW - Eye Color KW - genes KW - Molecular Sequence Data VL - 18 CP - 6 ER - TY - JOUR T1 - Structure and expression of the Drosophila melanogaster gene for the U1 small nuclear ribonucleoprotein particle 70K protein. JF - Mol Cell Biol Y1 - 1990 A1 - Mancebo, R A1 - Lo, P C A1 - Mount, S M KW - Amino Acid Sequence KW - Animals KW - Base Sequence KW - Blotting, Northern KW - Blotting, Southern KW - Cloning, Molecular KW - DNA KW - Drosophila melanogaster KW - Gene expression KW - Gene Library KW - genes KW - HUMANS KW - Molecular Sequence Data KW - Molecular Weight KW - Oligonucleotide Probes KW - Poly A KW - Ribonucleoproteins KW - Ribonucleoproteins, Small Nuclear KW - RNA KW - RNA, Messenger KW - Sequence Homology, Nucleic Acid KW - Xenopus AB -

A genomic clone encoding the Drosophila U1 small nuclear ribonucleoprotein particle 70K protein was isolated by hybridization with a human U1 small nuclear ribonucleoprotein particle 70K protein cDNA. Southern blot and in situ hybridizations showed that this U1 70K gene is unique in the Drosophila genome, residing at cytological position 27D1,2. Polyadenylated transcripts of 1.9 and 3.1 kilobases were observed. While the 1.9-kilobase mRNA is always more abundant, the ratio of these two transcripts is developmentally regulated. Analysis of cDNA and genomic sequences indicated that these two RNAs encode an identical protein with a predicted molecular weight of 52,879. Comparison of the U1 70K proteins predicted from Drosophila, human, and Xenopus cDNAs revealed 68% amino acid identity in the most amino-terminal 214 amino acids, which include a sequence motif common to many proteins which bind RNA. The carboxy-terminal half is less well conserved but is highly charged and contains distinctive arginine-rich regions in all three species. These arginine-rich regions contain stretches of arginine-serine dipeptides like those found in transformer, transformer-2, and suppressor-of-white-apricot proteins, all of which have been identified as regulators of mRNA splicing in Drosophila melanogaster.

VL - 10 CP - 6 ER - TY - JOUR T1 - Sequence similarity. JF - Nature Y1 - 1987 A1 - Mount, S M KW - Adenosine Triphosphate KW - Amino Acid Sequence KW - Animals KW - Bacterial Proteins KW - Biological Transport, Active KW - Carrier Proteins KW - Drosophila melanogaster KW - genes KW - HUMANS KW - Pigments, Biological KW - Sequence Homology, Nucleic Acid VL - 325 CP - 6104 M3 - 10.1038/325487c0 ER - TY - JOUR T1 - Pseudogenes for human small nuclear RNA U3 appear to arise by integration of self-primed reverse transcripts of the RNA into new chromosomal sites. JF - Cell Y1 - 1983 A1 - Bernstein, L B A1 - Mount, S M A1 - Weiner, A M KW - Animals KW - Base Sequence KW - DNA KW - genes KW - HUMANS KW - Nucleic Acid Conformation KW - Rats KW - Recombination, Genetic KW - Repetitive Sequences, Nucleic Acid KW - RNA KW - RNA, Small Nuclear KW - RNA-Directed DNA Polymerase KW - Templates, Genetic KW - Transcription, Genetic AB -

We find that both human and rat U3 snRNA can function as self-priming templates for AMV reverse transcriptase in vitro. The 74 base cDNA is primed by the 3' end of intact U3 snRNA, and spans the characteristically truncated 69 or 70 base U3 sequence found in four different human U3 pseudogenes. The ability of human and rat U3 snRNA to self-prime is consistent with a U3 secondary structure model derived by a comparison between rat U3 snRNA and the homologous D2 snRNA from Dictyostelium discoideum. We propose that U3 pseudogenes are generated in vivo by integration of a self-primed cDNA copy of U3 snRNA at new chromosomal sites. We also consider the possibility that the same cDNA mediates gene conversion at the 5' end of bona fide U3 genes where, over the entire region spanned by the U3 cDNA, the two rat U3 sequence variants U3A and U3B are identical.

VL - 32 CP - 2 ER - TY - JOUR T1 - A catalogue of splice junction sequences. JF - Nucleic Acids Res Y1 - 1982 A1 - Mount, S M KW - Animals KW - Base Sequence KW - genes KW - Genes, Viral KW - HUMANS KW - Repetitive Sequences, Nucleic Acid KW - RNA Splicing KW - Species Specificity AB -

Splice junction sequences from a large number of nuclear and viral genes encoding protein have been collected. The sequence (C or A)AG/GT(A or G)AGT was found to be a consensus of 139 exon-intron boundaries (or donor sequences) and (T or C)nN(C or T)AG/G was found to be a consensus of 130 intron-exon boundaries (or acceptor sequences). The possible role of splice junction sequences as signals for processing is discussed.

VL - 10 CP - 2 ER -