TY - JOUR
T1 - Protein quantification across hundreds of experimental conditions.
JF - Proc Natl Acad Sci U S A
Y1 - 2009
A1 - Khan, Zia
A1 - Bloom, Joshua S
A1 - Garcia, Benjamin A
A1 - Singh, Mona
A1 - Kruglyak, Leonid
KW - Algorithms
KW - Animals
KW - Automatic Data Processing
KW - Chromatography, Liquid
KW - Databases, Factual
KW - Fungal Proteins
KW - Humans
KW - Isotopes
KW - Mice
KW - Proteins
KW - Proteomics
KW - Tandem Mass Spectrometry
AB -

Quantitative studies of protein abundance rarely span more than a small number of experimental conditions and replicates. In contrast, quantitative studies of transcript abundance often span hundreds of experimental conditions and replicates. This situation exists, in part, because extracting quantitative data from large proteomics datasets is significantly more difficult than reading quantitative data from a gene expression microarray. To address this problem, we introduce two algorithmic advances in the processing of quantitative proteomics data. First, we use space-partitioning data structures to handle the large size of these datasets. Second, we introduce techniques that combine graph-theoretic algorithms with space-partitioning data structures to collect relative protein abundance data across hundreds of experimental conditions and replicates. We validate these algorithmic techniques by analyzing several datasets and computing both internal and external measures of quantification accuracy. We demonstrate the scalability of these techniques by applying them to a large dataset that comprises a total of 472 experimental conditions and replicates.

VL - 106
CP - 37
M3 - 10.1073/pnas.0904100106
ER -

TY - JOUR
T1 - The minimum information about a genome sequence (MIGS) specification
JF - Nature biotechnology
Y1 - 2008
A1 - Field, Dawn
A1 - Garrity, George
A1 - Gray, Tanya
A1 - Morrison, Norman
A1 - Selengut, J.
A1 - Sterk, Peter
A1 - Tatusova, Tatiana
A1 - Thomson, Nicholas
A1 - Allen, Michael J.
A1 - Angiuoli, Samuel V.
A1 - Ashburner, Michael
A1 - Axelrod, Nelson
A1 - Baldauf, Sandra
A1 - Ballard, Stuart
A1 - Boore, Jeffrey
A1 - Cochrane, Guy
A1 - Cole, James
A1 - Dawyndt, Peter
A1 - De Vos, Paul
A1 - DePamphilis, Claude
A1 - Edwards, Robert
A1 - Faruque, Nadeem
A1 - Feldman, Robert
A1 - Gilbert, Jack
A1 - Gilna, Paul
A1 - Glöckner, Frank Oliver
A1 - Goldstein, Philip
A1 - Guralnick, Robert
A1 - Haft, Dan
A1 - Hancock, David
A1 - Hermjakob, Henning
A1 - Hertz-Fowler, Christiane
A1 - Hugenholtz, Phil
A1 - Joint, Ian
A1 - Kagan, Leonid
A1 - Kane, Matthew
A1 - Kennedy, Jessie
A1 - Kowalchuk, George
A1 - Kottmann, Renzo
A1 - Kolker, Eugene
A1 - Kravitz, Saul
A1 - Kyrpides, Nikos
A1 - Leebens-Mack, Jim
A1 - Lewis, Suzanna E.
A1 - Li, Kelvin
A1 - Lister, Allyson L.
A1 - Lord, Phillip
A1 - Maltsev, Natalia
A1 - Markowitz, Victor
A1 - Martiny, Jennifer
A1 - Methe, Barbara
A1 - Mizrachi, Ilene
A1 - Moxon, Richard
A1 - Nelson, Karen
A1 - Parkhill, Julian
A1 - Proctor, Lita
A1 - White, Owen
A1 - Sansone, Susanna-Assunta
A1 - Spiers, Andrew
A1 - Stevens, Robert
A1 - Swift, Paul
A1 - Taylor, Chris
A1 - Tateno, Yoshio
A1 - Tett, Adrian
A1 - Turner, Sarah
A1 - Ussery, David
A1 - Vaughan, Bob
A1 - Ward, Naomi
A1 - Whetzel, Trish
A1 - San Gil, Inigo
A1 - Wilson, Gareth
A1 - Wipat, Anil
KW - Chromosome Mapping
KW - Databases, Factual
KW - Information Dissemination
KW - Information Storage and Retrieval
KW - Information Theory
KW - Internationality
AB - With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format.
Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.
VL - 26
N1 - http://www.ncbi.nlm.nih.gov/pubmed/18464787?dopt=Abstract
ER -

TY - JOUR
T1 - Splicing signals in Drosophila: intron size, information content, and consensus sequences.
JF - Nucleic Acids Res
Y1 - 1992
A1 - Mount, S M
A1 - Burks, C
A1 - Hertz, G
A1 - Stormo, G D
A1 - White, O
A1 - Fields, C
KW - Animals
KW - Base Sequence
KW - Consensus Sequence
KW - Databases, Factual
KW - Drosophila
KW - Introns
KW - Molecular Sequence Data
KW - RNA Splicing
KW - RNA, Messenger
KW - Software
AB -

A database of 209 Drosophila introns was extracted from Genbank (release number 64.0) and examined by a number of methods in order to characterize features that might serve as signals for messenger RNA splicing. A tight distribution of sizes was observed: while the smallest introns in the database are 51 nucleotides, more than half are less than 80 nucleotides in length, and most of these have lengths in the range of 59-67 nucleotides. Drosophila splice sites found in large and small introns differ in only minor ways from each other and from those found in vertebrate introns. However, larger introns have greater pyrimidine-richness in the region between 11 and 21 nucleotides upstream of 3' splice sites. The Drosophila branchpoint consensus matrix resembles C T A A T (in which branch formation occurs at the underlined A), and differs from the corresponding mammalian signal in the absence of G at the position immediately preceding the branchpoint. The distribution of occurrences of this sequence suggests a minimum distance between 5' splice sites and branchpoints of about 38 nucleotides, and a minimum distance between 3' splice sites and branchpoints of 15 nucleotides. The methods we have used detect no information in exon sequences other than in the few nucleotides immediately adjacent to the splice sites. However, Drosophila resembles many other species in that there is a discontinuity in A + T content between exons and introns, which are A + T rich.

VL - 20
CP - 16
ER -