TY - JOUR T1 - ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process. JF - BMC Bioinformatics Y1 - 2011 A1 - Basu, Malay K A1 - Selengut, Jeremy D A1 - Haft, Daniel H KW - algorithms KW - Archaea KW - Archaeal Proteins KW - DNA KW - Methane KW - Phylogeny KW - software AB -

BACKGROUND: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies.

RESULTS: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries.

CONCLUSION: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.

VL - 12 M3 - 10.1186/1471-2105-12-434 ER - TY - JOUR T1 - ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process JF - BMC Bioinformatics Y1 - 2011 A1 - Basu, Malay K. A1 - J. Selengut A1 - Haft, Daniel H. KW - algorithms KW - Archaea KW - Archaeal Proteins KW - DNA KW - Methane KW - Phylogeny KW - software AB - BACKGROUND: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies. RESULTS: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries. CONCLUSION: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction.
The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/. VL - 12 N1 - http://www.ncbi.nlm.nih.gov/pubmed/22070167?dopt=Abstract ER - TY - Generic T1 - Inexact Local Alignment Search over Suffix Arrays T2 - IEEE International Conference on Bioinformatics and Biomedicine, 2009. BIBM '09 Y1 - 2009 A1 - Ghodsi, M. A1 - M. Pop KW - bacteria KW - Bioinformatics KW - biology computing KW - Computational Biology KW - Costs KW - DNA KW - DNA homology searches KW - DNA sequences KW - Educational institutions KW - generalized heuristic KW - genes KW - Genetics KW - genome alignment KW - Genomics KW - human KW - inexact local alignment search KW - inexact seeds KW - local alignment KW - local alignment tools KW - memory efficient suffix array KW - microorganisms KW - molecular biophysics KW - mouse KW - Organisms KW - Sensitivity and Specificity KW - sequences KW - suffix array KW - USA Councils AB - We describe an algorithm for finding approximate seeds for DNA homology searches. In contrast to previous algorithms that use exact or spaced seeds, our approximate seeds may contain insertions and deletions. We present a generalized heuristic for finding such seeds efficiently and prove that the heuristic does not affect sensitivity. We show how to adapt this algorithm to work over the memory efficient suffix array with provably minimal overhead in running time. We demonstrate the effectiveness of our algorithm on two tasks: whole genome alignment of bacteria and alignment of the DNA sequences of 177 genes that are orthologous in human and mouse. We show our algorithm achieves better sensitivity and uses less memory than other commonly used local alignment tools. JA - IEEE International Conference on Bioinformatics and Biomedicine, 2009. 
BIBM '09 PB - IEEE SN - 978-0-7695-3885-3 ER - TY - JOUR T1 - Microbial oceanography in a sea of opportunity JF - Nature Y1 - 2009 A1 - Bowler, Chris A1 - Karl, David M. A1 - Rita R. Colwell KW - Astronomy KW - astrophysics KW - Biochemistry KW - Bioinformatics KW - Biology KW - biotechnology KW - cancer KW - cell cycle KW - cell signalling KW - climate change KW - Computational Biology KW - development KW - developmental biology KW - DNA KW - drug discovery KW - earth science KW - ecology KW - environmental science KW - Evolution KW - evolutionary biology KW - functional genomics KW - Genetics KW - Genomics KW - geophysics KW - immunology KW - interdisciplinary science KW - life KW - marine biology KW - materials science KW - medical research KW - medicine KW - metabolomics KW - molecular biology KW - molecular interactions KW - nanotechnology KW - Nature KW - neurobiology KW - neuroscience KW - palaeobiology KW - pharmacology KW - Physics KW - proteomics KW - quantum physics KW - RNA KW - Science KW - science news KW - science policy KW - signal transduction KW - structural biology KW - systems biology KW - transcriptomics AB - Plankton use solar energy to drive the nutrient cycles that make the planet habitable for larger organisms. We can now explore the diversity and functions of plankton using genomics, revealing the gene repertoires associated with survival in the oceans. Such studies will help us to appreciate the sensitivity of ocean systems and of the ocean's response to climate change, improving the predictive power of climate models. VL - 459 SN - 0028-0836 ER - TY - JOUR T1 - SplicePort--an interactive splice-site analysis tool.
JF - Nucleic Acids Res Y1 - 2007 A1 - Dogan, Rezarta Islamaj A1 - Getoor, Lise A1 - Wilbur, W John A1 - Mount, Stephen M KW - Base Sequence KW - Chromosome mapping KW - Computational Biology KW - Computer simulation KW - DNA KW - Genome KW - HUMANS KW - Internet KW - Models, Genetic KW - Molecular Sequence Data KW - Pattern Recognition, Automated KW - RNA Splice Sites KW - sequence alignment KW - Sequence Analysis, DNA KW - User-Computer Interface AB -

SplicePort is a web-based tool for splice-site analysis that allows the user to make splice-site predictions for submitted sequences. In addition, the user can also browse the rich catalog of features that underlies these predictions, and which we have found capable of providing high classification accuracy on human splice sites. Feature selection is optimized for human splice sites, but the selected features are likely to be predictive for other mammals as well. With our interactive feature browsing and visualization tool, the user can view and explore subsets of features used in splice-site prediction (either the features that account for the classification of a specific input sequence or the complete collection of features). Selected feature sets can be searched, ranked or displayed easily. The user can group features into clusters and frequency plot WebLogos can be generated for each cluster. The user can browse the identified clusters and their contributing elements, looking for new interesting signals, or can validate previously observed signals. The SplicePort web server can be accessed at http://www.cs.umd.edu/projects/SplicePort and http://www.spliceport.org.

VL - 35 CP - Web Server issue M3 - 10.1093/nar/gkm407 ER - TY - JOUR T1 - Localization of sequences required for size-specific splicing of a small Drosophila intron in vitro. JF - J Mol Biol Y1 - 1995 A1 - Guo, M A1 - Mount, S M KW - Animals KW - Base Sequence KW - Cell Line KW - DNA KW - Drosophila KW - Genes, Insect KW - HeLa Cells KW - HUMANS KW - Introns KW - Molecular Sequence Data KW - Myosin Heavy Chains KW - RNA Splicing KW - Species Specificity AB -

Many introns in Drosophila and other invertebrates are less than 80 nucleotides in length, too small to be recognized by the vertebrate splicing machinery. Comparison of nuclear splicing extracts from human HeLa and Drosophila Kc cells has revealed species-specificity, consistent with the observed size differences. Here we present additional results with the 68 nucleotide fifth intron of the Drosophila myosin heavy chain gene. As observed with the 74 nucleotide second intron of the Drosophila white gene, the wild-type myosin intron is accurately spliced in a homologous extract, and increasing the size by 16 nucleotides both eliminates splicing in the Drosophila extract and allows accurate splicing in the human extract. In contrast to previous results, however, an upstream cryptic 5' splice site is activated when the wild-type myosin intron is tested in a human HeLa cell nuclear extract, resulting in the removal of a 98 nucleotide intron. The size dependence of splicing in Drosophila extracts is also intron-specific; we noted that a naturally larger (150 nucleotide) intron from the ftz gene is efficiently spliced in Kc cell extracts that do not splice enlarged introns (of 84, 90, 150 or 350 nucleotides) derived from the 74 nucleotide white intron. Here, we have exploited that observation, using a series of hybrid introns to show that a region of 46 nucleotides at the 3' end of the white intron is sufficient to confer the species-specific size effect. At least two sequence elements within this region, yet distinct from previously described branchpoint and pyrimidine tract signals, are required for efficient splicing of small hybrid introns in vitro.

VL - 253 CP - 3 M3 - 10.1006/jmbi.1995.0564 ER - TY - JOUR T1 - P element-mediated in vivo deletion analysis of white-apricot: deletions between direct repeats are strongly favored. JF - Genetics Y1 - 1994 A1 - Kurkulos, M A1 - Weinberg, J M A1 - Roy, D A1 - Mount, S M KW - Alleles KW - Animals KW - Animals, Genetically Modified KW - Base Sequence KW - Crosses, Genetic KW - DNA KW - DNA Transposable Elements KW - Drosophila KW - Eye Color KW - Female KW - Genes, Insect KW - Male KW - Molecular Sequence Data KW - Nucleotidyltransferases KW - PHENOTYPE KW - Recombination, Genetic KW - Repetitive Sequences, Nucleic Acid KW - Sequence Deletion KW - Transformation, Genetic KW - Transposases AB -

We have isolated and characterized deletions arising within a P transposon, P[hswa], in the presence of P transposase. P[hswa] carries white-apricot (wa) sequences, including a complete copia element, under the control of an hsp70 promoter, and resembles the original wa allele in eye color phenotype. In the presence of P transposase, P[hswa] shows a high overall rate (approximately 3%) of germline mutations that result in increased eye pigmentation. Of 234 derivatives of P[hswa] with greatly increased eye pigmentation, at least 205 carried deletions within copia. Of these, 201 were precise deletions between the directly repeated 276-nucleotide copia long terminal repeats (LTRs), and four were unique deletions. High rates of transposase-induced precise deletion were observed within another P transposon carrying unrelated 599 nucleotide repeats (yeast 2μ FLP recombinase target sites) separated by 5.7 kb. Our observation that P element-mediated deletion formation occurs preferentially between direct repeats suggests general methods for controlling deletion formation.

VL - 136 CP - 3 ER - TY - JOUR T1 - Species-specific signals for the splicing of a short Drosophila intron in vitro. JF - Mol Cell Biol Y1 - 1993 A1 - Guo, M A1 - Lo, P C A1 - Mount, S M KW - Animals KW - Base Sequence KW - Cell Nucleus KW - Consensus Sequence KW - DNA KW - DNA Transposable Elements KW - Drosophila KW - Drosophila Proteins KW - Electrophoresis, Polyacrylamide Gel KW - HeLa Cells KW - HUMANS KW - Introns KW - Molecular Sequence Data KW - Mutation KW - Peptide Hydrolases KW - Proteins KW - Regulatory Sequences, Nucleic Acid KW - Retroelements KW - RNA Splicing KW - Species Specificity AB -

The effects of branchpoint sequence, the pyrimidine stretch, and intron size on the splicing efficiency of the Drosophila white gene second intron were examined in nuclear extracts from Drosophila and human cells. This 74-nucleotide intron is typical of many Drosophila introns in that it lacks a significant pyrimidine stretch and is below the minimum size required for splicing in human nuclear extracts. Alteration of sequences adjacent to the 3' splice site to create a pyrimidine stretch was necessary for splicing in human, but not Drosophila, extracts. Increasing the size of this intron with insertions between the 5' splice site and the branchpoint greatly reduced the efficiency of splicing of introns longer than 79 nucleotides in Drosophila extracts but had an opposite effect in human extracts, in which introns longer than 78 nucleotides were spliced with much greater efficiency. The white-apricot copia insertion is immediately adjacent to the branchpoint normally used in the splicing of this intron, and a copia long terminal repeat insertion prevents splicing in Drosophila, but not human, extracts. However, a consensus branchpoint does not restore the splicing of introns containing the copia long terminal repeat, and alteration of the wild-type branchpoint sequence alone does not eliminate splicing. These results demonstrate species specificity of splicing signals, particularly pyrimidine stretch and size requirements, and raise the possibility that variant mechanisms not found in mammals may operate in the splicing of small introns in Drosophila and possibly other species.

VL - 13 CP - 2 ER - TY - JOUR T1 - Sequence of a cDNA from the Drosophila melanogaster white gene. JF - Nucleic Acids Res Y1 - 1990 A1 - Pepling, M A1 - Mount, S M KW - Amino Acid Sequence KW - Animals KW - Base Sequence KW - DNA KW - Drosophila melanogaster KW - Eye Color KW - genes KW - Molecular Sequence Data VL - 18 CP - 6 ER - TY - JOUR T1 - Structure and expression of the Drosophila melanogaster gene for the U1 small nuclear ribonucleoprotein particle 70K protein. JF - Mol Cell Biol Y1 - 1990 A1 - Mancebo, R A1 - Lo, P C A1 - Mount, S M KW - Amino Acid Sequence KW - Animals KW - Base Sequence KW - Blotting, Northern KW - Blotting, Southern KW - Cloning, Molecular KW - DNA KW - Drosophila melanogaster KW - Gene expression KW - Gene Library KW - genes KW - HUMANS KW - Molecular Sequence Data KW - Molecular Weight KW - Oligonucleotide Probes KW - Poly A KW - Ribonucleoproteins KW - Ribonucleoproteins, Small Nuclear KW - RNA KW - RNA, Messenger KW - Sequence Homology, Nucleic Acid KW - Xenopus AB -

A genomic clone encoding the Drosophila U1 small nuclear ribonucleoprotein particle 70K protein was isolated by hybridization with a human U1 small nuclear ribonucleoprotein particle 70K protein cDNA. Southern blot and in situ hybridizations showed that this U1 70K gene is unique in the Drosophila genome, residing at cytological position 27D1,2. Polyadenylated transcripts of 1.9 and 3.1 kilobases were observed. While the 1.9-kilobase mRNA is always more abundant, the ratio of these two transcripts is developmentally regulated. Analysis of cDNA and genomic sequences indicated that these two RNAs encode an identical protein with a predicted molecular weight of 52,879. Comparison of the U1 70K proteins predicted from Drosophila, human, and Xenopus cDNAs revealed 68% amino acid identity in the most amino-terminal 214 amino acids, which include a sequence motif common to many proteins which bind RNA. The carboxy-terminal half is less well conserved but is highly charged and contains distinctive arginine-rich regions in all three species. These arginine-rich regions contain stretches of arginine-serine dipeptides like those found in transformer, transformer-2, and suppressor-of-white-apricot proteins, all of which have been identified as regulators of mRNA splicing in Drosophila melanogaster.

VL - 10 CP - 6 ER - TY - JOUR T1 - Pseudogenes for human small nuclear RNA U3 appear to arise by integration of self-primed reverse transcripts of the RNA into new chromosomal sites. JF - Cell Y1 - 1983 A1 - Bernstein, L B A1 - Mount, S M A1 - Weiner, A M KW - Animals KW - Base Sequence KW - DNA KW - genes KW - HUMANS KW - Nucleic Acid Conformation KW - Rats KW - Recombination, Genetic KW - Repetitive Sequences, Nucleic Acid KW - RNA KW - RNA, Small Nuclear KW - RNA-Directed DNA Polymerase KW - Templates, Genetic KW - Transcription, Genetic AB -

We find that both human and rat U3 snRNA can function as self-priming templates for AMV reverse transcriptase in vitro. The 74 base cDNA is primed by the 3' end of intact U3 snRNA, and spans the characteristically truncated 69 or 70 base U3 sequence found in four different human U3 pseudogenes. The ability of human and rat U3 snRNA to self-prime is consistent with a U3 secondary structure model derived by a comparison between rat U3 snRNA and the homologous D2 snRNA from Dictyostelium discoideum. We propose that U3 pseudogenes are generated in vivo by integration of a self-primed cDNA copy of U3 snRNA at new chromosomal sites. We also consider the possibility that the same cDNA mediates gene conversion at the 5' end of bona fide U3 genes where, over the entire region spanned by the U3 cDNA, the two rat U3 sequence variants U3A and U3B are identical.

VL - 32 CP - 2 ER -