TY - Generic T1 - Epiviz: interactive visual analytics for functional genomics data. Y1 - 2014 A1 - Chelaru, Florin A1 - Smith, Llewellyn A1 - Goldstein, Naomi A1 - Bravo, Héctor Corrada KW - algorithms KW - Chromosome mapping KW - Data Mining KW - database management systems KW - Databases, Genetic KW - Genomics KW - Internet KW - software KW - User-Computer Interface AB -

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.

JA - Nat Methods VL - 11 CP - 9 M3 - 10.1038/nmeth.3038 ER - TY - JOUR T1 - TIGRFAMs and Genome Properties in 2013 JF - Nucleic Acids Research Y1 - 2013 A1 - Haft, Daniel H. A1 - Selengut, J. A1 - Richter, Roland A. A1 - Harkins, Derek A1 - Basu, Malay K. A1 - Beck, Erin KW - Databases, Protein KW - Genome, Archaeal KW - Genome, Bacterial KW - Genomics KW - Internet KW - Markov chains KW - Molecular Sequence Annotation KW - Proteins KW - sequence alignment AB - TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystems reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
VL - 41 N1 - http://www.ncbi.nlm.nih.gov/pubmed/23197656?dopt=Abstract ER - TY - JOUR T1 - TIGRFAMs and Genome Properties in 2013. JF - Nucleic Acids Res Y1 - 2013 A1 - Haft, Daniel H A1 - Selengut, Jeremy D A1 - Richter, Roland A A1 - Harkins, Derek A1 - Basu, Malay K A1 - Beck, Erin KW - Databases, Protein KW - Genome, Archaeal KW - Genome, Bacterial KW - Genomics KW - Internet KW - Markov chains KW - Molecular Sequence Annotation KW - Proteins KW - sequence alignment AB -

TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystems reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.

VL - 41 CP - Database issue M3 - 10.1093/nar/gks1234 ER - TY - Generic T1 - Computing the Tree of Life: Leveraging the Power of Desktop and Service Grids T2 - Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on Y1 - 2011 A1 - Adam L. Bazinet A1 - Michael P. Cummings KW - (artificial KW - (mathematics) KW - analysis KW - BOINC KW - COMPUTATION KW - computational KW - computing KW - data KW - Estimation KW - evolutionary KW - GARLI KW - genetic KW - Grid KW - GRIDS KW - handling KW - heterogeneous KW - History KW - HPC KW - information KW - intelligence) KW - interface KW - interfaces KW - Internet KW - jobs KW - lattice KW - learning KW - life KW - likelihood KW - load KW - machine KW - maximum KW - method KW - model KW - molecular KW - phylogenetic KW - portal KW - Portals KW - power KW - project KW - resource KW - Science KW - sequence KW - service KW - services KW - sets KW - software KW - substantial KW - system KW - systematics KW - tree KW - TREES KW - user KW - Web AB - The trend in life sciences research, particularly in molecular evolutionary systematics, is toward larger data sets and ever-more detailed evolutionary models, which can generate substantial computational loads. Over the past several years we have developed a grid computing system aimed at providing researchers the computational power needed to complete such analyses in a timely manner. Our grid system, known as The Lattice Project, was the first to combine two models of grid computing - the service model, which mainly federates large institutional HPC resources, and the desktop model, which harnesses the power of PCs volunteered by the general public. 
Recently we have developed a "science portal" style web interface that makes it easier than ever for phylogenetic analyses to be completed using GARLI, a popular program that uses a maximum likelihood method to infer the evolutionary history of organisms on the basis of genetic sequence data. This paper describes our approach to scheduling thousands of GARLI jobs with diverse requirements to heterogeneous grid resources, which include volunteer computers running BOINC software. A key component of this system provides a priori GARLI runtime estimates using machine learning with random forests. JA - Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on ER - TY - JOUR T1 - New developments in the InterPro database. JF - Nucleic Acids Res Y1 - 2007 A1 - Mulder, Nicola J A1 - Apweiler, Rolf A1 - Attwood, Teresa K A1 - Bairoch, Amos A1 - Bateman, Alex A1 - Binns, David A1 - Bork, Peer A1 - Buillard, Virginie A1 - Cerutti, Lorenzo A1 - Copley, Richard A1 - Courcelle, Emmanuel A1 - Das, Ujjwal A1 - Daugherty, Louise A1 - Dibley, Mark A1 - Finn, Robert A1 - Fleischmann, Wolfgang A1 - Gough, Julian A1 - Haft, Daniel A1 - Hulo, Nicolas A1 - Hunter, Sarah A1 - Kahn, Daniel A1 - Kanapin, Alexander A1 - Kejariwal, Anish A1 - Labarga, Alberto A1 - Langendijk-Genevaux, Petra S A1 - Lonsdale, David A1 - Lopez, Rodrigo A1 - Letunic, Ivica A1 - Madera, Martin A1 - Maslen, John A1 - McAnulla, Craig A1 - McDowall, Jennifer A1 - Mistry, Jaina A1 - Mitchell, Alex A1 - Nikolskaya, Anastasia N A1 - Orchard, Sandra A1 - Orengo, Christine A1 - Petryszak, Robert A1 - Selengut, Jeremy D A1 - Sigrist, Christian J A A1 - Thomas, Paul D A1 - Valentin, Franck A1 - Wilson, Derek A1 - Wu, Cathy H A1 - Yeats, Corin KW - Databases, Protein KW - Internet KW - Protein Structure, Tertiary KW - Proteins KW - Sequence Analysis, Protein KW - Systems Integration KW - User-Computer Interface AB -

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.

VL - 35 CP - Database issue M3 - 10.1093/nar/gkl841 ER - TY - JOUR T1 - SplicePort--an interactive splice-site analysis tool. JF - Nucleic Acids Res Y1 - 2007 A1 - Dogan, Rezarta Islamaj A1 - Getoor, Lise A1 - Wilbur, W John A1 - Mount, Stephen M KW - Base Sequence KW - Chromosome mapping KW - Computational Biology KW - Computer simulation KW - DNA KW - Genome KW - HUMANS KW - Internet KW - Models, Genetic KW - Molecular Sequence Data KW - Pattern Recognition, Automated KW - RNA Splice Sites KW - sequence alignment KW - Sequence Analysis, DNA KW - User-Computer Interface AB -

SplicePort is a web-based tool for splice-site analysis that allows the user to make splice-site predictions for submitted sequences. In addition, the user can also browse the rich catalog of features that underlies these predictions, and which we have found capable of providing high classification accuracy on human splice sites. Feature selection is optimized for human splice sites, but the selected features are likely to be predictive for other mammals as well. With our interactive feature browsing and visualization tool, the user can view and explore subsets of features used in splice-site prediction (either the features that account for the classification of a specific input sequence or the complete collection of features). Selected feature sets can be searched, ranked or displayed easily. The user can group features into clusters and frequency plot WebLogos can be generated for each cluster. The user can browse the identified clusters and their contributing elements, looking for new interesting signals, or can validate previously observed signals. The SplicePort web server can be accessed at http://www.cs.umd.edu/projects/SplicePort and http://www.spliceport.org.

VL - 35 CP - Web Server issue M3 - 10.1093/nar/gkm407 ER - TY - JOUR T1 - TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes JF - Nucleic Acids Research Y1 - 2007 A1 - Selengut, J. A1 - Haft, Daniel H. A1 - Davidsen, Tanja A1 - Ganapathy, Anurhada A1 - Gwinn-Giglio, Michelle A1 - Nelson, William C. A1 - Richter, R. Alexander A1 - White, Owen KW - Archaeal Proteins KW - Bacterial Proteins KW - Databases, Protein KW - Genome, Bacterial KW - Genomics KW - Internet KW - Phylogeny KW - software KW - User-Computer Interface AB - TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe 'equivalog' families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered.
The TIGRFAMs and Genome Properties systems can be accessed at http://www.tigr.org/TIGRFAMs and http://www.tigr.org/Genome_Properties. VL - 35 N1 - http://www.ncbi.nlm.nih.gov/pubmed/17151080?dopt=Abstract ER -