TY - JOUR T1 - Capturing the most wanted taxa through cross-sample correlations JF - The ISME Journal Y1 - 2016 A1 - Almeida, Mathieu A1 - Pop, Mihai A1 - Le Chatelier, Emmanuelle A1 - Prifti, Edi A1 - Pons, Nicolas A1 - Ghozlane, Amine A1 - Ehrlich, S Dusko UR - http://www.nature.com/doifinder/10.1038/ismej.2016.35 J1 - ISME J M3 - 10.1038/ismej.2016.35 ER - TY - CONF T1 - Chromatin and genomic determinants of alternative splicing T2 - BCB '15 Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics Y1 - 2015 A1 - Kun Wang A1 - Kan Cao A1 - Sridhar Hannenhalli JA - BCB '15 Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics PB - ACM ER - TY - CONF T1 - Computational challenges in microbiome research T2 - 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Y1 - 2015 A1 - Pop, Mihai JA - 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) PB - IEEE CY - Washington, DC, USA UR - http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7359645 M3 - 10.1109/BIBM.2015.7359645 ER - TY - JOUR T1 - Complete genome sequence of the quality control strain Staphylococcus aureus subsp. aureus ATCC 25923 JF - Genome announcements Y1 - 2014 A1 - Treangen, Todd J A1 - Maybank, Rosslyn A A1 - Enke, Sana A1 - Friss, Mary Beth A1 - Diviak, Lynn F A1 - Karaolis, David KR A1 - Koren, Sergey A1 - Ondov, Brian A1 - Phillippy, Adam M A1 - Bergman, Nicholas H VL - 2 ER - TY - JOUR T1 - Computational methods for optical mapping JF - GigaScience Y1 - 2014 A1 - Mendelowitz, Lee A1 - Pop, Mihai AB - Optical mapping and newer genome mapping technologies based on nicking enzymes provide low resolution but long-range genomic information. 
The optical mapping technique has been successfully used for assessing the quality of genome assemblies and for detecting large-scale structural variants and rearrangements that cannot be detected using current paired end sequencing protocols. Here, we review several algorithms and methods for building consensus optical maps and aligning restriction patterns to a reference map, as well as methods for using optical maps with sequence assemblies. VL - 3 SN - 2047-217X U2 - 4323141 ER - TY - JOUR T1 - A computational study of the Warburg effect identifies metabolic targets inhibiting cancer migration. JF - Mol Syst Biol Y1 - 2014 A1 - Yizhak, Keren A1 - Le Dévédec, Sylvia E A1 - Rogkoti, Vasiliki Maria A1 - Baenke, Franziska A1 - de Boer, Vincent C A1 - Frezza, Christian A1 - Schulze, Almut A1 - van de Water, Bob A1 - Ruppin, Eytan AB -

Over the last decade, the field of cancer metabolism has mainly focused on studying the role of tumorigenic metabolic rewiring in supporting cancer proliferation. Here, we perform the first genome‐scale computational study of the metabolic underpinnings of cancer migration. We build genome‐scale metabolic models of the NCI‐60 cell lines that capture the Warburg effect (aerobic glycolysis) typically occurring in cancer cells. The extent of the Warburg effect in each of these cell line models is quantified by the ratio of glycolytic to oxidative ATP flux (AFR), which is found to be highly positively associated with cancer cell migration. We hence predicted that targeting genes that mitigate the Warburg effect by reducing the AFR may specifically inhibit cancer migration. By testing the anti‐migratory effects of silencing such 17 top predicted genes in four breast and lung cancer cell lines, we find that up to 13 of these novel predictions significantly attenuate cell migration either in all or one cell line only, while having almost no effect on cell proliferation. Furthermore, in accordance with the predictions, a significant reduction is observed in the ratio between experimentally measured ECAR and OCR levels following these perturbations. Inhibiting anti‐migratory targets is a promising future avenue in treating cancer since it may decrease cytotoxic‐related side effects that plague current anti‐proliferative treatments. Furthermore, it may reduce cytotoxic‐related clonal selection of more aggressive cancer cells and the likelihood of emerging resistance.

VL - 10 M3 - 10.15252/msb.20145746 ER - TY - Generic T1 - Conservation in first introns is positively associated with the number of exons within genes and the presence of regulatory epigenetic signals Y1 - 2014 A1 - Park, Seung A1 - Hannenhalli, Sridhar A1 - Choi, Sun JA - BMC Genomics VL - 15 UR - http://www.biomedcentral.com/1471-2164/15/526 CP - 1 J1 - BMC Genomics M3 - 10.1186/1471-2164-15-526 ER - TY - JOUR T1 - Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products JF - BMC Genomics Y1 - 2014 A1 - Almeida, Mathieu A1 - Hebert, Agnes A1 - Abraham, Anne-Laure A1 - Rasmussen, Simon A1 - Monnet, Christophe A1 - Pons, Nicolas A1 - Delbes, Celine A1 - Loux, Valentin A1 - Batto, Jean-Michel A1 - Leonard, Pierre A1 - Kennedy, Sean A1 - Ehrlich, Stanislas A1 - Pop, Mihai A1 - Montel, Marie-Christine A1 - Irlinger, Francoise A1 - Renault, Pierre AB - BACKGROUND: Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned. RESULTS: We built a reference genome catalog suitable for short read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products belonging to 137 different species and 67 genera, and succeeded to reconstruct the draft genome of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, still missing from public database. 
To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study. CONCLUSIONS: Our study provides data, which combined with publicly available genome references, represents the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms not deliberately inoculated, mainly Gram-negative genera such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods. VL - 15 SN - 1471-2164 ER - TY - JOUR T1 - CTCF binding site sequence differences are associated with unique regulatory and functional trends during embryonic stem cell differentiation JF - Nucleic Acids Res Y1 - 2014 A1 - Plasschaert, R. N. A1 - Vigneau, S. A1 - Tempera, I. A1 - Gupta, R. A1 - Maksimoska, J. A1 - Everett, L. A1 - Davuluri, R. A1 - Mamorstein, R. A1 - Lieberman, P. M. A1 - Schultz, D. A1 - Sridhar Hannenhalli A1 - Bartolomei, M. S. KW - *Gene Expression Regulation KW - *Regulatory Elements, Transcriptional KW - Animals KW - Binding Sites KW - Cell Differentiation/*genetics KW - Cells, Cultured KW - Embryonic Stem Cells/cytology/*metabolism KW - Mice KW - Nucleotide Motifs KW - Protein Binding KW - Repressor Proteins/*metabolism AB - CTCF (CCCTC-binding factor) is a highly conserved multifunctional DNA-binding protein with thousands of binding sites genome-wide. Our previous work suggested that differences in CTCF's binding site sequence may affect the regulation of CTCF recruitment and its function. 
To investigate this possibility, we characterized changes in genome-wide CTCF binding and gene expression during differentiation of mouse embryonic stem cells. After separating CTCF sites into three classes (LowOc, MedOc and HighOc) based on similarity to the consensus motif, we found that developmentally regulated CTCF binding occurs preferentially at LowOc sites, which have lower similarity to the consensus. By measuring the affinity of CTCF for selected sites, we show that sites lost during differentiation are enriched in motifs associated with weaker CTCF binding in vitro. Specifically, enrichment for T at the 18th position of the CTCF binding site is associated with regulated binding in the LowOc class and can predictably reduce CTCF affinity for binding sites. Finally, by comparing changes in CTCF binding with changes in gene expression during differentiation, we show that LowOc and HighOc sites are associated with distinct regulatory functions. Our results suggest that the regulatory control of CTCF is dependent in part on specific motifs within its binding site. VL - 42 SN - 1362-4962 (Electronic)
0305-1048 (Linking) N1 - Plasschaert, Robert N
Vigneau, Sebastien
Tempera, Italo
Gupta, Ravi
Maksimoska, Jasna
Everett, Logan
Davuluri, Ramana
Mamorstein, Ronen
Lieberman, Paul M
Schultz, David
Hannenhalli, Sridhar
Bartolomei, Marisa S
eng
K99AI099153/AI/NIAID NIH HHS/
P30 CA10815/CA/NCI NIH HHS/
R01 CA140652/CA/NCI NIH HHS/
R01-GM052880/GM/NIGMS NIH HHS/
R01CA140652/CA/NCI NIH HHS/
R01GM085226/GM/NIGMS NIH HHS/
R01HD042026/HD/NICHD NIH HHS/
T32GM008216/GM/NIGMS NIH HHS/
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
England
2013/10/15 06:00
Nucleic Acids Res. 2014 Jan;42(2):774-89. doi: 10.1093/nar/gkt910. Epub 2013 Oct 10. U2 - 3902912 J1 - Nucleic acids research ER - TY - JOUR T1 - Can RNA-Seq resolve the rapid radiation of advanced moths and butterflies (Hexapoda: Lepidoptera: Apoditrysia)? An exploratory study JF - PLoS One Y1 - 2013 A1 - Adam L. Bazinet A1 - Michael P. Cummings A1 - Mitter, Kim T. A1 - Mitter, Charles W. AB -

Recent molecular phylogenetic studies of the insect order Lepidoptera have robustly resolved family-level divergences within most superfamilies, and most divergences among the relatively species-poor early-arising superfamilies. In sharp contrast, relationships among the superfamilies of more advanced moths and butterflies that comprise the mega-diverse clade Apoditrysia (ca. 145,000 spp.) remain mostly poorly supported. This uncertainty, in turn, limits our ability to discern the origins, ages and evolutionary consequences of traits hypothesized to promote the spectacular diversification of Apoditrysia. Low support along the apoditrysian "backbone" probably reflects rapid diversification. If so, it may be feasible to strengthen resolution by radically increasing the gene sample, but case studies have been few. We explored the potential of next-generation sequencing to conclusively resolve apoditrysian relationships. We used transcriptome RNA-Seq to generate 1579 putatively orthologous gene sequences across a broad sample of 40 apoditrysians plus four outgroups, to which we added two taxa from previously published data. Phylogenetic analysis of a 46-taxon, 741-gene matrix, resulting from a strict filter that eliminated ortholog groups containing any apparent paralogs, yielded dramatic overall increase in bootstrap support for deeper nodes within Apoditrysia as compared to results from previous and concurrent 19-gene analyses. High support was restricted mainly to the huge subclade Obtectomera broadly defined, in which 11 of 12 nodes subtending multiple superfamilies had bootstrap support of 100%. The strongly supported nodes showed little conflict with groupings from previous studies, and were little affected by changes in taxon sampling, suggesting that they reflect true signal rather than artifacts of massive gene sampling. In contrast, strong support was seen at only 2 of 11 deeper nodes among the "lower", non-obtectomeran apoditrysians. 
These represent a much harder phylogenetic problem, for which one path to resolution might include further increase in gene sampling, together with improved orthology assignments.

VL - 8 ER - TY - JOUR T1 - Contribution of nucleosome binding preferences and co-occurring DNA sequences to transcription factor binding JF - BMC Genomics Y1 - 2013 A1 - He, Ximiao A1 - Chatterjee, Raghunath A1 - John, Sam A1 - Bravo, Hector A1 - Sathyanarayana, B K A1 - Biddie, Simon C A1 - FitzGerald, Peter C A1 - Stamatoyannopoulos, John A A1 - Hager, Gordon L A1 - Vinson, Charles VL - 14 UR - http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-428 CP - 1 J1 - BMC Genomics M3 - 10.1186/1471-2164-14-428 ER - TY - Generic T1 - Correlated evolution of positions within mammalian cis elements Y1 - 2013 A1 - R. Mukherjee A1 - L. N. S. Singh A1 - Evans, P. A1 - Sridhar Hannenhalli JA - PLoS One VL - 8 ER - TY - JOUR T1 - A comparative evaluation of sequence classification programs JF - BMC Bioinformatics Y1 - 2012 A1 - Adam L. Bazinet A1 - Michael P. Cummings AB - Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. 
Conclusions: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs. VL - 13 ER - TY - JOUR T1 - Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)? JF - Systematic Biology Y1 - 2011 A1 - Cho, Soowon A1 - Zwick, Andreas A1 - Regier, Jerome C. A1 - Mitter, Charles A1 - Michael P. Cummings A1 - Yao, Jianxiu A1 - Du, Zaile A1 - Zhao, Hong A1 - Kawahara, Akito Y. A1 - Weller, Susan A1 - Davis, Donald R. A1 - Baixeras, Joaquin A1 - Brown, John W. A1 - Parr, Cynthia KW - Ditrysia KW - gene sampling KW - Hexapoda KW - Lepidoptera KW - missing data KW - molecular phylogenetics KW - nuclear genes KW - taxon sampling AB - This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample (to 26 genes, 18.4 kb) but only in a third (41) of the taxa. 
The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes, while partitioning sites into sets undergoing mostly nonsynonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78–85% bootstrap) was weak or nonexistent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study. A “more-genes-only” data set (41 taxa×26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses. VL - 60 SN - 1063-5157, 1076-836X ER - TY - JOUR T1 - Clonal transmission, dual peak, and off-season cholera in Bangladesh JF - Infection Ecology & Epidemiology Y1 - 2011 A1 - Alam, M. A1 - Islam, A. A1 - Bhuiyan, N. A. A1 - Rahim, N. A1 - Hossain, A. A1 - Khan, G. Y. A1 - Ahmed, D. A1 - Watanabe, H. A1 - Izumiya, H. A1 - Faruque, A. S. G. A1 - Rita R. 
Colwell VL - 1 ER - TY - JOUR T1 - Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths JF - Genome biology Y1 - 2011 A1 - Enk, Jacob A1 - Devault, Alison A1 - Debruyne, Regis A1 - King, Christine E A1 - Todd Treangen A1 - O'Rourke, Dennis A1 - Salzberg, Steven L A1 - Fisher, Daniel A1 - MacPhee, Ross A1 - Poinar, Hendrik VL - 12 ER - TY - JOUR T1 - A computational statistics approach for estimating the spatial range of morphogen gradients. JF - Development Y1 - 2011 A1 - Kanodia, Jitendra S A1 - Kim, Yoosik A1 - Tomer, Raju A1 - Khan, Zia A1 - Chung, Kwanghun A1 - Storey, John D A1 - Lu, Hang A1 - Keller, Philipp J A1 - Shvartsman, Stanislav Y KW - Animals KW - Biostatistics KW - Cleavage Stage, Ovum KW - Computational Biology KW - Computer simulation KW - Drosophila KW - Drosophila Proteins KW - Embryo, Nonmammalian KW - Gene Expression Regulation, Developmental KW - Genes, Developmental KW - Imaging, Three-Dimensional KW - In Situ Hybridization, Fluorescence KW - Morphogenesis KW - Osmolar Concentration KW - Tissue Distribution AB -

A crucial issue in studies of morphogen gradients relates to their range: the distance over which they can act as direct regulators of cell signaling, gene expression and cell differentiation. To address this, we present a straightforward statistical framework that can be used in multiple developmental systems. We illustrate the developed approach by providing a point estimate and confidence interval for the spatial range of the graded distribution of nuclear Dorsal, a transcription factor that controls the dorsoventral pattern of the Drosophila embryo.

VL - 138 CP - 22 M3 - 10.1242/dev.071571 ER - TY - Generic T1 - A computational statistics approach for estimating the spatial range of morphogen gradients Y1 - 2011 A1 - Kanodia, J. S. A1 - Kim, Y. A1 - Tomer, R. A1 - Khan, Z. A1 - Chung, K. A1 - Storey, J. D. A1 - Lu, H. A1 - Keller, P. J. A1 - Shvartsman, S. Y. JA - Development VL - 138 UR - http://dev.biologists.org/cgi/doi/10.1242/dev.071571 CP - 22 J1 - Development M3 - 10.1242/dev.071571 ER - TY - Generic T1 - Computing the Tree of Life: Leveraging the Power of Desktop and Service Grids T2 - Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on Y1 - 2011 A1 - Adam L. Bazinet A1 - Michael P. Cummings KW - (artificial KW - (mathematics) KW - analysis KW - BOINC KW - COMPUTATION KW - computational KW - computing KW - data KW - Estimation KW - evolutionary KW - GARLI KW - genetic KW - Grid KW - GRIDS KW - handling KW - heterogeneous KW - History KW - HPC KW - information KW - intelligence) KW - interface KW - interfaces KW - Internet KW - jobs KW - lattice KW - learning KW - life KW - likelihood KW - load KW - machine KW - maximum KW - method KW - model KW - molecular KW - phylogenetic KW - portal KW - Portals KW - power KW - project KW - resource KW - Science KW - sequence KW - service KW - services KW - sets KW - software KW - substantial KW - system KW - systematics KW - tree KW - TREES KW - user KW - Web AB - The trend in life sciences research, particularly in molecular evolutionary systematics, is toward larger data sets and ever-more detailed evolutionary models, which can generate substantial computational loads. Over the past several years we have developed a grid computing system aimed at providing researchers the computational power needed to complete such analyses in a timely manner. 
Our grid system, known as The Lattice Project, was the first to combine two models of grid computing - the service model, which mainly federates large institutional HPC resources, and the desktop model, which harnesses the power of PCs volunteered by the general public. Recently we have developed a "science portal" style web interface that makes it easier than ever for phylogenetic analyses to be completed using GARLI, a popular program that uses a maximum likelihood method to infer the evolutionary history of organisms on the basis of genetic sequence data. This paper describes our approach to scheduling thousands of GARLI jobs with diverse requirements to heterogeneous grid resources, which include volunteer computers running BOINC software. A key component of this system provides a priori GARLI runtime estimates using machine learning with random forests. JA - Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on ER - TY - JOUR T1 - A cost-aggregating integer linear program for motif finding JF - Journal of Discrete Algorithms Y1 - 2011 A1 - Kingsford, Carl A1 - Zaslavsky, Elena A1 - Singh, Mona KW - Computational Biology KW - Integer linear programming KW - Sequence motif finding AB - In the motif finding problem one seeks a set of mutually similar substrings within a collection of biological sequences. This is an important and widely-studied problem, as such shared motifs in DNA often correspond to regulatory elements. We study a combinatorial framework where the goal is to find substrings of a given length such that the sum of their pairwise distances is minimized. We describe a novel integer linear program for the problem, which uses the fact that distances between substrings come from a limited set of possibilities allowing for aggregate consideration of sequence position pairs with the same distances. 
We show how to tighten its linear programming relaxation by adding an exponential set of constraints and give an efficient separation algorithm that can find violated constraints, thereby showing that the tightened linear program can still be solved in polynomial time. We apply our approach to find optimal solutions for the motif finding problem and show that it is effective in practice in uncovering known transcription factor binding sites. VL - 9 SN - 1570-8667 ER - TY - JOUR T1 - Comparative genomic analysis reveals evidence of two novel Vibrio species closely related to V. cholerae JF - BMC Microbiology Y1 - 2010 A1 - Bradd, H. A1 - Christopher, G. A1 - Nur, H. A1 - Seon-Young, C. A1 - Jongsik, C. A1 - Thomas, B. A1 - David, B. A1 - Jean, C. A1 - Chris, D. J. A1 - Cliff, H. A1 - Rita R. Colwell AB - In recent years genome sequencing has been used to characterize new bacterial species, a method of analysis available as a result of improved methodology and reduced cost. Included in a constantly expanding list of Vibrio species are several that have been reclassified as novel members of the Vibrionaceae. The description of two putative new Vibrio species, Vibrio sp. RC341 and Vibrio sp. RC586 for which we propose the names V. metecus and V. parilis, respectively, previously characterized as non-toxigenic environmental variants of V. cholerae is presented in this study. Results: Based on results of whole-genome average nucleotide identity (ANI), average amino acid identity (AAI), rpoB similarity, MLSA, and phylogenetic analysis, the new species are concluded to be phylogenetically closely related to V. cholerae and V. mimicus. Vibrio sp. RC341 and Vibrio sp. RC586 demonstrate features characteristic of V. cholerae and V. mimicus, respectively, on differential and selective media, but their genomes show a 12 to 15% divergence (88 to 85% ANI and 92 to 91% AAI) compared to the sequences of V. cholerae and V. 
mimicus genomes (ANI <95% and AAI <96% indicative of separate species). Vibrio sp. RC341 and Vibrio sp. RC586 share 2104 ORFs (59%) and 2058 ORFs (56%) with the published core genome of V. cholerae and 2956 (82%) and 3048 ORFs (84%) with V. mimicus MB-451, respectively. The novel species share 2926 ORFs with each other (81% Vibrio sp. RC341 and 81% Vibrio sp. RC586). Virulence-associated factors and genomic islands of V. cholerae and V. mimicus, including VSP-I and II, were found in these environmental Vibrio spp. Conclusions: Results of this analysis demonstrate these two environmental vibrios, previously characterized as variant V. cholerae strains, are new species which have evolved from ancestral lineages of the V. cholerae and V. mimicus clade. The presence of conserved integration loci for genomic islands as well as evidence of horizontal gene transfer between these two new species, V. cholerae, and V. mimicus suggests genomic islands and virulence factors are transferred between these species. VL - 10 ER - TY - JOUR T1 - Comparative Genomics of Clinical and Environmental Vibrio Mimicus JF - Proceedings of the National Academy of Sciences Y1 - 2010 A1 - Hasan, Nur A. A1 - Grim, Christopher J. A1 - Haley, Bradd J. A1 - Jongsik, Chun A1 - Alam, Munirul A1 - Taviani, Elisa A1 - Mozammel, Hoq A1 - Munk, A. Christine A1 - Rita R. Colwell AB - Whether Vibrio mimicus is a variant of Vibrio cholerae or a separate species has been the subject of taxonomic controversy. A genomic analysis was undertaken to resolve the issue. The genomes of V. mimicus MB451, a clinical isolate, and VM223, an environmental isolate, comprise ca. 4,347,971 and 4,313,453 bp and encode 3,802 and 3,290 ORFs, respectively. As in other vibrios, chromosome I (C-I) predominantly contains genes necessary for growth and viability, whereas chromosome II (C-II) bears genes for adaptation to environmental change. 
C-I harbors many virulence genes, including some not previously reported in V. mimicus, such as mannose-sensitive hemagglutinin (MSHA), and enterotoxigenic hemolysin (HlyA); C-II encodes a variant of Vibrio pathogenicity island 2 (VPI-2), and Vibrio seventh pandemic island II (VSP-II) cluster of genes. Extensive genomic rearrangement in C-II indicates it is a hot spot for evolution and genesis of speciation for the genus Vibrio. The number of virulence regions discovered in this study (VSP-II, MSHA, HlyA, type IV pilin, PilE, and integron integrase, IntI4) with no notable difference in potential virulence genes between clinical and environmental strains suggests these genes also may play a role in the environment and that pathogenic strains may arise in the environment. Significant genome synteny with prototypic pre-seventh pandemic strains of V. cholerae was observed, and the results of phylogenetic analysis support the hypothesis that, in the course of evolution, V. mimicus and V. cholerae diverged from a common ancestor with a prototypic sixth pandemic genomic backbone. VL - 107 SN - 0027-8424, 1091-6490 ER - TY - JOUR T1 - Computational Approaches for Genome Assembly Validation JF - Biological Data Mining Y1 - 2010 A1 - Choi, J. H. A1 - Tang, H. A1 - Kim, S. A1 - M. Pop ER - TY - JOUR T1 - Conversion of viable but nonculturable Vibrio cholerae to the culturable state by co‐culture with eukaryotic cells JF - Microbiology and Immunology Y1 - 2010 A1 - Senoh, Mitsutoshi A1 - Ghosh‐Banerjee, Jayeeta A1 - Ramamurthy, Thandavarayan A1 - Hamabata, Takashi A1 - Kurakawa, Takashi A1 - Takeda, Makoto A1 - Rita R. Colwell A1 - Nair, G. 
Balakrish A1 - Takeda, Yoshifumi KW - conversion to culturability KW - co‐culture KW - eukaryotic cell KW - viable but nonculturable (VBNC) Vibrio cholerae AB - VBNC Vibrio cholerae O139 VC-280 obtained by incubation in 1% solution of artificial sea water IO at 4°C for 74 days converted to the culturable state when co-cultured with CHO cells. Other eukaryotic cell lines, including HT-29, Caco-2, T84, HeLa, and Intestine 407, also supported conversion of VBNC cells to the culturable state. Conversion of VBNC V. cholerae O1 N16961 and V. cholerae O139 VC-280/pG13 to the culturable state, under the same conditions, was also confirmed. When VBNC V. cholerae O139 VC-280 was incubated in 1% IO at 4°C for up to 91 days, the number of cells converted by co-culture with CHO cells declined with each additional day of incubation and after 91 days conversion was not observed. VL - 54 SN - 1348-0421 ER - TY - JOUR T1 - Correlated Changes Between Regulatory Cis Elements and Condition-Specific Expression in Paralogous Gene Families JF - Nucleic Acids Research Y1 - 2010 A1 - Singh, Larry N. A1 - Sridhar Hannenhalli AB - Gene duplication is integral to evolution, providing novel opportunities for organisms to diversify in function. One fundamental pathway of functional diversification among initially redundant gene copies, or paralogs, is via alterations in their expression patterns. Although the mechanisms underlying expression divergence are not completely understood, transcription factor binding sites and nucleosome occupancy are known to play a significant role in the process. Previous attempts to detect genomic variations mediating expression divergence in orthologs have had limited success for two primary reasons. 
First, it is inherently challenging to compare expressions among orthologs due to variable trans-acting effects and second, previous studies have quantified expression divergence in terms of an overall similarity of expression profiles across multiple samples, thereby obscuring condition-specific expression changes. Moreover, the inherently inter-correlated expressions among homologs present statistical challenges, not adequately addressed in many previous studies. Using rigorous statistical tests, here we characterize the relationship between cis element divergence and condition-specific expression divergence among paralogous genes in Saccharomyces cerevisiae. In particular, among all combinations of gene family and TFs analyzed, we found a significant correlation between TF binding and the condition-specific expression patterns in over 20% of the cases. In addition, incorporating nucleosome occupancy reveals several additional correlations. For instance, our results suggest that GAL4 binding plays a major role in the expression divergence of the genes in the sugar transporter family. Our work presents a novel means of investigating the cis regulatory changes potentially mediating expression divergence in paralogous gene families under specific conditions. VL - 38 SN - 0305-1048, 1362-4962 ER - TY - JOUR T1 - Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae JF - Proceedings of the National Academy of Sciences Y1 - 2009 A1 - Chun, J. A1 - Grim, C. J. A1 - Hasan, N. A. A1 - Lee, J. H. A1 - Choi, S. Y. A1 - Haley, B. J. A1 - Taviani, E. A1 - Jeon, Y. S. A1 - Kim, D. W. A1 - Lee, J. H. A1 - Rita R. Colwell AB - Vibrio cholerae, the causative agent of cholera, is a bacterium autochthonous to the aquatic environment, and a serious public health threat. V. 
cholerae serogroup O1 is responsible for the previous two cholera pandemics, in which classical and El Tor biotypes were dominant in the sixth and the current seventh pandemics, respectively. Cholera researchers continually face newly emerging and reemerging pathogenic clones carrying diverse combinations of phenotypic and genotypic properties, which significantly hampered control of the disease. To elucidate evolutionary mechanisms governing genetic diversity of pandemic V. cholerae, we compared the genome sequences of 23 V. cholerae strains isolated from a variety of sources over the past 98 years. The genome-based phylogeny revealed 12 distinct V. cholerae lineages, of which one comprises both O1 classical and El Tor biotypes. All seventh pandemic clones share nearly identical gene content. Using analogy to influenza virology, we define the transition from sixth to seventh pandemic strains as a “shift” between pathogenic clones belonging to the same O1 serogroup, but from significantly different phyletic lineages. In contrast, transition among clones during the present pandemic period is characterized as a “drift” between clones, differentiated mainly by varying composition of laterally transferred genomic islands, resulting in emergence of variants, exemplified by V. cholerae O139 and V. cholerae O1 El Tor hybrid clones. Based on the comparative genomics it is concluded that V. cholerae undergoes extensive genetic recombination via lateral gene transfer, and, therefore, genome assortment, not serogroup, should be used to define pathogenic V. cholerae clones. VL - 106 SN - 0027-8424, 1091-6490 ER - TY - JOUR T1 - Complete Genome Sequence of Aggregatibacter (Haemophilus) Aphrophilus NJ8700 JF - Journal of Bacteriology J1 - J. Bacteriol. Y1 - 2009 A1 - Di Bonaventura, Maria Pia A1 - DeSalle, Rob A1 - M. Pop A1 - Nagarajan, Niranjan A1 - Figurski, David H. A1 - Fine, Daniel H. A1 - Kaplan, Jeffrey B. A1 - Planet, Paul J. 
AB - We report the finished and annotated genome sequence of Aggregatibacter aphrophilus strain NJ8700, a strain isolated from the oral flora of a healthy individual, and discuss characteristics that may affect its dual roles in human health and disease. This strain has a rough appearance, and its genome contains genes encoding a type VI secretion system and several factors that may participate in host colonization. VL - 191 SN - 0021-9193, 1098-5530 ER - TY - Generic T1 - A cooperative combinatorial Particle Swarm Optimization algorithm for side-chain packing T2 - IEEE Swarm Intelligence Symposium, 2009. SIS '09 Y1 - 2009 A1 - Lapizco-Encinas, G. A1 - Kingsford, Carl A1 - Reggia, James A. KW - Algorithm design and analysis KW - Amino acids KW - combinatorial mathematics KW - cooperative combinatorial particle swarm optimization algorithm KW - Design optimization KW - Encoding KW - Feedback KW - numerical optimization KW - Optimization methods KW - particle swarm optimisation KW - Particle swarm optimization KW - Partitioning algorithms KW - Proteins KW - proteomics KW - proteomics optimization KW - Robustness KW - side-chain packing AB - Particle Swarm Optimization (PSO) is a well-known, competitive technique for numerical optimization with real-parameter representation. This paper introduces CCPSO, a new Cooperative Particle Swarm Optimization algorithm for combinatorial problems. The cooperative strategy is achieved by splitting the candidate solution vector into components, where each component is optimized by a particle. Particles move throughout a continuous space, their movements based on the influences exerted by static particles that then get feedback based on the fitness of the candidate solution. Here, the application of this technique to side-chain packing (a proteomics optimization problem) is investigated. 
To verify the efficiency of the proposed CCPSO algorithm, we test our algorithm on three side-chain packing problems and compare our results with the provably optimal result. Computational results show that the proposed algorithm is very competitive, obtaining a conformation with an energy value within 1% of the provably optimal solution in many proteins. JA - IEEE Swarm Intelligence Symposium, 2009. SIS '09 PB - IEEE SN - 978-1-4244-2762-8 ER - TY - JOUR T1 - CTCF binding site classes exhibit distinct evolutionary, genomic, epigenomic and transcriptomic features JF - Genome Biology Y1 - 2009 A1 - Essien, Kobby A1 - Vigneau, Sebastien A1 - Apreleva, Sofia A1 - Singh, Larry N. A1 - Bartolomei, Marisa S. A1 - Sridhar Hannenhalli AB - CTCF (CCCTC-binding factor) is an evolutionarily conserved zinc finger protein involved in diverse functions ranging from negative regulation of MYC, to chromatin insulation of the beta-globin gene cluster, to imprinting of the Igf2 locus. The 11 zinc fingers of CTCF are known to differentially contribute to the CTCF-DNA interaction at different binding sites. It is possible that the differences in CTCF-DNA conformation at different binding sites underlie CTCF's functional diversity. If so, the CTCF binding sites may belong to distinct classes, each compatible with a specific functional role. VL - 10 SN - 1465-6906 ER - TY - JOUR T1 - The Complete Genome Sequence of Thermococcus Onnurineus NA1 Reveals a Mixed Heterotrophic and Carboxydotrophic Metabolism JF - Journal of Bacteriology J1 - J. Bacteriol. Y1 - 2008 A1 - Lee, Hyun Sook A1 - Kang, Sung Gyun A1 - Bae, Seung Seob A1 - Lim, Jae Kyu A1 - Cho, Yona A1 - Kim, Yun Jae A1 - Jeon, Jeong Ho A1 - Cha, Sun-Shin A1 - Kwon, Kae Kyoung A1 - Kim, Hyung-Tae A1 - Park, Cheol-Joo A1 - Lee, Hee-Wook A1 - Kim, Seung Il A1 - Jongsik, Chun A1 - Rita R. 
Colwell A1 - Kim, Sang-Jin A1 - Lee, Jung-Hyun AB - Members of the genus Thermococcus, sulfur-reducing hyperthermophilic archaea, are ubiquitously present in various deep-sea hydrothermal vent systems and are considered to play a significant role in the microbial consortia. We present the complete genome sequence and feature analysis of Thermococcus onnurineus NA1 isolated from a deep-sea hydrothermal vent area, which reveal clues to its physiology. Based on results of genomic analysis, T. onnurineus NA1 possesses the metabolic pathways for organotrophic growth on peptides, amino acids, or sugars. More interesting was the discovery that the genome encoded unique proteins that are involved in carboxydotrophy to generate energy by oxidation of CO to CO2, thereby providing a mechanistic basis for growth with CO as a substrate. This lithotrophic feature in combination with carbon fixation via RuBisCO (ribulose 1,5-bisphosphate carboxylase/oxygenase) introduces a new strategy with a complementing energy supply for T. onnurineus NA1 potentially allowing it to cope with nutrient stress in the surrounding of hydrothermal vents, providing the first genomic evidence for the carboxydotrophy in Thermococcus. VL - 190 SN - 0021-9193, 1098-5530 ER - TY - JOUR T1 - Computational Analysis of Constraints on Noncoding Regions, Coding Regions and Gene Expression in Relation to Plasmodium Phenotypic Diversity JF - PLoS ONE Y1 - 2008 A1 - Essien, Kobby A1 - Sridhar Hannenhalli A1 - Stoeckert, Christian J. AB - Malaria-causing Plasmodium species exhibit marked differences including host choice and preference for invading particular cell types. 
The genetic bases of phenotypic differences between parasites can be understood, in part, by investigating constraints on gene expression and genic sequences, both coding and regulatory. We investigated the evolutionary constraints on sequence and expression of parasitic genes by applying comparative genomics approaches to 6 Plasmodium genomes and 2 genome-wide expression studies. We found that the coding regions of Plasmodium transcription factor and sexual development genes are relatively less constrained, as are those of genes encoding CCCH zinc fingers and invasion proteins, which all play important roles in these parasites. Transcription factors and genes with stage-restricted expression have conserved upstream regions and so do several gene classes critical to the parasite's lifestyle, namely, ion transport, invasion, chromatin assembly and CCCH zinc fingers. Additionally, a cross-species comparison of expression patterns revealed that Plasmodium-specific genes exhibit significant expression divergence. Overall, constraints on Plasmodium's protein coding regions confirm observations from other eukaryotes in that transcription factors are under relatively lower constraint. Proteins relevant to the parasite's unique lifestyle also have lower constraint on their coding regions. Greater conservation between Plasmodium species in terms of promoter motifs suggests tight regulatory control of lifestyle genes. However, an interspecies divergence in expression patterns of these genes suggests that either expression is controlled via genomic or epigenomic features not encoded in the proximal promoter sequence, or alternatively, the combinatorial interactions between motifs confer species-specific expression patterns. VL - 3 ER - TY - JOUR T1 - Covariability of Vibrio Cholerae Microdiversity and Environmental Parameters JF - Applied and Environmental Microbiology J1 - Appl. Environ. Microbiol. 
Y1 - 2008 A1 - Zo, Young-Gun A1 - Chokesajjawatee, Nipa A1 - Arakawa, Eiji A1 - Watanabe, Haruo A1 - Huq, Anwar A1 - Rita R. Colwell AB - Fine-scale diversity of natural bacterial assemblages has been attributed to neutral radiation because correspondence between bacterial phylogenetic signals in the natural environment and environmental parameters had not been detected. Evidence that such correspondence occurs is provided for Vibrio cholerae, establishing a critical role for environmental parameters in bacterial diversity. VL - 74 SN - 0099-2240, 1098-5336 ER - TY - JOUR T1 - Characterization of Ehp, a Secreted Complement Inhibitory Protein from Staphylococcus aureus JF - Journal of Biological Chemistry Y1 - 2007 A1 - Hammel, Michal A1 - Sfyroera, Georgia A1 - Pyrpassopoulos, Serapion A1 - Ricklin, Daniel A1 - Ramyar, Kasra X. A1 - M. Pop A1 - Jin, Zhongmin A1 - Lambris, John D. A1 - Geisbrecht, Brian V. AB - We report here the discovery and characterization of Ehp, a new secreted Staphylococcus aureus protein that potently inhibits the alternative complement activation pathway. Ehp was identified through a genomic scan as an uncharacterized secreted protein from S. aureus, and immunoblotting of conditioned S. aureus culture medium revealed that the Ehp protein was secreted at the highest levels during log-phase bacterial growth. The mature Ehp polypeptide is composed of 80 residues and is 44% identical to the complement inhibitory domain of S. aureus Efb (extracellular fibrinogen-binding protein). We observed preferential binding by Ehp to native and hydrolyzed C3 relative to fully active C3b and found that Ehp formed a subnanomolar affinity complex with these various forms of C3 by binding to its thioester-containing C3d domain. Site-directed mutagenesis demonstrated that Arg75 and Asn82 are important in forming the Ehp·C3d complex, but loss of these side chains did not completely disrupt Ehp/C3d binding. 
This suggested the presence of a second C3d-binding site in Ehp, which was mapped to the proximity of Ehp Asn63. Further molecular level details of the Ehp/C3d interaction were revealed by solving the 2.7-Å crystal structure of an Ehp·C3d complex in which the low affinity site had been mutationally inactivated. Ehp potently inhibited C3b deposition onto sensitized surfaces by the alternative complement activation pathway. This inhibition was directly related to Ehp/C3d binding and was more potent than that seen for Efb-C. An altered conformation in Ehp-bound C3 was detected by monoclonal antibody C3-9, which is specific for a neoantigen exposed in activated forms of C3. Our results suggest that increased inhibitory potency of Ehp relative to Efb-C is derived from the second C3-binding site in this new protein. VL - 282 ER - TY - JOUR T1 - Cofactor-independent phosphoglycerate mutase is an essential gene in procyclic form Trypanosoma brucei JF - Parasitology research Y1 - 2007 A1 - Djikeng, A. A1 - Raverdy, S. A1 - Foster, Jeffrey S. A1 - Bartholomeu, D. A1 - Zhang, Y. A1 - Najib M. El‐Sayed A1 - Carlow, C. VL - 100 ER - TY - JOUR T1 - COMPUTATIONAL BIOLOGY JF - Nucleic Acids Research Y1 - 2007 A1 - Leparc, G. G. A1 - Mitra, R. D. A1 - Vardhanabhuti, S. A1 - Wang, J. A1 - Sridhar Hannenhalli A1 - Smit, S. A1 - Widmann, J. A1 - Knight, R. A1 - Wu, S. A1 - Zhang, Y. A1 - others, PB - Information Retrieval Ltd VL - 35 ER - TY - JOUR T1 - A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. JF - BMC Bioinformatics Y1 - 2007 A1 - Pertea, Mihaela A1 - Mount, Stephen M A1 - Salzberg, Steven L KW - Alternative Splicing KW - Arabidopsis KW - Computational Biology KW - Enhancer Elements, Genetic KW - Exons KW - Genes, Plant KW - RNA, Plant AB -

BACKGROUND: Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However exonic splicing enhancers have been shown to enhance the utilization of nearby splice sites.

RESULTS: We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer hexamers are identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integration of our regulatory motifs into two different splice site recognition programs significantly improved the ability of the software to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software.

CONCLUSION: Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.

VL - 8 M3 - 10.1186/1471-2105-8-159 ER - TY - JOUR T1 - Creating a nationwide wireless detection sensor network for chemical, biological and radiological threats JF - Gentag White Paper Y1 - 2007 A1 - Rita R. Colwell A1 - Peeters, J. ER - TY - Generic T1 - A compact mathematical programming formulation for DNA motif finding T2 - Combinatorial Pattern Matching Y1 - 2006 A1 - Kingsford, Carl A1 - Zaslavsky, E. A1 - Singh, M. JA - Combinatorial Pattern Matching ER - TY - JOUR T1 - Comparative genomic evidence for a close relationship between the dimorphic prosthecate bacteria Hyphomonas neptunium and Caulobacter crescentus JF - Journal of bacteriology Y1 - 2006 A1 - Badger, Jonathan H. A1 - Hoover, Timothy R. A1 - Brun, Yves V. A1 - Weiner, Ronald M. A1 - Laub, Michael T. A1 - Alexandre, Gladys A1 - Mrázek, Jan A1 - Ren, Qinghu A1 - Paulsen, Ian T. A1 - Nelson, Karen E. A1 - Khouri, Hoda M. A1 - Radune, Diana A1 - Sosa, Julia A1 - Dodson, Robert J. A1 - Sullivan, Steven A. A1 - Rosovitz, M. J. A1 - Madupu, Ramana A1 - Brinkac, Lauren M. A1 - Durkin, A. Scott A1 - Daugherty, Sean C. A1 - Kothari, Sagar P. A1 - Giglio, Michelle Gwinn A1 - Zhou, Liwei A1 - Haft, Daniel H. A1 - J. Selengut A1 - Davidsen, Tanja M. A1 - Yang, Qi A1 - Zafar, Nikhat A1 - Ward, Naomi L. KW - Alphaproteobacteria KW - Bacterial Outer Membrane Proteins KW - Caulobacter crescentus KW - cell cycle KW - Chemotaxis KW - DNA, Bacterial KW - Flagella KW - Genome, Bacterial KW - Microbial Viability KW - Molecular Sequence Data KW - Movement KW - Sequence Analysis, DNA KW - Sequence Homology KW - signal transduction AB - The dimorphic prosthecate bacteria (DPB) are alpha-proteobacteria that reproduce in an asymmetric manner rather than by binary fission and are of interest as simple models of development. Prior to this work, the only member of this group for which genome sequence was available was the model freshwater organism Caulobacter crescentus. 
Here we describe the genome sequence of Hyphomonas neptunium, a marine member of the DPB that differs from C. crescentus in that H. neptunium uses its stalk as a reproductive structure. Genome analysis indicates that this organism shares more genes with C. crescentus than it does with Silicibacter pomeroyi (a closer relative according to 16S rRNA phylogeny), that it relies upon a heterotrophic strategy utilizing a wide range of substrates, that its cell cycle is likely to be regulated in a similar manner to that of C. crescentus, and that the outer membrane complements of H. neptunium and C. crescentus are remarkably similar. H. neptunium swarmer cells are highly motile via a single polar flagellum. With the exception of cheY and cheR, genes required for chemotaxis were absent in the H. neptunium genome. Consistent with this observation, H. neptunium swarmer cells did not respond to any chemotactic stimuli that were tested, which suggests that H. neptunium motility is a random dispersal mechanism for swarmer cells rather than a stimulus-controlled navigation system for locating specific environments. In addition to providing insights into bacterial development, the H. neptunium genome will provide an important resource for the study of other interesting biological processes including chromosome segregation, polar growth, and cell aging. VL - 188 N1 - http://www.ncbi.nlm.nih.gov/pubmed/16980487?dopt=Abstract ER - TY - JOUR T1 - Comparative genomics of emerging human ehrlichiosis agents JF - PLoS genetics Y1 - 2006 A1 - Dunning Hotopp, Julie C. A1 - Lin, Mingqun A1 - Madupu, Ramana A1 - Crabtree, Jonathan A1 - Angiuoli, Samuel V. A1 - Eisen, Jonathan A. A1 - Eisen, Jonathan A1 - Seshadri, Rekha A1 - Ren, Qinghu A1 - Wu, Martin A1 - Utterback, Teresa R. A1 - Smith, Shannon A1 - Lewis, Matthew A1 - Khouri, Hoda A1 - Zhang, Chunbin A1 - Niu, Hua A1 - Lin, Quan A1 - Ohashi, Norio A1 - Zhi, Ning A1 - Nelson, William A1 - Brinkac, Lauren M. 
A1 - Dodson, Robert J. A1 - Rosovitz, M. J. A1 - Sundaram, Jaideep A1 - Daugherty, Sean C. A1 - Davidsen, Tanja A1 - Durkin, Anthony S. A1 - Gwinn, Michelle A1 - Haft, Daniel H. A1 - J. Selengut A1 - Sullivan, Steven A. A1 - Zafar, Nikhat A1 - Zhou, Liwei A1 - Benahmed, Faiza A1 - Forberger, Heather A1 - Halpin, Rebecca A1 - Mulligan, Stephanie A1 - Robinson, Jeffrey A1 - White, Owen A1 - Rikihisa, Yasuko A1 - Tettelin, Hervé KW - Animals KW - Biotin KW - DNA Repair KW - Ehrlichia KW - Ehrlichiosis KW - Genome KW - Genomics KW - HUMANS KW - Models, Biological KW - Phylogeny KW - Rickettsia KW - Ticks AB - Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens. VL - 2 N1 - http://www.ncbi.nlm.nih.gov/pubmed/16482227?dopt=Abstract ER - TY - JOUR T1 - Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. 
JF - BMC Genomics Y1 - 2006 A1 - Campbell, Matthew A A1 - Haas, Brian J A1 - Hamilton, John P A1 - Mount, Stephen M A1 - Buell, C Robin KW - Alternative Splicing KW - Arabidopsis KW - DNA, Complementary KW - Expressed Sequence Tags KW - Oryza AB -

BACKGROUND: Recently, genomic sequencing efforts were finished for Oryza sativa (cultivated rice) and Arabidopsis thaliana (Arabidopsis). Additionally, these two plant species have extensive cDNA and expressed sequence tag (EST) libraries. We employed the Program to Assemble Spliced Alignments (PASA) to identify and analyze alternatively spliced isoforms in both species.

RESULTS: A comprehensive analysis of alternative splicing was performed in rice that started with >1.1 million publicly available spliced ESTs and over 30,000 full length cDNAs in conjunction with the newly enhanced PASA software. A parallel analysis was performed with Arabidopsis to compare and ascertain potential differences between monocots and dicots. Alternative splicing is a widespread phenomenon (observed in greater than 30% of the loci with transcript support) and we have described nine alternative splicing variations. While alternative splicing has the potential to create many RNA isoforms from a single locus, the majority of loci generate only two or three isoforms and transcript support indicates that these isoforms are generally not rare events. For the alternate donor (AD) and acceptor (AA) classes, the distance between the splice sites for the majority of events was found to be less than 50 basepairs (bp). In both species, the most frequent distance between AA is 3 bp, consistent with reports in mammalian systems. Conversely, the most frequent distance between AD is 4 bp in both plant species, as previously observed in mouse. Most alternative splicing variations are localized to the protein coding sequence and are predicted to significantly alter the coding sequence.

CONCLUSION: Alternative splicing is widespread in both rice and Arabidopsis and these species share many common features. Interestingly, alternative splicing may play a role beyond creating novel combinations of transcripts that expand the proteome. Many isoforms will presumably have negative consequences for protein structure and function, suggesting that their biological role involves post-transcriptional regulation of gene expression.

VL - 7 M3 - 10.1186/1471-2164-7-327 ER - TY - CHAP T1 - Conservation Patterns in cis-Elements Reveal Compensatory Mutations T2 - Comparative Genomics Y1 - 2006 A1 - Evans, Perry A1 - Donahue, Greg A1 - Sridhar Hannenhalli ED - Bourque, Guillaume ED - El-Mabrouk, Nadia AB - Transcriptional regulation critically depends on proper interactions between transcription factors (TF) and their cognate DNA binding sites or cis elements. A better understanding and modelling of the TF-DNA interaction is an important area of research. The Positional Weight Matrix (PWM) is the most common model of TF-DNA binding and it presumes that the nucleotide preferences at individual positions within the binding site are independent. However, studies have shown that this independence assumption does not always hold. If the nucleotide preference at one position depends on the nucleotide at another position, a chance mutation at one position should exert selection pressures at the other position. By comparing the patterns of evolutionary conservation at individual positions within cis elements, here we show that positional dependence within binding sites is highly prevalent. We also show that dependent positions are more likely to be functional, as evidenced by a higher information content and higher conservation. We discuss two examples—Elk-1 and SAP-1 where the inferred compensatory mutation is consistent with known TF-DNA crystal structure. JA - Comparative Genomics T3 - Lecture Notes in Computer Science PB - Springer Berlin / Heidelberg VL - 4205 SN - 978-3-540-44529-6 ER - TY - JOUR T1 - Cholera: the killer from the deep JF - The Biochemist Y1 - 2005 A1 - Rita R. Colwell AB - The current international attention to the importance of combating infectious diseases can provide the opportunity for a multidisciplinary approach that joins medicine with many other scientific and technological disciplines. 
Science and technology are major forces that have the potential to balance the world’s inequities. The connection between cholera and the environment provides a paradigm for this perspective. ER - TY - JOUR T1 - Comparative Genomics of Trypanosomatid Parasitic Protozoa JF - Science Y1 - 2005 A1 - Najib M. El‐Sayed A1 - Myler, Peter J. A1 - Blandin, Gaëlle A1 - Berriman, Matthew A1 - Crabtree, Jonathan A1 - Aggarwal, Gautam A1 - Caler, Elisabet A1 - Renauld, Hubert A1 - Worthey, Elizabeth A. A1 - Hertz-Fowler, Christiane A1 - Ghedin, Elodie A1 - Peacock, Christopher A1 - Bartholomeu, Daniella C. A1 - Haas, Brian J. A1 - Tran, Anh-Nhi A1 - Wortman, Jennifer R. A1 - Alsmark, U. Cecilia M. A1 - Angiuoli, Samuel A1 - Anupama, Atashi A1 - Badger, Jonathan A1 - Bringaud, Frederic A1 - Cadag, Eithon A1 - Carlton, Jane M. A1 - Cerqueira, Gustavo C. A1 - Creasy, Todd A1 - Delcher, Arthur L. A1 - Djikeng, Appolinaire A1 - Embley, T. Martin A1 - Hauser, Christopher A1 - Ivens, Alasdair C. A1 - Kummerfeld, Sarah K. A1 - Pereira-Leal, Jose B. A1 - Nilsson, Daniel A1 - Peterson, Jeremy A1 - Salzberg, Steven L. A1 - Shallom, Joshua A1 - Silva, Joana C. A1 - Sundaram, Jaideep A1 - Westenberger, Scott A1 - White, Owen A1 - Melville, Sara E. A1 - Donelson, John E. A1 - Andersson, Björn A1 - Stuart, Kenneth D. A1 - Hall, Neil AB - A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. 
Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that—along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions—have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont. VL - 309 ER - TY - JOUR T1 - Critical Factors Influencing the Occurrence of Vibrio Cholerae in the Environment of Bangladesh JF - Applied and Environmental Microbiology J1 - Appl. Environ. Microbiol. Y1 - 2005 A1 - Huq, Anwar A1 - Sack, R. Bradley A1 - Nizam, Azhar A1 - Longini, Ira M. A1 - Nair, G. Balakrish A1 - Ali, Afsar A1 - Morris, J. Glenn A1 - Khan, M. N. Huda A1 - Siddique, A. Kasem A1 - Yunus, Mohammed A1 - Albert, M. John A1 - Sack, David A. A1 - Rita R. Colwell AB - The occurrence of outbreaks of cholera in Africa in 1970 and in Latin America in 1991, mainly in coastal communities, and the appearance of the new serotype Vibrio cholerae O139 in India and subsequently in Bangladesh have stimulated efforts to understand environmental factors influencing the growth and geographic distribution of epidemic Vibrio cholerae serotypes. Because of the severity of recent epidemics, cholera is now being considered by some infectious disease investigators as a “reemerging” disease, prompting new work on the ecology of vibrios. Epidemiological and ecological surveillance for cholera has been under way in four rural, geographically separated locations in Bangladesh for the past 4 years, during which both clinical and environmental samples were collected at biweekly intervals. The clinical epidemiology portion of the research has been published (Sack et al., J. Infect. Dis. 187:96-101, 2003). 
The results of environmental sampling and analysis of the environmental and clinical data have revealed significant correlations of water temperature, water depth, rainfall, conductivity, and copepod counts with the occurrence of cholera toxin-producing bacteria (presumably V. cholerae). The lag periods between increases or decreases in units of factors, such as temperature and salinity, and occurrence of cholera correlate with biological parameters, e.g., plankton population blooms. The new information on the ecology of V. cholerae is proving useful in developing environmental models for the prediction of cholera epidemics. VL - 71 SN - 0099-2240, 1098-5336 ER - TY - JOUR T1 - CHARACTERIZATION OF Ath17, A QUANTITATIVE TRAIT LOCUS FOR ATHEROSCLEROSIS SUSCEPTIBILITY BETWEEN C57BL/6J AND 129S1/SvImJ; SINGLE-NUCLEOTIDE POLYMORPHISMS HAVE IMPORTANT IMPLICATIONS ON IDENTIFYING ATHEROSCLEROSIS MODIFIER GENES JF - Cardiovascular Pathology Y1 - 2004 A1 - Ishimori, N. A1 - Walsh, K. A1 - Zheng, X. A1 - Lu, F. A1 - Sridhar Hannenhalli A1 - Nusskern, D. A1 - Mural, R. A1 - Paigen, B. PB - Elsevier VL - 13 ER - TY - JOUR T1 - Comparative Genome Assembly JF - Briefings in Bioinformatics J1 - Brief Bioinform Y1 - 2004 A1 - M. Pop A1 - Phillippy, Adam A1 - Delcher, Arthur L. A1 - Salzberg, Steven L. KW - Assembly KW - comparative genomics KW - open source KW - shotgun sequencing AB - One of the most complex and computationally intensive tasks of genome sequence analysis is genome assembly. Even today, few centres have the resources, in both software and hardware, to assemble a genome from the thousands or millions of individual sequences generated in a whole-genome shotgun sequencing project. With the rapid growth in the number of sequenced genomes has come an increase in the number of organisms for which two or more closely related species have been sequenced. 
This has created the possibility of building a comparative genome assembly algorithm, which can assemble a newly sequenced genome by mapping it onto a reference genome. We describe here a novel algorithm for comparative genome assembly that can accurately assemble a typical bacterial genome in less than four minutes on a standard desktop computer. The software is available as part of the open-source AMOS project. VL - 5 SN - 1467-5463, 1477-4054 ER - TY - JOUR T1 - Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes JF - Proceedings of the National Academy of Sciences of the United States of America Y1 - 2004 A1 - Seshadri, Rekha A1 - Myers, Garry S. A. A1 - Tettelin, Hervé A1 - Eisen, Jonathan A. A1 - Heidelberg, John F. A1 - Dodson, Robert J. A1 - Davidsen, Tanja M. A1 - DeBoy, Robert T. A1 - Fouts, Derrick E. A1 - Haft, Dan H. A1 - J. Selengut A1 - Ren, Qinghu A1 - Brinkac, Lauren M. A1 - Madupu, Ramana A1 - Kolonay, Jamie A1 - Durkin, A. Scott A1 - Daugherty, Sean C. A1 - Shetty, Jyoti A1 - Shvartsbeyn, Alla A1 - Gebregeorgis, Elizabeth A1 - Geer, Keita A1 - Tsegaye, Getahun A1 - Malek, Joel A1 - Ayodeji, Bola A1 - Shatsman, Sofiya A1 - McLeod, Michael P. A1 - Smajs, David A1 - Howell, Jerrilyn K. A1 - Pal, Sangita A1 - Amin, Anita A1 - Vashisth, Pankaj A1 - McNeill, Thomas Z. A1 - Xiang, Qin A1 - Sodergren, Erica A1 - Baca, Ernesto A1 - Weinstock, George M. A1 - Norris, Steven J. A1 - Fraser, Claire M. A1 - Paulsen, Ian T. 
KW - ATP-Binding Cassette Transporters KW - Bacterial Proteins KW - Base Sequence KW - Borrelia burgdorferi KW - Genes, Bacterial KW - Genome, Bacterial KW - Leptospira interrogans KW - Models, Genetic KW - Molecular Sequence Data KW - Mouth KW - Sequence Homology, Amino Acid KW - Treponema KW - Treponema pallidum AB - We present the complete 2,843,201-bp genome sequence of Treponema denticola (ATCC 35405) an oral spirochete associated with periodontal disease. Analysis of the T. denticola genome reveals factors mediating coaggregation, cell signaling, stress protection, and other competitive and cooperative measures, consistent with its pathogenic nature and lifestyle within the mixed-species environment of subgingival dental plaque. Comparisons with previously sequenced spirochete genomes revealed specific factors contributing to differences and similarities in spirochete physiology as well as pathogenic potential. The T. denticola genome is considerably larger in size than the genome of the related syphilis-causing spirochete Treponema pallidum. The differences in gene content appear to be attributable to a combination of three phenomena: genome reduction, lineage-specific expansions, and horizontal gene transfer. Genes lost due to reductive evolution appear to be largely involved in metabolism and transport, whereas some of the genes that have arisen due to lineage-specific expansions are implicated in various pathogenic interactions, and genes acquired via horizontal gene transfer are largely phage-related or of unknown function. VL - 101 N1 - http://www.ncbi.nlm.nih.gov/pubmed/15064399?dopt=Abstract ER - TY - JOUR T1 - Characterization of a Vibrio cholerae phage isolated from the coastal water of Peru JF - Environmental Microbiology Y1 - 2003 A1 - Talledo, Miguel A1 - Rivera, Irma N. G. A1 - Lipp, Erin K. A1 - Neale, Angela A1 - Karaolis, David A1 - Huq, Anwar A1 - Rita R. 
Colwell AB - A Vibrio cholerae bacteriophage, family Myoviridae, was isolated from seawater collected from the coastal water of Lima, Peru. Genome size was estimated to be 29 kbp. The temperate phage was specific to V. cholerae and infected 12/13 V. cholerae O1 strains and half of the four non-O1/non-O139 strains tested in this study. Vibrio cholerae O139 strains were resistant to infection and highest infection rates were obtained in low nutrient media amended with NaCl or prepared using seawater as diluent. VL - 5 SN - 1462-2920 ER - TY - JOUR T1 - Comparing bootstrap and posterior probability values in the four-taxon case JF - Syst Biol Y1 - 2003 A1 - Michael P. Cummings A1 - Handley, S. A. A1 - Myers, D. S. A1 - Reed, D. L. A1 - Rokas, A. A1 - Winka, K. AB - Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. 
Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values. VL - 52 ER - TY - JOUR T1 - Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440 JF - Environmental Microbiology Y1 - 2003 A1 - Nelson, K. E. A1 - Weinel, C. A1 - Paulsen, I. T. A1 - Dodson, R. J. A1 - Hilbert, H. A1 - Martins dos Santos, V. A. P. A1 - Fouts, D. E. A1 - Gill, S. R. A1 - M. Pop A1 - Holmes, M. A1 - others, VL - 5 ER - TY - JOUR T1 - The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000 JF - Proceedings of the National Academy of Sciences of the United States of America Y1 - 2003 A1 - Buell, C. Robin A1 - Joardar, Vinita A1 - Lindeberg, Magdalen A1 - J. Selengut A1 - Paulsen, Ian T. A1 - Gwinn, Michelle L. A1 - Dodson, Robert J. A1 - DeBoy, Robert T. A1 - Durkin, A. Scott A1 - Kolonay, James F. A1 - Madupu, Ramana A1 - Daugherty, Sean A1 - Brinkac, Lauren A1 - Beanan, Maureen J. A1 - Haft, Daniel H. A1 - Nelson, William C. A1 - Davidsen, Tanja A1 - Zafar, Nikhat A1 - Zhou, Liwei A1 - Liu, Jia A1 - Yuan, Qiaoping A1 - Khouri, Hoda A1 - Fedorova, Nadia A1 - Tran, Bao A1 - Russell, Daniel A1 - Berry, Kristi A1 - Utterback, Teresa A1 - Aken, Susan E. van A1 - Feldblyum, Tamara V. A1 - D'Ascenzo, Mark A1 - Deng, Wen-Ling A1 - Ramos, Adela R. A1 - Alfano, James R. 
A1 - Cartinhour, Samuel A1 - Chatterjee, Arun K. A1 - Delaney, Terrence P. A1 - Lazarowitz, Sondra G. A1 - Martin, Gregory B. A1 - Schneider, David J. A1 - Tang, Xiaoyan A1 - Bender, Carol L. A1 - White, Owen A1 - Fraser, Claire M. A1 - Collmer, Alan KW - Arabidopsis KW - Base Sequence KW - Biological Transport KW - Genome, Bacterial KW - Lycopersicon esculentum KW - Molecular Sequence Data KW - Plant Growth Regulators KW - Plasmids KW - Pseudomonas KW - Reactive Oxygen Species KW - Siderophores KW - virulence AB - We report the complete genome sequence of the model bacterial pathogen Pseudomonas syringae pathovar tomato DC3000 (DC3000), which is pathogenic on tomato and Arabidopsis thaliana. The DC3000 genome (6.5 megabases) contains a circular chromosome and two plasmids, which collectively encode 5,763 ORFs. We identified 298 established and putative virulence genes, including several clusters of genes encoding 31 confirmed and 19 predicted type III secretion system effector proteins. Many of the virulence genes were members of paralogous families and also were proximal to mobile elements, which collectively comprise 7% of the DC3000 genome. The bacterium possesses a large repertoire of transporters for the acquisition of nutrients, particularly sugars, as well as genes implicated in attachment to plant surfaces. Over 12% of the genes are dedicated to regulation, which may reflect the need for rapid adaptation to the diverse environments encountered during epiphytic growth and pathogenesis. Comparative analyses confirmed a high degree of similarity with two sequenced pseudomonads, Pseudomonas putida and Pseudomonas aeruginosa, yet revealed 1,159 genes unique to DC3000, of which 811 lack a known function. VL - 100 N1 - http://www.ncbi.nlm.nih.gov/pubmed/12928499?dopt=Abstract ER - TY - JOUR T1 - Characterization of Pseudoalteromonas citrea and P. 
nigrifaciens Isolated from Different Ecological Habitats Based on REP-PCR Genomic Fingerprints JF - Systematic and Applied Microbiology Y1 - 2002 A1 - Ivanova, Elena P. A1 - Matte, Glavur R. A1 - Matte, Maria H. A1 - Coenye, Tom A1 - Huq, Anwarul A1 - Rita R. Colwell KW - biogeography KW - BOX-PCR KW - ERIC KW - Pseudoalteromonas KW - REP AB - DNA primers corresponding to conserved repetitive interspersed genomic motifs and PCR were used to show that REP, ERIC and BOX-like DNA sequences are present in marine, oxidative, Gram-negative Pseudoalteromonas strains. REP, ERIC and BOX-PCR were used for rapid molecular characterization of both the type species of the genus and environmental strains isolated from samples collected in different geographical areas. PCR-generated genomic fingerprint patterns were found to be both complex and strain specific. Analysis of the genotypic structure of phenotypically diverse P. citrea revealed a geographic clustering of Far Eastern brown-pigmented, agar-digesting strains of this species. Marine isolates of P. nigrifaciens with 67–70% DNA relatedness generated genomic patterns different from those of the type strain and formed a separate cluster. It is concluded that REP, ERIC and BOX-PCR are effective in generating strain specific patterns that can be used to elucidate geographic distribution, with these genomic patterns providing a valuable biogeographic criterion. VL - 25 SN - 0723-2020 ER - TY - CHAP T1 - Combinatorial Algorithms for Design of DNA Arrays T2 - Chip Technology Y1 - 2002 A1 - Sridhar Hannenhalli A1 - Hubbell, Earl A1 - Lipshutz, Robert A1 - Pevzner, Pavel ED - Hoheisel, Jörg ED - Brazma, A. ED - Büssow, K. ED - Cantor, C. ED - Christians, F. ED - Chui, G. ED - Diaz, R. ED - Drmanac, R. ED - Drmanac, S. ED - Eickhoff, H. ED - Fellenberg, K. ED - Sridhar Hannenhalli ED - Hoheisel, J. ED - Hou, A. ED - Hubbell, E. ED - Jin, H. ED - Jin, P. ED - Jurinke, C. 
ED - Konthur, Z. ED - Köster, H. ED - Kwon, S. ED - Lacy, S. ED - Lehrach, H. ED - Lipshutz, R. ED - Little, D. ED - Lueking, A. ED - McGall, G. ED - Moeur, B. ED - Nordhoff, E. ED - Nyarsik, L. ED - Pevzner, P. ED - Robinson, A. ED - Sarkans, U. ED - Shafto, J. ED - Sohail, M. ED - Southern, E. ED - Swanson, D. ED - Ukrainczyk, T. ED - van den Boom, D. ED - Vilo, J. ED - Vingron, M. ED - Walter, G. ED - Xu, C. AB - Optimal design of DNA arrays requires the development of algorithms with two-fold goals: reducing the effects caused by unintended illumination (border length minimization problem) and reducing the complexity of masks (mask decomposition problem). We describe algorithms that reduce the number of rectangles in mask decomposition by 20–30% as compared to a standard array design under the assumption that the arrangement of oligonucleotides on the array is fixed. This algorithm produces provably optimal solution for all studied real instances of array design. We also address the difficult problem of finding an arrangement which minimizes the border length and come up with a new idea of threading that significantly reduces the border length as compared to standard designs. JA - Chip Technology T3 - Advances in Biochemical Engineering/Biotechnology PB - Springer Berlin / Heidelberg VL - 77 SN - 978-3-540-43215-9 ER - TY - JOUR T1 - Comparative Genome Sequencing for Discovery of Novel Polymorphisms in Bacillus Anthracis JF - Science Y1 - 2002 A1 - Read, Timothy D. A1 - Salzberg, Steven L. A1 - M. Pop A1 - Shumway, Martin A1 - Umayam, Lowell A1 - Jiang, Lingxia A1 - Holtzapple, Erik A1 - Busch, Joseph D. A1 - Smith, Kimothy L. A1 - Schupp, James M. A1 - Solomon, Daniel A1 - Keim, Paul A1 - Fraser, Claire M. 
AB - Comparison of the whole-genome sequence of Bacillus anthracis isolated from a victim of a recent bioterrorist anthrax attack with a reference reveals 60 new markers that include single nucleotide polymorphisms (SNPs), inserted or deleted sequences, and tandem repeats. Genome comparison detected four high-quality SNPs between the two sequenced B. anthracis chromosomes and seven differences among different preparations of the reference genome. These markers have been tested on a collection of anthrax isolates and were found to divide these samples into distinct families. These results demonstrate that genome-based analysis of microbial pathogens will provide a powerful new tool for investigation of infectious disease outbreaks. VL - 296 SN - 0036-8075, 1095-9203 ER - TY - JOUR T1 - Cortical Spreading depression and the pathogenesis of brain disorders: a computational and neural network-based investigation JF - Neurological research Y1 - 2001 A1 - Ruppin, E. A1 - Reggia, James A. VL - 23 ER - TY - JOUR T1 - Carbonic anhydrase III: the phosphatase activity is extrinsic JF - Archives of biochemistry and biophysics Y1 - 2000 A1 - Kim, G. A1 - J. Selengut A1 - Levine, R. L. KW - Animals KW - Carbonic Anhydrases KW - Chromatography, High Pressure Liquid KW - Cloning, Molecular KW - Enzyme Activation KW - Glutathione KW - Kinetics KW - Liver KW - Male KW - Muscles KW - Phosphoric Monoester Hydrolases KW - Precipitin Tests KW - Rabbits KW - Rats KW - Rats, Inbred F344 KW - Recombinant Proteins KW - Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization KW - Time factors AB - The carbonic anhydrases reversibly hydrate carbon dioxide to yield bicarbonate and hydrogen ion. They have a variety of physiological functions, although the specific roles of each of the 10 known isozymes are unclear. 
Carbonic anhydrase isozyme III is particularly rich in skeletal muscle and adipocytes, and it is unique among the isozymes in also exhibiting phosphatase activity. Previously published studies provided evidence that the phosphatase activity was intrinsic to carbonic anhydrase III, that it had specificity for tyrosine phosphate, and that activity was regulated by reversible glutathionylation of cysteine186. To study the mechanism of this phosphatase, we cloned and expressed the rat liver carbonic anhydrase III. The purified recombinant had the same specific activity as the carbonic anhydrase purified from rat liver, but it had virtually no phosphatase activity. We attempted to identify an activator of the phosphatase in rat liver and found a protein of approximately 14 kDa, the amount of which correlated with the phosphatase activity of the carbonic anhydrase III fractions. It was identified as liver fatty acid binding protein, which was then purified to test for activity as an activator of the phosphatase and for protein-protein interaction, but neither binding nor activation could be demonstrated. Immunoprecipitation experiments established that carbonic anhydrase III could be separated from the phosphatase activity. Finally, adding additional purification steps completely separated the phosphatase activity from the carbonic anhydrase activity. We conclude that the phosphatase activity previously considered to be intrinsic to carbonic anhydrase III is actually extrinsic. Thus, this isozyme exhibits only the carbon dioxide hydratase and esterase activities characteristic of the other mammalian isozymes, and the phosphatase previously shown to be activated by glutathionylation is not carbonic anhydrase III. 
VL - 377 N1 - http://www.ncbi.nlm.nih.gov/pubmed/10845711?dopt=Abstract ER - TY - JOUR T1 - A Case for Evolutionary Genomics and the Comprehensive Examination of Sequence Biodiversity JF - Molecular Biology and Evolution Y1 - 2000 A1 - Pollock, David D. A1 - Eisen, Jonathan A. A1 - Doggett, Norman A. A1 - Michael P. Cummings AB - Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combination of DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. 
In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes. VL - 17 SN - 0737-4038, 1537-1719 ER - TY - JOUR T1 - A computational model of acute focal cortical lesions JF - Stroke Y1 - 1997 A1 - Goodall, S. A1 - Reggia, James A. A1 - Chen, Y. A1 - Ruppin, E. A1 - Whitney, C. VL - 28 ER - TY - JOUR T1 - Computer models: A new approach to the investigation of disease JF - MD Computing Y1 - 1997 A1 - Reggia, James A. A1 - Ruppin, E. A1 - Berndt, R. S. VL - 14 ER - TY - JOUR T1 - Computational studies of synaptic alterations in Alzheimer’s disease JF - Neural modeling of brain and cognitive disorders Y1 - 1996 A1 - Ruppin, E. A1 - Horn, D. A1 - Levy, N. A1 - Reggia, James A. ER - TY - JOUR T1 - cDNA expressed sequence tags of Trypanosoma brucei rhodesiense provide new insights into the biology of the parasite JF - Molecular and Biochemical Parasitology Y1 - 1995 A1 - Najib M. El‐Sayed A1 - Alarcon, Clara M. A1 - Beck, John C. A1 - Sheffield, Val C. A1 - Donelson, John E. 
KW - cDNA KW - Expressed sequence tag KW - Trypanosoma brucei rhodesiense AB - A total of 518 expressed sequence tags (ESTs) have been generated from clones randomly selected from a cDNA library and a spliced leader sub-library of a Trypanosoma brucei rhodesiense bloodstream clone. 205 (39%) of the clones were identified based on matches to 113 unique genes in the public databases. Of these, 71 cDNAs display significant similarities to genes in unrelated organisms encoding metabolic enzymes, signal transduction proteins, transcription factors, ribosomal proteins, histones, a proliferation-associated protein and thimet oligopeptidase, among others. 313 of the cDNAs are not related to any other sequences in the databases. These cDNA ESTs provide new avenues of research for exploring both the novel trypanosome-specific genes and the genome organization of this parasite, as well as a resource for identifying trypanosome homologs to genes expressed in other organisms. VL - 73 SN - 0166-6851 ER - TY - JOUR T1 - Crystallization and preliminary X-ray investigation of the recombinant Trypanosoma brucei rhodesiense calmodulin JF - Proteins: Structure, Function, and Bioinformatics Y1 - 1995 A1 - Najib M. El‐Sayed A1 - Patton, C. L. A1 - Harkins, P. C. A1 - Fox, R. O. A1 - Anderson, K. VL - 21 ER - TY - JOUR T1 - copia-like retrotransposons are ubiquitous among plants JF - Proc Natl Acad Sci USA Y1 - 1992 A1 - Voytas, D. F. A1 - Michael P. Cummings A1 - Koniczny, A. A1 - Ausubel, F. M. A1 - Rodermel, S. R. AB - Transposable genetic elements are assumed to be a feature of all eukaryotic genomes. Their identification, however, has largely been haphazard, limited principally to organisms subjected to molecular or genetic scrutiny. 
We assessed the phylogenetic distribution of copia-like retrotransposons, a class of transposable element that proliferates by reverse transcription, using a polymerase chain reaction assay designed to detect copia-like element reverse transcriptase sequences. copia-like retrotransposons were identified in 64 plant species as well as the photosynthetic protist Volvox carteri. The plant species included representatives from 9 of 10 plant divisions, including bryophytes, lycopods, ferns, gymnosperms, and angiosperms. DNA sequence analysis of 29 cloned PCR products and of a maize retrotransposon cDNA confirmed the identity of these sequences as copia-like reverse transcriptase sequences, thereby demonstrating that this class of retrotransposons is a ubiquitous component of plant genomes. VL - 89 ER - TY - JOUR T1 - Copia-like retrotransposons in plants: a brief introduction JF - The Plant Genetics Newsletter Y1 - 1992 A1 - Michael P. Cummings VL - 8 ER - TY - JOUR T1 - Characterization of enhancer-of-white-apricot in Drosophila melanogaster. JF - Genetics Y1 - 1990 A1 - Peng, X B A1 - Mount, S M KW - Alleles KW - Animals KW - Blotting, Northern KW - DNA Transposable Elements KW - Drosophila melanogaster KW - Eye Color KW - Female KW - Heterozygote KW - Homozygote KW - Male KW - Nucleic Acid Hybridization KW - PHENOTYPE KW - Poly A KW - Reproduction KW - RNA KW - RNA, Messenger KW - Transcription, Genetic AB -

The white-apricot (wa) allele differs from the wild-type white gene by the presence of the retrovirus-like transposable element copia within the transcription unit. Most RNAs derived from wa have 3' termini within this insertion, and only small amounts of structurally normal RNA are produced. The activity of wa is reduced in trans by a semidominant mutation in the gene Enhancer-of-white-apricot (E(wa)). Flies that are wa and heterozygous for the enhancer have eyes which are much lighter than the orange-yellow of wa alone while E(wa) homozygotes have white eyes. This semidominant effect on pigmentation is correlated with a corresponding decrease in white RNA having wild type structure, and flies homozygous for E(wa) have increased levels of aberrant RNAs. Three revertant alleles of E(wa) generated by reversion of the dominant enhancer phenotype with gamma radiation are noncomplementing recessive lethals, with death occurring during the larval stage. The effects on wa eye pigmentation of varying doses of the original E(wa) allele, the wild type allele, and the revertant alleles suggest that the original E(wa) allele produces a product that interferes with the activity of the wild type gene and that the revertants are null alleles. We propose that the E(wa) gene product influences the activity of the downstream copia long terminal repeat in 3' end formation.

VL - 126 CP - 4 ER - TY - JOUR T1 - Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. JF - Mol Cell Biol Y1 - 1985 A1 - Mount, S M A1 - Rubin, G M KW - Amino Acid Sequence KW - Animals KW - Base Sequence KW - Codon KW - DNA Helicases KW - DNA Transposable Elements KW - Drosophila melanogaster KW - Gene Expression Regulation KW - Gene Products, gag KW - Integrases KW - Repetitive Sequences, Nucleic Acid KW - Retroviridae KW - RNA-Directed DNA Polymerase KW - Viral Envelope Proteins KW - Viral Proteins AB -

We have determined the complete nucleotide sequence of the copia element present at the white-apricot allele of the white locus in Drosophila melanogaster. This transposable element is 5,146 nucleotides long and contains a single long open reading frame of 4,227 nucleotides. Analysis of the coding potential of the large open reading frame, which appears to encode a polyprotein, revealed weak homology to a number of retroviral proteins, including a protease, nucleic acid-binding protein, and reverse transcriptase. Better homology existed between another part of the copia open reading frame and a region of the retroviral pol gene recently shown to be distinct from reverse transcriptase and required for the integration of circular DNA forms of the retroviral genome to form proviruses. Comparison of the copia sequence with those of the Saccharomyces cerevisiae transposable element Ty, several vertebrate retroviruses, and the D. melanogaster copia-like element 17.6 showed that Ty was most similar to copia, sharing amino acid sequence homology and organizational features not found in the other genetic elements.

VL - 5 CP - 7 ER - TY - JOUR T1 - A catalogue of splice junction sequences. JF - Nucleic Acids Res Y1 - 1982 A1 - Mount, S M KW - Animals KW - Base Sequence KW - genes KW - Genes, Viral KW - HUMANS KW - Repetitive Sequences, Nucleic Acid KW - RNA Splicing KW - Species Specificity AB -

Splice junction sequences from a large number of nuclear and viral genes encoding protein have been collected. The sequence CAAG/GTAGAGT was found to be a consensus of 139 exon-intron boundaries (or donor sequences) and (TC)nNCTAG/G was found to be a consensus of 130 intron-exon boundaries (or acceptor sequences). The possible role of splice junction sequences as signals for processing is discussed.

VL - 10 CP - 2 ER -