Promoter prediction in the human genome

TitlePromoter prediction in the human genome
Publication TypeJournal Articles
Year of Publication2001
AuthorsHannenhalli S, Levy S.
Type of Article10.1093/bioinformatics/17.suppl_1.S90
ISBN Number1367-4803, 1460-2059

Computational prediction of eukaryotic polII promoters has been one of the most elusive problems despite considerable effort devoted to the study. Researchers have looked for various types of signals around the transcriptional start site (TSS), viz. oligo-nucleotide statistics, potential binding sites for core factors, clusters of binding sites, proximity to CpG islands etc.. The proximity of CpG islands to gene starts is now a well established fact, although until recently, it was based on very little genomic data. In this work we explore the possibility of enhancing the promoter prediction accuracy by combining CpG island information with a few other, biologically motivated, seemingly independent signals, that cover most of the known knowledge. We benchmarked the method on a much larger genomic datasets compared to previous studies. We were able to improve slightly upon current prediction accuracy. Furthermore, we observe that CpG islands are the most dominant signals and the other signals do not improve the prediction. This suggests that the computational prediction of promoters for genes with no associated CpG-island (typically having tissue-specific expression) looking only at the immediate neighborhood of the TSS may not even be possible. We suggest some biological experiments and studies to better understand the biology of transcription.