Inexact Local Alignment Search over Suffix Arrays

TitleInexact Local Alignment Search over Suffix Arrays
Publication TypeConference Proceedings
Year of Conference2009
AuthorsGhodsi M., Pop M.
Conference NameIEEE International Conference on Bioinformatics and Biomedicine, 2009. BIBM '09
Date Published2009
ISBN Number978-0-7695-3885-3
Keywordsbacteria, Bioinformatics, biology computing, Computational Biology, Costs, DNA, DNA homology searches, DNA sequences, Educational institutions, generalized heuristic, genes, Genetics, genome alignment, Genomics, human, inexact local alignment search, inexact seeds, local alignment, local alignment tools, memory efficient suffix array, microorganisms, molecular biophysics, mouse, Organisms, Sensitivity and Specificity, sequences, suffix array, USA Councils

We describe an algorithm for finding approximate seeds for DNA homology searches. In contrast to previous algorithms that use exact or spaced seeds, our approximate seeds may contain insertions and deletions. We present a generalized heuristic for finding such seeds efficiently and prove that the heuristic does not affect sensitivity. We show how to adapt this algorithm to work over the memory efficient suffix array with provably minimal overhead in running time. We demonstrate the effectiveness of our algorithm on two tasks: whole genome alignment of bacteria and alignment of the DNA sequences of 177 genes that are orthologous in human and mouse. We show our algorithm achieves better sensitivity and uses less memory than other commonly used local alignment tools.