TY - Generic
T1 - Inexact Local Alignment Search over Suffix Arrays
T2 - IEEE International Conference on Bioinformatics and Biomedicine, 2009. BIBM '09
Y1 - 2009
A1 - Ghodsi, M.
A1 - M. Pop
KW - bacteria
KW - Bioinformatics
KW - biology computing
KW - Computational Biology
KW - Costs
KW - DNA
KW - DNA homology searches
KW - DNA sequences
KW - Educational institutions
KW - generalized heuristic
KW - genes
KW - Genetics
KW - genome alignment
KW - Genomics
KW - human
KW - inexact local alignment search
KW - inexact seeds
KW - local alignment
KW - local alignment tools
KW - memory efficient suffix array
KW - microorganisms
KW - molecular biophysics
KW - mouse
KW - Organisms
KW - Sensitivity and Specificity
KW - sequences
KW - suffix array
KW - USA Councils
AB - We describe an algorithm for finding approximate seeds for DNA homology searches. In contrast to previous algorithms that use exact or spaced seeds, our approximate seeds may contain insertions and deletions. We present a generalized heuristic for finding such seeds efficiently and prove that the heuristic does not affect sensitivity. We show how to adapt this algorithm to work over the memory efficient suffix array with provably minimal overhead in running time. We demonstrate the effectiveness of our algorithm on two tasks: whole genome alignment of bacteria and alignment of the DNA sequences of 177 genes that are orthologous in human and mouse. We show our algorithm achieves better sensitivity and uses less memory than other commonly used local alignment tools.
JA - IEEE International Conference on Bioinformatics and Biomedicine, 2009. BIBM '09
PB - IEEE
SN - 978-0-7695-3885-3
ER -
TY - Generic
T1 - Dynamic querying for pattern identification in microarray and genomic data
T2 - 2003 International Conference on Multimedia and Expo, 2003. ICME '03. Proceedings
Y1 - 2003
A1 - Hochheiser, H.
A1 - Baehrecke, E. H.
A1 - Stephen M. Mount
A1 - Shneiderman, Ben
KW - Bioinformatics
KW - data sets
KW - Displays
KW - dynamic querying
KW - expression profiles
KW - Frequency
KW - Gene expression
KW - genes
KW - Genetics
KW - genomic data
KW - Genomics
KW - linear ordered sequences
KW - macromolecules
KW - medical signal processing
KW - Mice
KW - Microarray
KW - pattern identification
KW - pattern recognition
KW - premRNA splicing
KW - Query processing
KW - sequences
KW - Signal processing
KW - splicing
KW - TimeSearcher
AB - Data sets involving linear ordered sequences are a recurring theme in bioinformatics. Dynamic query tools that support exploration of these data sets can be useful for identifying patterns of interest. This paper describes the use of one such tool - timesearcher - to interactively explore linear sequence data sets taken from two bioinformatics problems. Microarray time course data sets involve expression levels for large numbers of genes over multiple time points. Timesearcher can be used to interactively search these data sets for genes with expression profiles of interest. The occurrence frequencies of short sequences of DNA in aligned exons can be used to identify sequences that play a role in the pre-mRNA splicing. Timesearcher can be used to search these data sets for candidate splicing signals.
JA - 2003 International Conference on Multimedia and Expo, 2003. ICME '03. Proceedings
PB - IEEE
VL - 3
SN - 0-7803-7965-9
ER -
TY - Generic
T1 - A SIMD solution to the sequence comparison problem on the MGAP
T2 - International Conference on Application Specific Array Processors, 1994. Proceedings
Y1 - 1994
A1 - Borah, M.
A1 - Bajwa, R. S.
A1 - Sridhar Hannenhalli
A1 - Irwin, M. J.
KW - AT-optimal algorithm
KW - Biological information theory
KW - biology computing
KW - biosequence comparison problem
KW - computational complexity
KW - Computer science
KW - Costs
KW - database size
KW - Databases
KW - DNA computing
KW - dynamic programming
KW - dynamic programming algorithms
KW - fine-grained massively parallel processor array
KW - Genetics
KW - Heuristic algorithms
KW - maximally similar sequence
KW - MGAP parallel computer
KW - Micro-Grain Array Processor
KW - Military computing
KW - molecular biology
KW - molecular biophysics
KW - Nearest neighbor searches
KW - nearest-neighbor connections
KW - Parallel algorithms
KW - pipeline processing
KW - pipelined SIMD solution
KW - sequence alignment problem
KW - sequences
AB - Molecular biologists frequently compare an unknown biosequence with a set of other known biosequences to find the sequence which is maximally similar, with the hope that what is true of one sequence, either physically or functionally, could be true of its analogue. Even though efficient dynamic programming algorithms exist for the problem, when the size of the database is large, the time required is quite long, even for moderate length sequences. In this paper, we present an efficient pipelined SIMD solution to the sequence alignment problem on the Micro-Grain Array Processor (MGAP), a fine-grained massively parallel array of processors with nearest-neighbor connections. The algorithm compares K sequences of length O(M) with the actual sequence of length N, in O(M+N+K) time with O(MN) processors, which is AT-optimal. The implementation on the MGAP computes at the rate of about 0.1 million comparisons per second for sequences of length 128
JA - International Conference on Application Specific Array Processors, 1994. Proceedings
PB - IEEE
SN - 0-8186-6517-3
ER -