Speaker:   Dr. Bin Ma
  Department of Computer Science
  University of Western Ontario


Title: Pattern Hunter - Optimized Spaced Seeds for Homology Search


Abstract:


Given two DNA or protein sequences, homology search requires finding all pairs of substrings, one from each sequence, that are similar to each other. Traditionally, this is done with the Smith-Waterman algorithm, which is too slow for large genomes. The BLAST program significantly improved the speed by using consecutive seeds (short runs of consecutive exact matches) to pre-select candidate homology regions. However, this also significantly reduces the sensitivity; that is, many real homologies may be lost during the pre-selection.

Replacing the consecutive seeds with optimized spaced seeds (exact matches required only at certain fixed positions, with the positions in between free to mismatch) significantly improves the sensitivity of the homology search. The talk will introduce spaced seeds, the reason for the sensitivity improvement, and the algorithm for selecting seeds.
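
As a rough illustration of the difference between the two kinds of seed, the toy sketch below scans two already-aligned sequences of equal length and reports where a seed hits. The function name `seed_hits`, the seed patterns `111` and `1101`, and the example sequences are all made up for this illustration and are not taken from the talk; a `1` marks a position that must match and a `0` a position that is ignored.

```python
def seed_hits(a, b, seed):
    """Return the offsets where sequences a and b agree at every '1' position of seed.

    a and b are assumed to be aligned without gaps and of equal length.
    seed is a string of '1' (must match) and '0' (don't care) characters.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # Offsets inside the seed window that are required to match.
    required = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    return [
        i
        for i in range(len(a) - span + 1)
        if all(a[i + k] == b[i + k] for k in required)
    ]


# Toy pair of sequences that mismatch at every third position.
a = "ACGTACGTA"
b = "ACTTAGGTC"

consecutive = seed_hits(a, b, "111")   # three consecutive matches required
spaced = seed_hits(a, b, "1101")       # three matches, one skipped position

print("consecutive seed hits:", consecutive)
print("spaced seed hits:     ", spaced)
```

Both seeds require three matching positions, yet in this contrived example the consecutive seed finds no hit while the spaced seed hits at offsets 0 and 3. A single hand-picked example says nothing about sensitivity in general; it only shows the mechanics of matching at fixed positions.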