Speaker: | Frantisek Franek |
Department of Computing and Software | |
Faculty of Engineering | |
McMaster University |
Title: On the number of fundamental periodicities
Periodicities of a string provide a key characteristics of the string�s structure. Their knowledge is important for many applications, notably for data compression and in DNA and protein analysis. A basic periodical structure is a repetition of power k, i.e. a tandem repeat of a substring k times. When k equals 2, we talk of a square. Since 1982, it has been known that there are at most O(n log n) repetitions in a string of length n and there are many algorithms that can compute occurrences of all repetitions in O(n log n) time. In 1998, Fraenkel and Simpson considered the number of types of squares in a string, the so-called number of distinct squares problem, and presented an upper bound of twice the length of the string. A succinct notation comprising several squares of the same period into a run was introduced by Main in 1982. Kolpakov and Kucherov in 1999 showed that the number of runs must be linear in the length of the string and many researchers established since asymptotic lower and upper bounds very close to the hypothesized value n, the length of the string. There are now linear algorithms to compute the number of runs in a given string. However, tight non-asymptotic lower and upper bounds for both problems, the maximum number of distinct squares and the maximum number of runs, are still unknown. In 2009, Deza and Franek proposed a novel approach to both problems which introduces a new parameter in addition to the usual length of the string -- the number d of distinct symbols of the string. In the talk, the so-called d-step approach and a complete overview of the research to-date will be presented, including joint works with Andrew Baker on the maximum number of runs, and with Mei Jiang on the maximum number of distinct squares.
A joint work with A. Deza.