next up previous
Next: Machine learning Up: a) General Computer Previous: a) General Computer

Algorithms on Strings

Research on strings algorithms, ``stringology'' started at the department quite early, in 1981. The initial impulse came from computer applications in molecular genetics where there is a lot of demand for efficient and sophisticated string-manipulation algorithms. The group is now one of the leaders in its special field in the world and has obtained several basic results that have been included in recent international text books. The group organized the 6th Annual Symposium on Combinatorial Pattern Matching in Helsinki in 1995.

The reputation of the group is based on the work done in the 1980's on developing new algorithms for the following problems: edit distance, approximate string matching, DNA sequence assembly, and shortest common superstring. In 1990's, various ``sublinear'' approximate string-matching algorithms and new measures for string similarity have been developed. For the important problem of how a text should be preprocessed to speed-up subsequent approximate string searches, several solutions have been discovered based on the suffix-tree of the text or the so-called ``q-grams''. For constructing suffix-trees a natural ``on-line'' algorithm has been developed.

The two-dimensional string matching problem has also been studied. We were able to obtain the first optimal expected time solutions using simple, practical algorithms. This is in contrast with the many worst-case optimal solutions proposed in the literature which are very complicated and have small practical value. Another new research direction is the study of special databases for strings. We have obtained initial results on reasoning about strings in databases and on extending the relational database model to allow for typical queries on strings.

The on-going work deals with different feature-based schemes for approximate string matching, two and higher dimensional string matching, the indexing of strings, and string databases. Computational biology will get more emphasis in the future.

Current members of the group are Prof. Esko Ukkonen (group leader), Doc. Gösta Grahne, Doc. Jorma Tarhio, Raul Hakli, M.Sc. Juha Kärkkäinen, M.Sc. Kai Korpimies, Outi Lehtinen, M.Sc. Matti Nykänen, and M.Sc. Erkki Sutinen. The group gets support from the Academy of Finland.

Publications: [95-98, 100-105, 110, 112-123, 139].

Home page: http://www.cs.helsinki.fi/research/pmdm/cpm/



next up previous
Next: Machine learning Up: a) General Computer Previous: a) General Computer