Theoretical advances for comparative genomics

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing.

Research result of the month: February 2014

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. This series presents the researchers behind the latest scientific achievements, gives glimpses of scientists in action, and discusses the societal and industrial relevance of the results without forgetting the big picture.

Theoretical advances for comparative genomics

In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing (link http://www.columbia.edu/~cs2035/stoc/stoc2014/index.shtml). The main result of the work pertains to optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. The result has applications in multiple domains, in particular tools for comparative genomics. The theoretical result is already being implemented by a group of students in a software engineering project for practical impact.

Dr. Belazzougui completed an engineering degree in Algeria in 2005 and then a PhD degree in France in 2011. After visiting Denmark in the Spring of 2012, he started as a post doctoral researcher at the Department of Computer Science in the Fall of 2012 in Professor Veli Mäkinen’s group “Genome-scale Algorithmics” (link http://www.cs.helsinki.fi/en/gsa/).

To gain more insight into the latest discovery, we asked Dr. Belazzougui to tell a bit about the background. The work builds on the classic suffix array and the suffix tree structures that have been known for a very long time. “Fourteen years ago a new way was found for compressing suffix arrays and suffix trees that allowed the reduction of space by a factor of 10,” he explains. Minimizing the space of the structure is key for many applications where large data sets are examined and manipulated. For example, this breakthrough enabled the indexing of the human genome on a computer with 4 gigabytes of memory that was not possible with the earlier structures.

The latest discovery was kindled by an open problem in the early work on compressed suffix arrays and trees, namely how to quickly build the compressed structure in the same space. “This open problem was almost completely solved about ten years ago for compressed suffix arrays, but it remained open for the compressed suffix tree,” Dr. Belazzougui clarifies.

A powerful variant of the compressed suffix array, the Bidirectional Burrows-Wheeler transform, was discovered around 2010. This newer variant is superior to the previous algorithm in that it runs in linear time. Professor Veli Mäkinen and Dr. Belazzougui had an idea to apply this new variant for various sequence analysis problems. “We eventually found that most of the sequence analysis described in the literature that were previously solved using the classical suffix tree could now be solved in linear time with the Bidirectional Burrows-Wheeler transform while being more space efficient,” he explains the key insight.

From the right: Dr. Belazzougui, Professor Veli Mäkinen and Dr. Fabio Cunial presenting and discussing the new result.

After presenting an article on the various variants of the Bidirectional Burrows-Wheeler transform at the European Symposium on Algorithms (ESA) in 2013 (link http://link.springer.com/chapter/10.1007%2F978-3-642-40450-4_12), one major open question remained: How to build those variants of the Burrows-Wheeler transform quickly and with minimal space.

After working on this open problem, Dr. Belazzougui found a solution: “Surprisingly it turned out that the Bidirectional Burrows-Wheeler can allow to build the compressed suffix tree in small space with a very simple way”. This result would not have been possible to obtain five years ago, but building on techniques described in the last two to three years, most notably by the team of Professor Enno Ohlebusch, it was possible to solve the problem.

The scientific article builds on the past work and presents a set of optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. Space-efficiency of the algorithms means that the maximum space required by them is asymptotically bounded by the size of the input.

The result has many applications in the field of string algorithms. In particular sequence analysis problems can now be solved in linear time and with space that is essentially a constant factor larger than the size of the input and output of the problem being solved.

The work solves an important open problem; however, there are still many research questions that need to be addressed. The algorithms elaborated in the scientific article are optimal with the caveat that they require randomization. The challenge of devising a deterministic worst-case linear time algorithm is still open. Implementation of the algorithms for more practical impact is another important activity. “We are implementing some of the algorithms presented in the article and in our previous work from ESA 2013,” he explains the current implementation plans.

As general advice for PhD students on making discoveries, he recommends reading articles from different topics and fields. “You may make unexpected connections that no one has made before,” he concludes.

Link to the manuscript in the ArXiv collection (link http://arxiv.org/abs/1401.0936).

Created date

03.03.2014 - 10:05

Brain poetry

In the latest research result of the month section, we interview PhD student Jukka Toivanen about his recent work on brain poetry in the Discovery group led by professor Hannu Toivonen. How can humans and machines be creative together?

Kjell Lemström to be new Head of Studies

Since Jaakko Kurhila left the department to head the Open University, we had to find a new university lecturer to act as head of studies in short order. We received a total of 28 applications. Out of these, and after a preliminary qualification round, evaluations, interviews, and a department council hearing, Kjell Lemström (KL) was elected for the post. He started working as the department's Head of Studies on 2 March 2015, so the Head of the department (JP) conducted the following induction interview that very week.

This is by no means the first time Kjell has been employed by the department. He defended his thesis on ‘String Matching Techniques for Music Retrieval’ in 2000, and has held numerous teaching and research positions both before and after that, until he transferred to the Laurea University of Applied Sciences in 2011 (luckily, that was only temporary).

Head of Studies Jaakko Kurhila to head Open University

The Head of Studies at the department, University Lecturer Jaakko Kurhila, has been elected to the post of director of the Open University at the University of Helsinki. It was a tough race: all in all, 39 applicants sought the post, some of them through the Mercuri Urval headhunting process. After a consultant evaluation, interviews, and aptitude assessments, the preparatory committee for the post, the steering committee for the Open University, and the rector of the university came to a unanimous decision to select Jaakko, and the contract is already being drawn up.

Being selected from this prestigious group of applicants, and after such a thorough process, is indisputable proof of the qualifications of Jaakko and the high esteem the academic community has for him. The department extends its warmest congratulations to Jaakko for this career development and is proud of the success of its protégé.

Bridging the Gap Between Research and Standardization

In the fourth research result of the month, we report a joint work between the UH NODES group and the Cambridge NetOS group, lead by Prof. Sasu Tarkoma and Prof. Jon Crowcroft, respectively. Their work recently received the best paper award "Best of CCR" from ACM SIGCOMM.