Theoretical advances for comparative genomics

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing.

Research result of the month: February 2014

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. This series presents the researchers behind the latest scientific achievements, gives glimpses of scientists in action, and discusses the societal and industrial relevance of the results without forgetting the big picture.

Theoretical advances for comparative genomics

In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing (link http://www.columbia.edu/~cs2035/stoc/stoc2014/index.shtml). The main result of the work pertains to optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. The result has applications in multiple domains, in particular tools for comparative genomics. The theoretical result is already being implemented by a group of students in a software engineering project for practical impact.

Dr. Belazzougui completed an engineering degree in Algeria in 2005 and then a PhD degree in France in 2011. After visiting Denmark in the Spring of 2012, he started as a post doctoral researcher at the Department of Computer Science in the Fall of 2012 in Professor Veli Mäkinen’s group “Genome-scale Algorithmics” (link http://www.cs.helsinki.fi/en/gsa/).

To gain more insight into the latest discovery, we asked Dr. Belazzougui to tell a bit about the background. The work builds on the classic suffix array and the suffix tree structures that have been known for a very long time. “Fourteen years ago a new way was found for compressing suffix arrays and suffix trees that allowed the reduction of space by a factor of 10,” he explains. Minimizing the space of the structure is key for many applications where large data sets are examined and manipulated. For example, this breakthrough enabled the indexing of the human genome on a computer with 4 gigabytes of memory that was not possible with the earlier structures.

The latest discovery was kindled by an open problem in the early work on compressed suffix arrays and trees, namely how to quickly build the compressed structure in the same space. “This open problem was almost completely solved about ten years ago for compressed suffix arrays, but it remained open for the compressed suffix tree,” Dr. Belazzougui clarifies.

A powerful variant of the compressed suffix array, the Bidirectional Burrows-Wheeler transform, was discovered around 2010. This newer variant is superior to the previous algorithm in that it runs in linear time. Professor Veli Mäkinen and Dr. Belazzougui had an idea to apply this new variant for various sequence analysis problems. “We eventually found that most of the sequence analysis described in the literature that were previously solved using the classical suffix tree could now be solved in linear time with the Bidirectional Burrows-Wheeler transform while being more space efficient,” he explains the key insight.

From the right: Dr. Belazzougui, Professor Veli Mäkinen and Dr. Fabio Cunial presenting and discussing the new result.

After presenting an article on the various variants of the Bidirectional Burrows-Wheeler transform at the European Symposium on Algorithms (ESA) in 2013 (link http://link.springer.com/chapter/10.1007%2F978-3-642-40450-4_12), one major open question remained: How to build those variants of the Burrows-Wheeler transform quickly and with minimal space.

After working on this open problem, Dr. Belazzougui found a solution: “Surprisingly it turned out that the Bidirectional Burrows-Wheeler can allow to build the compressed suffix tree in small space with a very simple way”. This result would not have been possible to obtain five years ago, but building on techniques described in the last two to three years, most notably by the team of Professor Enno Ohlebusch, it was possible to solve the problem.

The scientific article builds on the past work and presents a set of optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. Space-efficiency of the algorithms means that the maximum space required by them is asymptotically bounded by the size of the input.

The result has many applications in the field of string algorithms. In particular sequence analysis problems can now be solved in linear time and with space that is essentially a constant factor larger than the size of the input and output of the problem being solved.

The work solves an important open problem; however, there are still many research questions that need to be addressed. The algorithms elaborated in the scientific article are optimal with the caveat that they require randomization. The challenge of devising a deterministic worst-case linear time algorithm is still open. Implementation of the algorithms for more practical impact is another important activity. “We are implementing some of the algorithms presented in the article and in our previous work from ESA 2013,” he explains the current implementation plans.

As general advice for PhD students on making discoveries, he recommends reading articles from different topics and fields. “You may make unexpected connections that no one has made before,” he concludes.

Link to the manuscript in the ArXiv collection (link http://arxiv.org/abs/1401.0936).

Created date

03.03.2014 - 10:05

International Staff Get-together

On Friday 3rd of June the department and the international well-being group invited the international staff to a picnic and a guided tour on Suomenlinna. With several bags filled with delicious food and drinks we started in the late afternoon towards the island, where we found a sunny spot to enjoy the warm weather, the good company and of course the food: from bread over vegetables and fruits to cheese, we had it all. Besides the about 10 international attendees, also our head of department, Esko Ukkonen, found the time to join us.

Also, check out the new English speaking blog at http://www.cs.helsinki.fi/en/blog.

The story of the little games studio

Ninro gameMika Urtela and Hannu Pajula, graduates of the Department of Computer Science, are realising their piña colada-flavoured dreams in their game-producing company, Soul Aim Studios. This is the beginning of their story. The piña colada part of it has not come true yet, but they're working at it.

Jürgen Münch started as FiDiPro professor

 FiDiPro - the Finland Distinguished Professor Programme - enables distinguished international researchers to work and team up with the 'best of the best' in Finnish academic research. Financed by the Academy of Finland and Tekes, FiDiPro makes it possible to recruit highly merited scientists who are able to commit to long-term cooperation with a Finnish university or research institute.

In March 2011, the Department of Computer Science got its first FiDiPro professor, Jürgen Münch. Let’s interview Jürgen and learn to know him better.
 

Linus Torvalds made honorary alumnus in Kumpula

The Faculty of Science invited Linus Torvalds to be its honorary alumnus in Kumpula. During its alumni event, the faculty also named one of its lecture halls after Linus Torvalds. On the event held on Thursday 17 March, young researchers were the main speakers, and the 350 guests on the science campus gave them their full attention.