Theoretical advances for comparative genomics

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing.

Research result of the month: February 2014

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. This series presents the researchers behind the latest scientific achievements, gives glimpses of scientists in action, and discusses the societal and industrial relevance of the results without forgetting the big picture.

Theoretical advances for comparative genomics

In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing (link http://www.columbia.edu/~cs2035/stoc/stoc2014/index.shtml). The main result of the work pertains to optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. The result has applications in multiple domains, in particular tools for comparative genomics. The theoretical result is already being implemented by a group of students in a software engineering project for practical impact.

Dr. Belazzougui completed an engineering degree in Algeria in 2005 and then a PhD degree in France in 2011. After visiting Denmark in the Spring of 2012, he started as a post doctoral researcher at the Department of Computer Science in the Fall of 2012 in Professor Veli Mäkinen’s group “Genome-scale Algorithmics” (link http://www.cs.helsinki.fi/en/gsa/).

To gain more insight into the latest discovery, we asked Dr. Belazzougui to tell a bit about the background. The work builds on the classic suffix array and the suffix tree structures that have been known for a very long time. “Fourteen years ago a new way was found for compressing suffix arrays and suffix trees that allowed the reduction of space by a factor of 10,” he explains. Minimizing the space of the structure is key for many applications where large data sets are examined and manipulated. For example, this breakthrough enabled the indexing of the human genome on a computer with 4 gigabytes of memory that was not possible with the earlier structures.

The latest discovery was kindled by an open problem in the early work on compressed suffix arrays and trees, namely how to quickly build the compressed structure in the same space. “This open problem was almost completely solved about ten years ago for compressed suffix arrays, but it remained open for the compressed suffix tree,” Dr. Belazzougui clarifies.

A powerful variant of the compressed suffix array, the Bidirectional Burrows-Wheeler transform, was discovered around 2010. This newer variant is superior to the previous algorithm in that it runs in linear time. Professor Veli Mäkinen and Dr. Belazzougui had an idea to apply this new variant for various sequence analysis problems. “We eventually found that most of the sequence analysis described in the literature that were previously solved using the classical suffix tree could now be solved in linear time with the Bidirectional Burrows-Wheeler transform while being more space efficient,” he explains the key insight.

From the right: Dr. Belazzougui, Professor Veli Mäkinen and Dr. Fabio Cunial presenting and discussing the new result.

After presenting an article on the various variants of the Bidirectional Burrows-Wheeler transform at the European Symposium on Algorithms (ESA) in 2013 (link http://link.springer.com/chapter/10.1007%2F978-3-642-40450-4_12), one major open question remained: How to build those variants of the Burrows-Wheeler transform quickly and with minimal space.

After working on this open problem, Dr. Belazzougui found a solution: “Surprisingly it turned out that the Bidirectional Burrows-Wheeler can allow to build the compressed suffix tree in small space with a very simple way”. This result would not have been possible to obtain five years ago, but building on techniques described in the last two to three years, most notably by the team of Professor Enno Ohlebusch, it was possible to solve the problem.

The scientific article builds on the past work and presents a set of optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. Space-efficiency of the algorithms means that the maximum space required by them is asymptotically bounded by the size of the input.

The result has many applications in the field of string algorithms. In particular sequence analysis problems can now be solved in linear time and with space that is essentially a constant factor larger than the size of the input and output of the problem being solved.

The work solves an important open problem; however, there are still many research questions that need to be addressed. The algorithms elaborated in the scientific article are optimal with the caveat that they require randomization. The challenge of devising a deterministic worst-case linear time algorithm is still open. Implementation of the algorithms for more practical impact is another important activity. “We are implementing some of the algorithms presented in the article and in our previous work from ESA 2013,” he explains the current implementation plans.

As general advice for PhD students on making discoveries, he recommends reading articles from different topics and fields. “You may make unexpected connections that no one has made before,” he concludes.

Link to the manuscript in the ArXiv collection (link http://arxiv.org/abs/1401.0936).

Created date

03.03.2014 - 10:05

The university’s team Game of Nolife won Western European programming contest for students

In the finals in Thailand in spring 2016, the students from the University of Helsinki will face the best teams in the world.

The University of Helsinki has won the inter-university NWERC 2015 programming contest that was held in Linköping recently. It was attended by 95 teams from Western Europe. The Game of Nolife team from the University of Helsinki consisted of computer-science and maths students Tuukka Korhonen, Olli Hirviniemi and Otte Heinävaara.

The Carat research team has published a dataset focusing on collaborative energy diagnostics of mobile devices and applications

 

 

The Carat research team from University of Helsinki publishes a dataset from the Carat project (http://carat.cs.helsinki.fi/) focusing on collaborative energy diagnostics of mobile devices and applications. The dataset was presented at the IEEE PerCom’15 conference last spring in the publication "Energy Modeling of System Settings: A Crowdsourced Approach" that won the Marc Weiser Best Paper Award given at the conference.

Eemil Lagerspetz was awarded a grant by the Jorma Ollila fund of Nokia Foundation on November 24, 2015

 

 
 
Eemil Lagerspetz was awarded a grant by the Jorma Ollila fund of Nokia Foundation on November 24, 2015. Congratulations!
 
The fund was launched in year 2014 to support post doctoral research career development. 
The title of Eemil’s post doctoral research is “Mind The Gap: Combining Trajectory Datasets for a Holistic Picture of Human Mobility” and the research will be carried out at the Hong Kong University of Science and Technology (HKUST) in 2016.
 

Collaborative Networking (CoNe) group researchers got the best paper award at 2nd ACM Conference on Information-Centric Networking (ICN 2015)

 

Collaborative Networking (CoNe) group researchers got the best paper award at 2nd ACM Conference on Information-Centric Networking (ICN 2015), one of the most prestigious venues for ICN research. The article entitled Pro-Diluvian: Understanding Scoped-Flooding for Content Discovery in ICN is lead by Liang Wang - a recent PhD graduate from CoNe research group, and is the outcome of collaboration with Suzan Bayhan and Jussi Kangasharju from UH, Jörg Ott from Aalto University, Arjuna Sathiaseelan and Jon Crowcroft from Cambridge University.