Theoretical advances for comparative genomics

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing.

Research result of the month: February 2014

The Department has started a new “Research result of the month” series written by Professor and Deputy Head Sasu Tarkoma. This series presents the researchers behind the latest scientific achievements, gives glimpses of scientists in action, and discusses the societal and industrial relevance of the results without forgetting the big picture.

Theoretical advances for comparative genomics

In the first result of the month report, we interview Dr. Djamal Belazzougui about his recent accepted scientific article in the prestigious STOC conference, the ACM Symposium on the Theory of Computing (link http://www.columbia.edu/~cs2035/stoc/stoc2014/index.shtml). The main result of the work pertains to optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. The result has applications in multiple domains, in particular tools for comparative genomics. The theoretical result is already being implemented by a group of students in a software engineering project for practical impact.

Dr. Belazzougui completed an engineering degree in Algeria in 2005 and then a PhD degree in France in 2011. After visiting Denmark in the Spring of 2012, he started as a post doctoral researcher at the Department of Computer Science in the Fall of 2012 in Professor Veli Mäkinen’s group “Genome-scale Algorithmics” (link http://www.cs.helsinki.fi/en/gsa/).

To gain more insight into the latest discovery, we asked Dr. Belazzougui to tell a bit about the background. The work builds on the classic suffix array and the suffix tree structures that have been known for a very long time. “Fourteen years ago a new way was found for compressing suffix arrays and suffix trees that allowed the reduction of space by a factor of 10,” he explains. Minimizing the space of the structure is key for many applications where large data sets are examined and manipulated. For example, this breakthrough enabled the indexing of the human genome on a computer with 4 gigabytes of memory that was not possible with the earlier structures.

The latest discovery was kindled by an open problem in the early work on compressed suffix arrays and trees, namely how to quickly build the compressed structure in the same space. “This open problem was almost completely solved about ten years ago for compressed suffix arrays, but it remained open for the compressed suffix tree,” Dr. Belazzougui clarifies.

A powerful variant of the compressed suffix array, the Bidirectional Burrows-Wheeler transform, was discovered around 2010. This newer variant is superior to the previous algorithm in that it runs in linear time. Professor Veli Mäkinen and Dr. Belazzougui had an idea to apply this new variant for various sequence analysis problems. “We eventually found that most of the sequence analysis described in the literature that were previously solved using the classical suffix tree could now be solved in linear time with the Bidirectional Burrows-Wheeler transform while being more space efficient,” he explains the key insight.

From the right: Dr. Belazzougui, Professor Veli Mäkinen and Dr. Fabio Cunial presenting and discussing the new result.

After presenting an article on the various variants of the Bidirectional Burrows-Wheeler transform at the European Symposium on Algorithms (ESA) in 2013 (link http://link.springer.com/chapter/10.1007%2F978-3-642-40450-4_12), one major open question remained: How to build those variants of the Burrows-Wheeler transform quickly and with minimal space.

After working on this open problem, Dr. Belazzougui found a solution: “Surprisingly it turned out that the Bidirectional Burrows-Wheeler can allow to build the compressed suffix tree in small space with a very simple way”. This result would not have been possible to obtain five years ago, but building on techniques described in the last two to three years, most notably by the team of Professor Enno Ohlebusch, it was possible to solve the problem.

The scientific article builds on the past work and presents a set of optimal space-efficient randomized algorithms for constructing compressed suffix arrays and compressed suffix trees. Space-efficiency of the algorithms means that the maximum space required by them is asymptotically bounded by the size of the input.

The result has many applications in the field of string algorithms. In particular sequence analysis problems can now be solved in linear time and with space that is essentially a constant factor larger than the size of the input and output of the problem being solved.

The work solves an important open problem; however, there are still many research questions that need to be addressed. The algorithms elaborated in the scientific article are optimal with the caveat that they require randomization. The challenge of devising a deterministic worst-case linear time algorithm is still open. Implementation of the algorithms for more practical impact is another important activity. “We are implementing some of the algorithms presented in the article and in our previous work from ESA 2013,” he explains the current implementation plans.

As general advice for PhD students on making discoveries, he recommends reading articles from different topics and fields. “You may make unexpected connections that no one has made before,” he concludes.

Link to the manuscript in the ArXiv collection (link http://arxiv.org/abs/1401.0936).

Created date

03.03.2014 - 10:05

Mobile cloud computing makes data centres obsolete

Researcher Eemil Lagerspetz intends to move computing from computers to pocket devices and from data centres to homes.

Implementing cloud computing with mobile devices is being studied at the University of Helsinki. Mobile cloud computing refers to computing with smartphones or other mobile or Internet of Things devices in the environment, such as smart TVs or smart fridges. Without mobile devices, cloud computing means carrying out large tasks on computers linked together by network connections. In traditional computing, the tasks are carried out with computers that are physically located in the same space.

 

Workshop on Mobile Services and Edge Computing

 

 

 

 

 

 

 

The 3rd Helsinki-HKUST-Tsinghua workshop was chaired by Professor Sasu Tarkoma from University of Helsinki and Dr. Aaron Yi Ding from Technical University of Munich. The workshop was held at the University of Helsinki from July 27th to 29th, 2016.

Read more

 

Professor Esko Ukkonen invited to the Estonian Academy of Sciences

Professor Esko Ukkonen has been invited to the Estonian Academy of Sciences as a foreign member.

Esko Ukkonen has had contacts to the computer science community in Estonia from the beginning of the 1990s, and he has supervised the work of several Estonian postgraduates. The Estonian Academy of Sciences has 78 ordinary and 21 foreign members.

The First Europe-China Workshop on Big Data Management

Some attenders of this workshopThe first Europe-China workshop on big data management was successfully held on the 16th of May, 2016 at the Department of Computer Science, University of Helsinki. 

This one-day workshop organized by Prof. Jiaheng Lu (University of Helsinki), Prof. Xiaoyong Du (Renmin University of China), and Prof. Christian S. Jensen (Aalborg University, Denmark). The aims of this workshop were to gather experts in big data management to exchange views on cutting-edge data management problems and create opportunities for establishing new collaborations between EU and China computer scientists.