Seminar on Algorithms and Machine Learning: Topics

The schedule for the presentations on Friday 13 June is on the department intranet.

Below are some possible topics. Since the seminar is aimed generally at students in algorithms and machine learning, I have tried to pick topics that can be understood after taking just the courses Design and Analysis of Algorithms and Introduction to Machine Learning, but that are not covered in the regular follow-up courses. Therefore, there are not many references to very recent research; instead, many topics are classics that don't quite fit into the basic courses. Picking topics this way has led to a large overlap with the previous edition of the seminar in 2011, which I haven't tried too hard to avoid: the topics remain generally important in algorithms and machine learning, yet are not covered in our standard courses.

You are also encouraged to propose your own topic. It can be based on your personal interests, as long as you keep in mind that you need to be able to make it interesting to the other students in the seminar as well, who may have varying amounts of background knowledge.

For each topic, one or two papers have been mentioned to give you an idea of what the topic is about and a starting point for finding literature. However, it is not intended that you base your work on just those papers.

Interactive proof systems
Interactive proof systems are a way of analysing computational complexity by studying games between two players: a computationally unbounded prover and a randomised polynomial-time verifier. This is related to topics such as zero-knowledge proofs and the PCP theorem.
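
As a concrete toy example (my own sketch, not taken from the papers below), the following Python simulation illustrates the classic interactive proof for graph non-isomorphism: the verifier secretly picks one of the two graphs, sends a randomly relabelled copy as a challenge, and an all-powerful prover must identify which graph it came from. A reliable prover exists exactly when the graphs are non-isomorphic.

    import itertools
    import random

    def canon(edges):
        """Represent a graph as a set of undirected edges."""
        return frozenset(frozenset(e) for e in edges)

    def permute(graph, perm):
        """Relabel the vertices of a graph according to a permutation."""
        return frozenset(frozenset(perm[v] for v in e) for e in graph)

    def isomorphic(g, h, n):
        """Brute-force isomorphism test -- standing in for the prover's
        unbounded computational power."""
        return any(permute(g, p) == h for p in itertools.permutations(range(n)))

    def gni_protocol(g0, g1, n, rounds=20):
        """The verifier accepts the claim 'g0 and g1 are non-isomorphic'
        only if the prover identifies its secret coin in every round."""
        for _ in range(rounds):
            b = random.randrange(2)                 # verifier's secret coin
            perm = list(range(n))
            random.shuffle(perm)
            challenge = permute((g0, g1)[b], perm)  # randomly relabelled copy
            guess = 0 if isomorphic(g0, challenge, n) else 1
            # If g0 and g1 were isomorphic, the challenge would match both,
            # so even the best prover could only guess b half the time.
            if guess != b:
                return False
        return True

    g0 = canon([(0, 1), (1, 2)])          # path on 3 vertices
    g1 = canon([(0, 1), (1, 2), (0, 2)])  # triangle
    print(gni_protocol(g0, g1, 3))        # True: the graphs really differ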

László Babai and Shlomo Moran (1988). Arthur-Merlin games: a randomized proof system, and a hierarchy of complexity classes. Journal of Computer and System Sciences 36(2):254–276.

Shafi Goldwasser, Silvio Micali and Charles Rackoff (1989). The knowledge complexity of interactive proof systems. SIAM Journal on Computing 18(1):186–208.

Probabilistically Checkable Proofs (PCP)
The so-called PCP theorem gives a characterisation of the class NP that has in particular led to hardness results for approximating NP-hard optimisation problems. In its standard form, the theorem states that NP = PCP(O(log n), O(1)): every NP statement has a proof that a randomised verifier can check by reading only a constant number of proof bits, using O(log n) random bits. The proof of the theorem is quite complicated, but the result itself and its applications are easier to understand.

Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, Mario Szegedy (1998). Proof verification and the hardness of approximation problems. Journal of the ACM 45(3):501–555.

Approximations to Euclidean TSP
The Travelling Salesman Problem in its most general form cannot be solved in polynomial time even approximately, unless P=NP. However, if we make the additional assumption that the vertices of the graph are points in the Euclidean plane, and the edge costs are the distances between the vertices, polynomial-time approximation algorithms are known. It has even been shown that there actually is a polynomial-time approximation scheme, that is, a polynomial-time algorithm for any given approximation accuracy.
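
The approximation schemes of Arora and Mitchell are quite intricate; as a simpler illustration of metric TSP approximation (a classic textbook baseline, not the PTAS from the papers below), here is a sketch of the MST-based 2-approximation in Python:

    import math

    def tsp_2approx(points):
        """Classic 2-approximation for metric TSP: a preorder walk of a
        minimum spanning tree. Shortcutting the doubled MST edges can only
        shorten the tour, thanks to the triangle inequality."""
        n = len(points)

        def dist(a, b):
            return math.dist(points[a], points[b])

        # Prim's algorithm on the complete graph over the points.
        parent = {}
        best = {v: (dist(0, v), 0) for v in range(1, n)}
        while best:
            v = min(best, key=lambda u: best[u][0])
            parent[v] = best.pop(v)[1]
            for u in best:
                d = dist(v, u)
                if d < best[u][0]:
                    best[u] = (d, v)

        # A preorder walk of the tree visits every point exactly once.
        children = {v: [] for v in range(n)}
        for v, p in parent.items():
            children[p].append(v)
        tour, stack = [], [0]
        while stack:
            v = stack.pop()
            tour.append(v)
            stack.extend(reversed(children[v]))
        return tour

    points = [(0, 0), (0, 2), (3, 1), (5, 0), (4, 4)]
    print(tsp_2approx(points))   # a tour of cost at most twice the optimum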

Sanjeev Arora (1998). Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM 45(5):753–782.

Joseph S. B. Mitchell (1999). Guillotine subdivisions approximate polygonal subdivisions: A simple polynomial-time approximation scheme for geometric TSP, k-MST, and related problems. SIAM Journal on Computing 28(4):1298–1309.

Online algorithms and competitive analysis
There are many optimisation problems where it is natural that not all the input is available at once; instead, decisions must be made sequentially as new input becomes available. For example, in scheduling we may need to get jobs started before we know what jobs are coming next. In competitive analysis, one compares the quality of the optimisation result in such a scenario to what could have been achieved if all the input had been available beforehand.
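
The textbook warm-up example is the ski-rental problem: rent skis for 1 per day or buy them for B, without knowing how long the season will last. Below is a sketch of the break-even strategy, which is 2-competitive (my own illustration, unrelated to the scheduling paper below):

    def ski_rental_cost(season_days, buy_price):
        """Break-even strategy: rent until the accumulated rent would reach
        the purchase price, then buy. Pays at most twice what an offline
        optimum, which knows season_days in advance, would pay."""
        cost = 0
        for day in range(1, season_days + 1):
            if day < buy_price:
                cost += 1          # keep renting below the break-even point
            else:
                cost += buy_price  # buy on the break-even day
                break
        return cost

    def offline_optimum(season_days, buy_price):
        """The offline optimum rents for short seasons, buys for long ones."""
        return min(season_days, buy_price)

    for days in (3, 10, 100):
        alg = ski_rental_cost(days, 10)
        opt = offline_optimum(days, 10)
        print(days, alg, opt, alg / opt)   # the ratio stays below 2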

Baruch Awerbuch, Shay Kutten and David Peleg (1992). Competitive distributed job scheduling. In Proc. 24th Annual ACM Symposium on Theory of Computing (STOC '92), pp. 571–580.

Streaming algorithms
In data mining applications, the amount of data is typically huge, and often more data is generated continuously. In such situations, it is not feasible to first store all the data and then apply some batch algorithm. Instead, whatever computations we wish to do must be performed immediately as the data arrives, keeping in memory only a minimal amount of bookkeeping information.
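
For example, the Alon-Matias-Szegedy paper below shows that the second frequency moment F2 (the sum of squared item frequencies) can be estimated in a single pass using very little memory. Here is a simplified Python sketch of that estimator, with ordinary hashing standing in for the 4-wise independent hash families the actual analysis requires:

    import random
    from statistics import mean, median

    def sign(seed, item):
        """Pseudo-random +1/-1 hash of an item; a stand-in for the 4-wise
        independent hash families used in the real analysis."""
        return 1 if hash((seed, item)) % 2 == 0 else -1

    def ams_f2(stream, k=16, groups=5):
        """One-pass estimate of F2 = sum of squared item frequencies,
        using k * groups counters instead of one per distinct item."""
        seeds = [[random.random() for _ in range(k)] for _ in range(groups)]
        counters = [[0] * k for _ in range(groups)]
        for item in stream:                       # single pass over the data
            for g in range(groups):
                for j in range(k):
                    counters[g][j] += sign(seeds[g][j], item)
        # Each squared counter is an unbiased estimator of F2; the mean
        # reduces variance and the median of means boosts the confidence.
        return median(mean(c * c for c in row) for row in counters)

    stream = [random.choice("abcdef") for _ in range(10000)]
    exact = sum(stream.count(x) ** 2 for x in set(stream))
    print(exact, round(ams_f2(stream)))   # estimate lands in the same ballpark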

Noga Alon, Yossi Matias, Mario Szegedy (1999). The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1):137–147.

Algorithmic game theory
Consider a setting where several agents each selfishly maximise their own utility function. For example, in a communication network, each user wants to get their messages delivered as fast as possible. One may then ask how much the total performance of the network degrades compared to a situation where some central authority could force the users to act in the interest of the greater common good. Further, one might consider how protocols could be designed so that it is not in the interest of a selfish user to behave in an overly disruptive manner.
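
Pigou's two-link network makes the gap concrete, in the spirit of the Roughgarden-Tardos paper below: one unit of traffic chooses between a link of fixed cost 1 and a link whose cost equals the fraction of traffic using it. The numeric check below (my own illustration) exhibits the price of anarchy 4/3:

    # Pigou's network: one unit of traffic from s to t over two parallel
    # links. The upper link always costs 1; the lower link costs x when
    # a fraction x of the traffic uses it.
    def average_cost(x):
        """Average travel time when a fraction x takes the lower link."""
        return (1 - x) * 1 + x * x

    # Selfish equilibrium: the lower link is never worse than the upper
    # one, so all traffic takes it (x = 1) and the average cost is 1.
    equilibrium = average_cost(1.0)

    # Social optimum: minimise (1 - x) + x^2; a grid search finds x = 1/2.
    optimum = min(average_cost(i / 1000) for i in range(1001))

    print(equilibrium, optimum, equilibrium / optimum)  # ratio 4/3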

Tim Roughgarden, Éva Tardos (2002). How bad is selfish routing? Journal of the ACM 49(2):236–259.

Noam Nisan, Amir Ronen (2001). Algorithmic Mechanism Design. Games and Economic Behavior 35(1–2):166–196.

Rendezvous in graphs
Consider two agents that move in a graph, hoping to end up in the same node. What can they do, assuming they do not initially know the topology of the graph?
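
The paper below studies deterministic strategies; as a warm-up, here is a sketch (my own illustration, not the paper's algorithm) of the simplest randomised approach, where both agents perform independent random walks until they collide:

    import random

    def rendezvous_time(adj, a, b, max_steps=100000):
        """Number of synchronous steps until two independent random walks
        occupy the same node, or None within max_steps. adj maps each
        node to its list of neighbours."""
        for step in range(max_steps):
            if a == b:
                return step
            a = random.choice(adj[a])   # each agent moves to a random neighbour
            b = random.choice(adj[b])
        return None

    # A cycle of length 6. Note the parity trap: with synchronous moves,
    # the distance between the agents changes by -2, 0 or +2 each step, so
    # agents starting an odd distance apart on an even cycle never meet --
    # the kind of obstacle deterministic algorithms must also handle.
    cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(rendezvous_time(cycle, 0, 2))   # even starting distance: they meet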

Anders Dessmark, Pierre Fraigniaud, Dariusz R. Kowalski and Andrzej Pelc (2006). Deterministic Rendezvous in Graphs. Algorithmica 46(1):69–96.

Boosting
Boosting is a machine learning method where the basic idea is to take a large number of hypotheses that might not individually be very good and combine them into a much better aggregated hypothesis.
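
Here is a compact sketch of the AdaBoost algorithm from the Freund-Schapire paper below, using one-dimensional threshold "stumps" as the weak hypotheses (the toy data is of course made up):

    import math

    def best_stump(xs, ys, w):
        """Weak learner: the threshold/sign pair of least weighted error."""
        best = None
        for t in sorted(set(xs)):
            for s in (1, -1):
                err = sum(wi for x, y, wi in zip(xs, ys, w)
                          if (s if x > t else -s) != y)
                if best is None or err < best[0]:
                    best = (err, t, s)
        return best

    def adaboost(xs, ys, rounds=20):
        n = len(xs)
        w = [1.0 / n] * n                        # start with uniform weights
        ensemble = []
        for _ in range(rounds):
            err, t, s = best_stump(xs, ys, w)
            alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
            ensemble.append((alpha, t, s))
            # Re-weight: examples this stump got wrong gain weight, so the
            # next weak learner concentrates on them.
            w = [wi * math.exp(-alpha * y * (s if x > t else -s))
                 for x, y, wi in zip(xs, ys, w)]
            total = sum(w)
            w = [wi / total for wi in w]

        def classify(x):
            vote = sum(a * (s if x > t else -s) for a, t, s in ensemble)
            return 1 if vote > 0 else -1
        return classify

    # Toy data: label +1 on the interval [2, 4], -1 elsewhere. No single
    # stump can represent an interval, but a boosted combination can.
    xs = [i / 2 for i in range(13)]
    ys = [1 if 2 <= x <= 4 else -1 for x in xs]
    h = adaboost(xs, ys)
    print(sum(h(x) == y for x, y in zip(xs, ys)), "of", len(xs), "correct")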

Yoav Freund and Robert E. Schapire (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1):119–139.

Spectral clustering
Spectral clustering uses linear algebra to analyse the whole matrix of pairwise similarities between the data points: cluster structure is read off from the eigenvalues and eigenvectors of a matrix derived from the similarities, typically a graph Laplacian.
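
A minimal sketch of the unnormalised spectral clustering recipe described in von Luxburg's tutorial: build a similarity matrix, form the graph Laplacian L = D - W, and split according to the signs of its second eigenvector. (Real implementations use several eigenvectors followed by k-means; the sign-based split below is a simplification for the two-cluster case.)

    import numpy as np

    def spectral_bipartition(points, sigma=1.0):
        """Split points into two clusters using the second eigenvector
        (the Fiedler vector) of the unnormalised graph Laplacian."""
        pts = np.asarray(points, dtype=float)
        # Gaussian similarities on a fully connected similarity graph.
        sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
        w = np.exp(-sq / (2 * sigma ** 2))
        np.fill_diagonal(w, 0.0)
        lap = np.diag(w.sum(axis=1)) - w          # L = D - W
        # eigh returns eigenvalues in ascending order; the sign pattern of
        # the second eigenvector separates loosely connected groups.
        vals, vecs = np.linalg.eigh(lap)
        return (vecs[:, 1] > 0).astype(int)

    # Two well-separated blobs; the method recovers them.
    rng = np.random.default_rng(0)
    blob1 = rng.normal([0, 0], 0.3, (20, 2))
    blob2 = rng.normal([5, 5], 0.3, (20, 2))
    labels = spectral_bipartition(np.vstack([blob1, blob2]))
    print(labels)   # first 20 entries share one label, last 20 the other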

Ulrike von Luxburg (2007). A tutorial on spectral clustering. Statistics and Computing 17(4):395–416.

Random projections
Random projections are a technique for dimension reduction, where the aim is to transform data points from a high-dimensional space into a (possibly much) lower-dimensional space while altering the data as little as possible. If such a transformation can be done while preserving the essential characteristics of the data, the data can be thought of as residing in a lower-dimensional subspace embedded inside the higher-dimensional data space. Dimension reduction helps make the data more easily learnable: applying a learning algorithm (classification, clustering, etc.) to the transformed data helps avoid the curse of dimensionality. At the same time, the computational requirements decrease, which may be necessary for further processing to be feasible at all.
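
Below is a small numpy sketch in the spirit of the Johnson-Lindenstrauss lemma (Kaski's paper below uses a closely related random mapping): projecting with a scaled Gaussian random matrix approximately preserves pairwise distances.

    import numpy as np

    def random_projection(data, target_dim, rng):
        """Project the rows of data to target_dim dimensions using a random
        Gaussian matrix; scaling by 1/sqrt(target_dim) keeps expected
        squared distances unchanged."""
        d = data.shape[1]
        r = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
        return data @ r

    rng = np.random.default_rng(42)
    x = rng.normal(size=(50, 1000))        # 50 points in 1000 dimensions
    y = random_projection(x, 200, rng)     # same points in 200 dimensions

    # Compare a few pairwise distances before and after the projection.
    for i, j in [(0, 1), (2, 3), (4, 5)]:
        before = np.linalg.norm(x[i] - x[j])
        after = np.linalg.norm(y[i] - y[j])
        print(f"{before:.2f} -> {after:.2f}")   # ratios stay close to 1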

Samuel Kaski (1998). Dimensionality reduction by random mapping: fast similarity computation for clustering. In Proc. 1998 IEEE International Joint Conference on Neural Networks, volume 1, pp. 413–418.

Deep learning
Deep learning means building a learning system in several layers, so that the early layers represent low-level concepts close to the raw representation of the data (say, edges detected in an image represented as an array of pixels), while later layers use the earlier layers as building blocks to represent higher-level concepts closer to the actual task at hand.
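
As a bare-bones illustration of what "several layers" means, here is the forward pass of a small multilayer network in numpy. (This only illustrates layered representations; actual deep learning systems also learn the weights, typically by gradient-based training such as backpropagation.)

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def forward(x, layers):
        """Each layer re-represents the output of the previous one: early
        layers see something close to the raw input, later layers operate
        on increasingly abstract features."""
        h = x
        for w, b in layers:
            h = relu(h @ w + b)
        return h

    rng = np.random.default_rng(0)
    dims = [64, 32, 16, 4]   # e.g. raw pixels -> edges -> parts -> classes
    layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
              for m, n in zip(dims, dims[1:])]

    x = rng.normal(size=(1, 64))     # one input, e.g. a flattened 8x8 image
    print(forward(x, layers).shape)  # (1, 4): a high-level 4-dimensional code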

Yoshua Bengio (2013). Deep learning of representations: looking forward. In Proc. First International Conference on Statistical Language and Speech Processing, pp. 1–37. Springer.