Supervised Machine Learning: Additional references

Textbooks

There are many general textbooks on machine learning. The one with a point of view closest to this course is

  • Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (2nd edition). Wiley 2001.

There is a relatively new book on online learning that covers all the basics and a lot of more recent research:

  • Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press 2006.

The following book was our main source for kernel methods and Rademacher complexity, and has a lot more on those topics:

More traditional approaches to statistical learning theory can be found in

  • Luc Devroye, László Györfi and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer 1996.
  • Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning. Springer 2001.

If you want more background on convex optimisation (one of the main ingredients of Support Vector Machines), the following modern textbook is available freely online:

Tutorials

Online learning, including generalisations of the Perceptron algorithm, but not the expert framework:

Summary of recent work in statistical learning theory, including Rademacher complexity:

There is also a nice introductory article on SVMs and related kernel methods:

Boosting was not covered in the course but is closely related:

 

Original research articles

The Weighted Majority algorithm is introduced and analysed in

The Aggregating Algorithm is due to Vovk. One of his articles on the topic is

The special case of absolute loss is covered in great detail (including tuning with static learning rate and with a very fancy doubling trick) by

  • Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire and Manfred K. Warmuth. How to use expert advice. Journal of the ACM 44(3):427–485, May 1997.

Self-confident tuning comes from

For multiplicative algorithms (not covered in the course) see for example

For the conversion from online to batch algorithms, see

A nice proof for the connection between VC dimension and Rademacher complexity is given in

 

Journals, conferences and web sites