Supervised Machine Learning: Additional references
Textbooks
There are many general textbooks on machine learning. The one with a point of view closest to this course is
- Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (2nd edition). Wiley 2001.
There is a relatively new book on online learning that covers all the basics and a lot of more recent research:
- Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press 2006.
The following book was our main source for kernel methods and Rademacher complexity, and has a lot more on those topics:
- John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press 2004.
More traditional approaches to statistical learning theory can be found in
- Luc Devroye, László Györfi and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer 1996.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning. Springer 2001.
If you want more background on convex optimisation (one of the main ingredients of Support Vector Machines), the following modern textbook is available freely online:
- Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press 2004.
Tutorials
Online learning, including generalisations of the Perceptron algorithm, but not the expert framework:
- Jyrki Kivinen. Online learning of linear classifiers. In S. Mendelson and A. J. Smola, editors, Advanced Lectures on Machine Learning, pages 235–257, Springer LNCS 2600, January 2003.
Summary of recent work in statistical learning theory, including Rademacher complexity:
- Shahar Mendelson. A few notes on statistical learning theory. In Advanced Lectures on Machine Learning, pages 1–40, Springer LNCS 2600, January 2003.
There is also a nice introductory article on SVMs and related kernel methods:
- Bernhard Schölkopf and Alexander J. Smola. A short introduction to learning with kernels. In Advanced Lectures on Machine Learning, pages 41–64, Springer LNCS 2600, January 2003.
Boosting was not covered in the course but is closely related:
- Ron Meir and Gunnar Rätsch. An Introduction to Boosting and Leveraging. In Advanced Lectures on Machine Learning, pages 118–183, Springer LNCS 2600, January 2003.
Original research articles
The Weighted Majority algorithm is introduced and analysed in
- N. Littlestone and M. K. Warmuth. The weighted majority algorithm, Information and Computation 108(2):212–261, February 1994.
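Since the deterministic Weighted Majority algorithm is very short to state, here is a minimal sketch for binary (0/1) predictions; the function name and interface are our own, not from the paper:

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Run deterministic Weighted Majority over a sequence of trials.

    expert_predictions: list of trials, each a list of 0/1 expert predictions.
    outcomes: list of 0/1 true outcomes, one per trial.
    beta: multiplicative penalty in (0, 1) applied to experts that err.
    Returns the number of mistakes made by the master algorithm.
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # Predict by weighted vote of the experts.
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        prediction = 1 if vote_one >= sum(weights) / 2 else 0
        if prediction != y:
            mistakes += 1
        # Multiply the weights of the experts that predicted wrongly by beta.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes
```

Littlestone and Warmuth show that the master's mistake count is at most a constant (depending on beta) times the best expert's mistake count, plus a term logarithmic in the number of experts.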
The Aggregating Algorithm is due to Vovk. One of his articles on the topic is
- V. Vovk. A game of prediction with expert advice, Journal of Computer and System Sciences 56(2):153–173, April 1998.
The special case of absolute loss is covered in great detail (including tuning with a static learning rate and with a very fancy doubling trick) by
- Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire and Manfred K. Warmuth. How to use expert advice, Journal of the ACM 44(3):427–485, May 1997.
Self-confident tuning comes from
- Peter Auer, Nicolò Cesa-Bianchi and Claudio Gentile. Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences 64(1):48–75, February 2002.
For multiplicative algorithms (not covered in the course), see for example
- Nick Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm, Machine Learning 2(4):285–318, April 1988.
- Nicolò Cesa-Bianchi. Analysis of two gradient-based algorithms for on-line regression, Journal of Computer and System Sciences 59(3):392–411, 1999.
For conversions from online to batch algorithms, see
- Nicolò Cesa-Bianchi, Alex Conconi and Claudio Gentile. On the generalization ability of on-line learning algorithms, IEEE Transactions on Information Theory 50(9):2050–2057, September 2004.
A nice proof for the connection between VC dimension and Rademacher complexity is given in
- Matti Kääriäinen. Relating the Rademacher and VC bounds. University of Helsinki, Department of Computer Science, Report C-2004-57, 2004.
Journals, conferences and web sites
- Machine Learning
- Journal of Machine Learning Research
- Conference on Learning Theory (COLT, formerly Workshop on Computational Learning Theory), organised by the Association for Computational Learning
- International Conference on Machine Learning (ICML)
- Neural Information Processing Systems (NIPS)
- Support Vector Machine homepage