CSL 864 - Special Topics in AI: Classification
This is an introductory course on machine learning focusing on
classification. The course has three major objectives. First, to
familiarize students with basic classification methods so that these
can be used as tools to tackle practical machine learning
problems. Second, to equip students with the mathematical skills
needed to theoretically analyze these methods and to modify and extend
them to tackle new problems. Finally, to introduce students to basic
optimization techniques so that they can start writing their own code
for these methods.
- Lecture 1: Introduction to supervised learning, overfitting,
probability theory and decision theory.
- Lecture 2: Generative Methods and Naïve Bayes.
- Lecture 3: Toy example.
- Lecture 4: Discriminative Methods and Logistic
Regression. Equivalence to Naïve Bayes.
- Lecture 5: Logistic Regression optimization and
extensions (a minimal sketch follows this list).
- Lecture 6: Support Vector Machines.
- Lecture 7: SVMs continued - Kernels.
- Lecture 8: Multi-Class SVMs. Digression on VC dimension.
- Lecture 9: SVM optimization.
- Lecture 10: Kernel learning and optimization.
- Lecture 11: Boosting.
- Lecture 12: Boosting continued.
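As a flavour of the discriminative methods covered in Lectures 4 and 5, here is a minimal MATLAB sketch of regularized logistic regression trained by plain gradient descent. This is not the distributed demo code: the toy data, learning rate and regularization strength below are made-up placeholders chosen only for illustration.

    % Minimal sketch: L2-regularized binary logistic regression trained by
    % plain gradient descent on made-up 2D toy data (not the course demos).
    rng(0);                                  % reproducible toy data
    n = 100;
    X = [randn(n,2) + 1; randn(n,2) - 1];    % two Gaussian clusters
    y = [ones(n,1); zeros(n,1)];             % labels in {0,1}
    X = [X, ones(2*n,1)];                    % append a bias feature

    w      = zeros(3,1);                     % weights (including bias)
    eta    = 0.1;                            % learning rate (placeholder)
    lambda = 0.01;                           % regularization strength (placeholder)
    for iter = 1:500
        p    = 1 ./ (1 + exp(-X*w));         % predicted probabilities
        grad = X' * (p - y) / (2*n) + lambda * w;
        w    = w - eta * grad;               % gradient descent step
    end

    acc = mean(((1 ./ (1 + exp(-X*w))) > 0.5) == y);
    fprintf('Training accuracy: %.2f\n', acc);

The actual demos rely on a toolbox solver for the optimization (see the fmincon.LBFGS note under Code below); the sketch above is only meant to illustrate the kind of objective being minimized.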
Slides
Code
Most code was written five minutes before the start of each lecture
and comes with no guarantees, comments or documentation. In
particular, no attempt has been made to bulletproof the code: for
instance, if there is no feasible solution for your parameter settings,
the figure-plotting subroutines will crash (the LR and SVM learning
routines should be stable). In any case, run the code at your own
peril.
- Some common MATLAB tools needed for the demos (the Optimization
Toolbox ver 4.2 or higher is needed for fmincon.LBFGS, which is used
by the Logistic Regression demo)
- MATLAB code for demos
- Naïve Bayes (see the sketch after this list)
- Regularized Logistic Regression (code is kernelized and
displays dual weights)
- Naïve Bayes vs Logistic Regression
- Multi-class Logistic Regression (Multinomial, 1-vs-All,
1-vs-1 DAG, 1-vs-1 majority vote)
- Linear SVMs vs Logistic Regression
- Non-linear SVMs
- Multi-class SVMs (multi-class hinge loss, Multinomial Logistic Regression, 1-vs-All SVM, 1-vs-1 DAG SVM, 1-vs-1 majority vote SVM)
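As a rough illustration of what the first demo above computes, here is a minimal Gaussian Naïve Bayes sketch with per-class diagonal covariances. It is not the distributed demo code: the toy data and the test point are made-up placeholders, and the real demos do considerably more (e.g. the figure plotting mentioned above).

    % Minimal sketch: Gaussian Naive Bayes with per-class diagonal
    % covariances on made-up toy data (not the distributed demo code).
    rng(1);
    Xtrain = [randn(50,2) + 2; randn(50,2) - 2];   % two Gaussian classes
    ytrain = [ones(50,1); 2*ones(50,1)];           % class labels 1 and 2

    classes = unique(ytrain);
    K       = numel(classes);
    prior   = zeros(K,1); mu = zeros(K,2); sig2 = zeros(K,2);
    for k = 1:K
        idx       = (ytrain == classes(k));
        prior(k)  = mean(idx);                     % class prior P(y = k)
        mu(k,:)   = mean(Xtrain(idx,:), 1);        % per-feature means
        sig2(k,:) = var(Xtrain(idx,:), 0, 1);      % per-feature variances
    end

    % Classify a made-up test point by the largest log posterior
    % log P(y = k) + sum_d log N(x_d | mu_kd, sig2_kd).
    xtest   = [1.5, 1.0];
    logpost = zeros(K,1);
    for k = 1:K
        loglik     = sum(-0.5*log(2*pi*sig2(k,:)) ...
                         - (xtest - mu(k,:)).^2 ./ (2*sig2(k,:)));
        logpost(k) = log(prior(k)) + loglik;
    end
    [~, khat] = max(logpost);
    fprintf('Predicted class: %d\n', classes(khat));

The "Naïve Bayes vs Logistic Regression" demo presumably contrasts a generative fit of this kind with a discriminative fit like the logistic regression sketch given earlier.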
Links to code by other people can be found in the slides.
Recommended Reading
- D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
- C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley and Sons, second edition, 2001.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, second edition, 2009.
- T. Mitchell. Machine Learning. McGraw Hill, 1997.
- B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
Please see the slides for links to relevant research papers.