Introduction to Machine Learning, second edition

The MIT Press

February 2010: ISBN-10: 0-262-01243-X, ISBN-13: 978-0-262-01243-0

The book can be ordered through The MIT Press, Amazon (CA, CN, DE, FR, JP, UK, US), Barnes & Noble (US), and Pandora (TR).

·         PHI Learning Pvt. Ltd. (formerly Prentice-Hall of India) published an English language reprint for distribution in India, Bangladesh, Burma, Nepal, Sri Lanka, Bhutan, and Pakistan only. 

·         Yapay Öğrenme, the Turkish edition of the book (translated by the author), was published by Boğaziçi University Press in April 2011.

·         A Chinese simplified-character edition of the book will be published by China Machine Press/Huazhang Graphics & Information Co.

Table of Contents and Sample Chapters


Lecture Slides:

(For instructors to use in their courses; please keep the first page and footer if you edit the slides)


  1. Introduction (pdf, ppt)
  2. Supervised Learning (pdf, ppt)
  3. Bayesian Decision Theory (pdf, ppt)
  4. Parametric Methods (pdf, ppt)
  5. Multivariate Methods (pdf, ppt)
  6. Dimensionality Reduction (pdf, ppt)
  7. Clustering (pdf, ppt)
  8. Nonparametric Methods (pdf, ppt)
  9. Decision Trees (pdf, ppt)
  10. Linear Discrimination (pdf, ppt)
  11. Multilayer Perceptrons (pdf, ppt)
  12. Local Models (pdf, ppt)
  13. Kernel Machines (pdf, ppt)
  14. Bayesian Estimation (pdf, ppt)
  15. Hidden Markov Models (pdf, ppt)
  16. Graphical Models (pdf, ppt)
  17. Combining Multiple Learners (pdf, ppt)
  18. Reinforcement Learning (pdf, ppt)
  19. Design and Analysis of Machine Learning Experiments (pdf, ppt)


For Instructors: Select the "Online Instructor's Manual and Supplemental Content Download Request" link in the left menu of the book's web page.


The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. In order to present a unified treatment of machine learning problems and solutions, it discusses many methods from different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining. All learning algorithms are explained so that the student can easily move from the equations in the book to a computer program.

The text covers such topics as supervised learning, Bayesian decision theory, parametric methods, multivariate methods, multilayer perceptrons, local models, hidden Markov models, assessing and comparing classification algorithms, and reinforcement learning. New to the second edition are chapters on kernel machines, graphical models, and Bayesian estimation; expanded coverage of statistical tests in a chapter on design and analysis of machine learning experiments; case studies available on the Web (with downloadable results for instructors); and many additional exercises. All chapters have been revised and updated.

Introduction to Machine Learning can be used by advanced undergraduates and graduate students who have completed courses in computer programming, probability, calculus, and linear algebra. It will also be of interest to engineers in the field who are concerned with the application of machine learning methods.


Errata:

·         p. 41: Fourth line from the bottom of the page: “ic” should be “is” (Alexander Moriarty)

·         p. 66: Fourth line from the top of the page: “negligible” is misspelled (Bugra Akyildiz)

·         p. 124: Eq. 6.20; subscript of \epsilon should be j (Gi-Jeong Si)

·         p. 130: Below Eq. 6.37, while taking the derivative, 2 should be outside the parentheses (Ali Çeliksu, Gi-Jeong Si)

·         p. 135: Eq. 6.47; in the final z, s should be a superscript and not a subscript (Gi-Jeong Si)

·         p. 194: Eq. 9.15: bm should be bmj (Gökhan Özbulak)

·         p. 224: Just above Eq. 10.30, after Mult, the subscript k should be uppercase K (Gi-Jeong Si)

·         p. 283: Around the middle of the page, it should be l \neq j (Gi-Jeong Si)

·         p. 319: In the constraints below Eq. 13.19, it should be \sum_t \alpha_t \ge \nu (Rui Kuang)

·         p. 330: In the third line of the first equation, the + before (wTx + w0) should be – and the – before rt should be + (Mehmet Gönen, Gi-Jeong Si)

·         p. 330: In Eq. 13.50, the – sign before the last term (\sum_t r^t(\alpha^t+\alpha^t_-) ) should be a + (Yongwoon Cho)

·         p. 333: Just under Eq. 13.53, t of \gamma should be a superscript. (Gi-Jeong Si)

·         p. 336: Eq in the middle of the page; subscript of \lambda should be j (Gi-Jeong Si)

·         p. 343: Two lines before the bottom of the page, the subscript of the last q should be uppercase K (Gi-Jeong Si)

·         p. 348: Third eq on the page: the correct notation is L(w|X); in the eq that follows, it is also better not to define a separate term L(r|X,w,\beta) and to keep only log p(r|X,w) (Gi-Jeong Si)

·         p. 348: Eq 14.11: The second term should read N\log\sqrt{\beta} (Orhan Özalp)

·         p. 352: 7th line from the top of the page, closing ] is missing after 1,0 (Gi-Jeong Si)

·         p. 356: First eq. p(x) should be p(w) (Murat Semerci, Gi-Jeong Si).

·         p. 378: Eq. 15.33: There should be a normalizing 1/P(Ok) factor after sum over k and before sum over t, while updating a and b values (Vicente Palazon).

·         p. 389: The very last eq at the bottom of the page; the prob is 0.48 and not 0.47 (Gökhan Özbulak)

·         p. 392: The first equation, the denominator of the second term; there should be no ~ (Gi-Jeong Si)

·         p. 405: Second line of Eq. 16.17: Index of summation should be Y in the second summation (Alex Kogan)

·         p. 492: Two lines below Eq. 18.20; the – between r_{t+1} and \gamma V should be + (Murat Semerci, Gi-Jeong Si)

·         p. 500: The denominator should be divided by N (inside sqrt): \sqrt{p_0(1-p_0)/N} (Lisa Hellerstein)

I would like to thank everyone who took the time to find these errors and report them to me.

Created on Feb 11, 2010 by E. Alpaydin (my_last_name AT boun DOT edu DOT tr)