Errata to the 1st printing (the one with "10 9 8 7 6 5 4 3 2 1" on the copyright page) of:
Ethem Alpaydin: "Introduction to Machine Learning", 3rd ed.
MIT Press, 2014.
Miguel A. Carreira-Perpinan, 2015.
General comments:
- Many figures in the book have the wrong aspect ratio, and in some of them
this affects the comprehension of the material. For example, fig. 7.2
about k-means is too stretched vertically, so that the distances are
distorted and some points appear to be assigned to the wrong cluster.
  In fig. 6.12, the Optdigits plot after LDA is considerably distorted (it
  should be about twice as wide). Likewise fig. 6.16 and others.
- P. 22 eq. 2.3: x^t,r^t -> (x^t,r^t) (ie, a set of ordered pairs).
- P. 36 eq. 2.17: there should be some space between \bar{x} and \bar{r},
otherwise it looks like \bar{x r} (ie, the average of the products xt*rt).
- P. 81 l. -8: E[g(x)] -> E_X[g(x)]
- P. 128 l. 1: the covariance is X'.X/N.
- P. 136: "their their".
- P. 175 l. 16: "that is inversely proportional to the distance" strictly
means 1/d where d is the distance. It should say "that is a decreasing
function of the distance".
- P. 176 l. 13: "The graph should always be connected". A disconnected graph
still works, in that each connected component contains one (or more
clusters). Generally, one should run a connected-components algorithm and
then apply spectral clustering to each component.
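  The component-splitting step above can be sketched as follows (a minimal
  illustration using scipy's connected_components on a toy symmetric
  adjacency matrix; the per-component spectral clustering call is left as a
  placeholder comment):

  ```python
  import numpy as np
  from scipy.sparse import csr_matrix
  from scipy.sparse.csgraph import connected_components

  # Toy similarity graph with two components: {0, 1, 2} and {3, 4}.
  A = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]])

  n_comp, labels = connected_components(csr_matrix(A), directed=False)
  for c in range(n_comp):
      idx = np.flatnonzero(labels == c)
      # Run spectral clustering on the submatrix A[np.ix_(idx, idx)] here.
      print(c, idx)
  ```
  
  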
- P. 177 first eq.: (xrj - zsj)^p -> |xrj - zsj|^p (absolute value).
- P. 177 l. -11: "constructing the minimal spanning tree of the graph" using
Kruskal's algorithm.
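  As a quick illustration of the MST construction, an off-the-shelf routine
  such as scipy's minimum_spanning_tree can be run on a small pairwise
  distance matrix (toy numbers, chosen only so the optimal tree is obvious):

  ```python
  import numpy as np
  from scipy.sparse import csr_matrix
  from scipy.sparse.csgraph import minimum_spanning_tree

  # Pairwise distances among 4 points (the upper triangle suffices).
  D = np.array([[0, 1, 4, 4],
                [0, 0, 2, 4],
                [0, 0, 0, 3],
                [0, 0, 0, 0]])

  T = minimum_spanning_tree(csr_matrix(D))
  # The MST keeps the edges of weight 1, 2, 3 (total weight 6).
  print(T.sum())
  ```
  
  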
- P. 199 l. 14: "seperating".
- P. 220 l. 14: the "calligraphic I" symbol (impurity?) has not been defined.
- P. 235, exe. 1: the Gini index should be multiplied by 2 to be consistent
with eq. (9.5).
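  A quick numeric check of the factor of 2, assuming the book writes the
  two-class Gini index as 2p(1-p) and eq. (9.5) is the binary entropy
  impurity: doubling the Gini index makes its maximum 1 at p = 1/2, the
  same maximum the entropy attains.

  ```python
  import numpy as np

  def entropy(p):
      # Binary entropy impurity in bits; maximum is 1 at p = 0.5.
      return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

  def gini(p):
      # Two-class Gini index as 2p(1-p); maximum is 0.5 at p = 0.5.
      return 2 * p * (1 - p)

  p = 0.5
  print(entropy(p), 2 * gini(p))  # both equal 1 once Gini is doubled
  ```
  
  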
- P. 248 l. 1: should be log(y/(1-y)) (with extra parenthesis).
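  The parenthesization matters: log(y/(1-y)) is the logit, the inverse of
  the sigmoid, whereas log(y)/(1-y) is something else entirely. A minimal
  round-trip check:

  ```python
  import numpy as np

  def logit(y):
      # log(y/(1-y)): the corrected expression, inverse of the sigmoid.
      return np.log(y / (1 - y))

  def sigmoid(a):
      return 1.0 / (1.0 + np.exp(-a))

  y = 0.8
  print(sigmoid(logit(y)))          # recovers y
  print(np.log(y) / (1 - y))        # the misparenthesized version differs
  ```
  
  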
- P. 261 eq. (10.48): the [ ]+ operator should apply to w'.(xv-xu), not to
xv-xu.
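  The two readings give different results, as a toy computation shows
  (hypothetical numbers; assuming [z]+ denotes max(z, 0)):

  ```python
  import numpy as np

  def pos(z):
      # [z]+ = max(z, 0), applied elementwise if z is a vector.
      return np.maximum(z, 0.0)

  w = np.array([1.0, -2.0])
  xu = np.array([3.0, 1.0])
  xv = np.array([1.0, 2.0])

  # Correct reading of eq. (10.48): [ ]+ applies to the scalar w'(xv - xu).
  correct = pos(w @ (xv - xu))
  # Misreading: [ ]+ applied elementwise to the vector xv - xu first.
  wrong = w @ pos(xv - xu)
  print(correct, wrong)
  ```
  
  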
- P. 267: it is odd to consider MLPs as nonparametric methods.
- P. 310: "lingustics".
- P. 315: "Immenent".
- P. 326: "topogrophical".
- P. 350 point 5: we cannot solve analytically for the optimum; a QP requires
an iterative algorithm. Heuristics for learning rates, etc. are less
crucial than for other nonlinear models, but still important depending on
the QP optimization algorithm.
- P. 352 line after eq. (13.3): "this is a standard quadratic programming
problem". Also p. 353 l. 10: "quadratic programming methods".
- P. 355: "sectiona".
- Chapter 13: it would be clearer to show explicitly which variables are
  optimized over, eg in eq. (13.17) to write min_{w,w0,\rho,\xi_1...\xi_N},
  to differentiate them from quantities such as \nu or N that are fixed and
  not optimized over.
- P. 370, l. -1: "we have r^t = w^T ..." (w instead of x).
- P. 497 l. -10: "a column of all 0s (or 1s)" -> of all -1s (or +1s). Also in
"0101" and "1010".
- P. 557: "reproducable".