Lloyd’s K-Means Clustering Algorithm
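K-Means is a non-probabilistic clustering algorithm that groups data points into a specified number of clusters, K. It can nevertheless be viewed as a limiting case of the EM algorithm for a Gaussian mixture model: if all K components share an isotropic covariance εI, then letting ε → 0 turns the soft responsibilities into hard assignments and recovers the K-Means updates.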
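To make the grouping procedure concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function names `squared_distance` and `lloyd_kmeans` are hypothetical, the centroids are initialised by sampling K data points at random, and an empty cluster simply keeps its previous centroid.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iters=100, seed=0):
    """Sketch of Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialise centroids with k data points drawn at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids would not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = lloyd_kmeans(points, k=2)
```

Neither step can increase the within-cluster sum of squared distances, which is why the loop can stop as soon as the assignments repeat. The result still depends on the initial centroids in general, so practical implementations usually run several random restarts and keep the best solution.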
References and Further Readings
Books and Lectures
Murphy, Kevin P. “Chapter 21.3. K-Means Clustering.” In Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
Daumé III, Hal. “Chapter 3.4. K-Means Clustering.” In A Course in Machine Learning, January 2017.
Daumé III, Hal. “Chapter 15.1. K-Means Clustering.” In A Course in Machine Learning, January 2017.
Bishop, Christopher M. “Chapter 9.1. K-Means Clustering.” In Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2016.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. “Chapter 12.4.1. K-Means Clustering.” In An Introduction to Statistical Learning: With Applications in R. Boston: Springer, 2022.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. “Chapter 14.3. Cluster Analysis.” In The Elements of Statistical Learning. New York, NY, USA: Springer New York Inc., 2001.
Raschka, Sebastian. “Chapter 10.1. Grouping objects by similarity using k-means.” In Machine Learning with PyTorch and Scikit-Learn.
Jung, Alexander. “Chapter 8.1. Hard Clustering with K-Means.” In Machine Learning: The Basics. Singapore: Springer Nature Singapore, 2023.
Tan, Vincent. “Lecture 17a.” In MA4270 Data Modelling and Computation.
Notebooks
Online Resources
Assumptions of K-Means:
Interview Questions:
Proof of K-Means Converges in Finite Steps