| Method name | Parameters | Scalability | Use case | Geometry (metric used) |
|---|---|---|---|---|
| K-Means | number of clusters | Very large n_samples, medium n_clusters with MiniBatch code | General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
| Affinity propagation | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Mean-shift | bandwidth | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
| Spectral clustering | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Hierarchical clustering | number of clusters | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints | Distances between points |
| DBSCAN | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
| Gaussian mixtures | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |

via 4.3. Clustering — scikit-learn 0.13.1 documentation.
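
To make the "flat" versus "non-flat geometry" distinction in the table concrete, here is a minimal sketch comparing K-Means and DBSCAN on a two-moons toy dataset. It is not taken from the scikit-learn documentation. It assumes a reasonably recent scikit-learn install, and the dataset, the `eps=0.2` and `min_samples=5` values, and the use of the adjusted Rand index as a score are all illustrative choices of mine rather than recommendations.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: a simple case of non-flat geometry.
# Dataset size, noise level and seed are arbitrary illustrative values.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means takes the number of clusters and works on distances between
# points, which suits flat, roughly convex clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN takes a neighborhood size (eps) and links nearby points, so it can
# follow curved shapes. eps and min_samples are guesses tuned to this toy data.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means the labels match the true grouping exactly.
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)

print(f"K-Means ARI: {kmeans_ari:.2f}")
print(f"DBSCAN  ARI: {dbscan_ari:.2f}")
```

On data like this, DBSCAN should recover the two moons much more closely than K-Means, which tends to cut each moon in half. On flat, evenly sized blobs the picture would likely reverse in favor of K-Means, which is what the "Use case" column suggests.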
