| Method name | Parameters | Scalability | Use case | Geometry (metric used) |
|---|---|---|---|---|
| *K-Means* | number of clusters | Very large n_samples, medium n_clusters with *MiniBatch code* | General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
| *Affinity propagation* | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| *Mean-shift* | bandwidth | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
| *Spectral clustering* | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| *Hierarchical clustering* | number of clusters | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints | Distances between points |
| *DBSCAN* | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
| *Gaussian mixtures* | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |

via 4.3. Clustering — scikit-learn 0.13.1 documentation.
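To see how the rows map onto code, the sketch below fits one estimator per row on the same toy data. It uses each row's main parameter. It assumes a recent scikit-learn release: names such as `AgglomerativeClustering` and `GaussianMixture` may not exist under those names in 0.13.1. Every parameter value here is an illustrative choice for this dataset, not a recommendation.

```python
# Assumes a recent scikit-learn release; some class names differ from 0.13.1.
from sklearn.cluster import (
    DBSCAN,
    AffinityPropagation,
    AgglomerativeClustering,
    KMeans,
    MeanShift,
    SpectralClustering,
)
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Three well-separated Gaussian blobs: flat geometry, even cluster sizes.
X, y_true = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.5,
    random_state=0,
)

# One estimator per table row, configured through that row's main parameter(s).
# The values are illustrative for this toy data only.
estimators = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Affinity propagation": AffinityPropagation(
        damping=0.9, preference=-50, random_state=0
    ),
    "Mean-shift": MeanShift(bandwidth=2.0),
    "Spectral clustering": SpectralClustering(
        n_clusters=3, affinity="nearest_neighbors", random_state=0
    ),
    "Hierarchical clustering": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),  # eps is the neighborhood size
    "Gaussian mixtures": GaussianMixture(n_components=3, random_state=0),
}

scores = {}
for name, estimator in estimators.items():
    predicted = estimator.fit_predict(X)
    # Adjusted Rand index against the true blobs; 1.0 is a perfect match.
    scores[name] = adjusted_rand_score(y_true, predicted)
    n_found = len(set(predicted) - {-1})  # DBSCAN labels noise points -1
    print(f"{name:25s} clusters={n_found}  ARI={scores[name]:.2f}")
```

On blobs like these, K-Means, Ward hierarchical clustering and the Gaussian mixture should recover the three groups. How Affinity propagation, Mean-shift and DBSCAN fare depends on `preference`, `bandwidth` and `eps`, none of which is a cluster count.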
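The K-Means row's scalability entry depends on the mini-batch variant. Below is a minimal sketch, assuming a recent scikit-learn where `MiniBatchKMeans` supports incremental fitting through `partial_fit`. The data are fed in chunks, as they might be when streamed from disk. The sample count and chunk count are hypothetical stand-ins for a genuinely large dataset.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Stand-in for a large dataset (the size here is purely illustrative).
X, _ = make_blobs(n_samples=100_000, centers=5, cluster_std=1.0, random_state=0)

mbk = MiniBatchKMeans(n_clusters=5, n_init=3, random_state=0)

# Update the centers one chunk at a time instead of fitting on all rows at once.
for chunk in np.array_split(X, 100):
    mbk.partial_fit(chunk)

# Prediction uses the in-memory array here only for convenience.
labels = mbk.predict(X)
print(mbk.cluster_centers_.shape)  # (5, 2)
```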
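The hierarchical clustering row mentions connectivity constraints. Below is a sketch of how such a constraint can be passed, assuming a recent scikit-learn where `AgglomerativeClustering` accepts a `connectivity` matrix, such as the sparse graph from `kneighbors_graph`. The two-moons data and `n_neighbors=10` are illustrative assumptions. The script prints the agreement of each variant with the true moons rather than asserting a particular result.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import kneighbors_graph

# Two interleaving half-circles: a non-flat geometry.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# Sparse k-nearest-neighbor graph used as the connectivity constraint.
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)

results = {}
for name, conn in [("unconstrained", None), ("knn-constrained", connectivity)]:
    # Ward linkage; with a connectivity matrix, merges are restricted to
    # clusters adjacent in the graph (scikit-learn may first add edges to
    # join disconnected components, with a warning).
    model = AgglomerativeClustering(n_clusters=2, linkage="ward", connectivity=conn)
    results[name] = model.fit_predict(X)
    ari = adjusted_rand_score(y_true, results[name])
    print(f"{name:16s} ARI={ari:.2f}")
```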
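The DBSCAN row contrasts non-flat geometry with the flat geometry K-Means assumes. Below is a minimal sketch, assuming a recent scikit-learn `DBSCAN`, where the row's "neighborhood size" corresponds to `eps`. The two-moons data and the values `eps=0.2` and `min_samples=5` are illustrative choices for this toy set.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: clusters that are not convex blobs.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples is how many points (including
# the point itself) a neighborhood needs for that point to be a core point.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

n_clusters = len(set(db_labels.tolist()) - {-1})  # -1 marks noise
n_noise = int((db_labels == -1).sum())
ari_dbscan = adjusted_rand_score(y_true, db_labels)
ari_kmeans = adjusted_rand_score(y_true, km_labels)
print(f"DBSCAN:  {n_clusters} clusters, {n_noise} noise points, ARI={ari_dbscan:.2f}")
print(f"K-Means: ARI={ari_kmeans:.2f}")
```

K-Means splits the plane with a straight boundary, so it cuts across both moons. DBSCAN instead follows chains of dense neighborhoods along each arc.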
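The Gaussian mixtures row notes that the model suits density estimation. Below is a minimal sketch, assuming a recent scikit-learn `GaussianMixture`. After fitting, `score_samples` returns the log-density of the fitted mixture at each query point, and `predict_proba` returns soft assignments to the components. The data and query points are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Two unit-variance blobs centered at (0, 0) and (6, 6).
X, _ = make_blobs(
    n_samples=500, centers=[(0, 0), (6, 6)], cluster_std=1.0, random_state=0
)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

# Query a blob center, the midpoint between blobs, and a far-away point.
queries = np.array([[0.0, 0.0], [3.0, 3.0], [20.0, 20.0]])

# Log of the mixture density at each query point.
log_density = gmm.score_samples(queries)

# Posterior probability of each component for each query (rows sum to 1).
proba = gmm.predict_proba(queries)
print(log_density)
print(proba.round(3))
```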