Clustering: K-Means vs DBSCAN

Method name Parameters Scalability Usecase Geometry (metric used)
K-Means number of clusters Very large n_samples, medium n_clusters with MiniBatch code General-purpose, even cluster size, flat geometry, not too many clusters Distances between points
Affinity propagation damping, sample preference Not scalable with n_samples Many clusters, uneven cluster size, non-flat geometry Graph distance (e.g. nearest-neighbor graph)
Mean-shift bandwidth Not scalable with n_samples Many clusters, uneven cluster size, non-flat geometry Distances between points
Spectral clustering number of clusters Medium n_samples, small n_clusters Few clusters, even cluster size, non-flat geometry Graph distance (e.g. nearest-neighbor graph)
Hierarchical clustering number of clusters Large n_samples and n_clusters Many clusters, possibly connectivity constraints Distances between points
DBSCAN neighborhood size Very large n_samples, medium n_clusters Non-flat geometry, uneven cluster sizes Distances between nearest points
Gaussian mixtures many Not scalable Flat geometry, good for density estimation Mahalanobis distances to centers

via 4.3. Clustering — scikit-learn 0.13.1 documentation.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s