web*_*ker 3 python cluster-analysis k-means scikit-learn
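If I already have a numpy array that can serve as the initial centroids, how do I correctly initialize the k-means algorithm? I am using the scikit-learn KMeans class.
This post (k-means with selected initial centers) says that if I use a numpy array as the initial centroids, I only need to set n_init=1, but I am not sure whether my initialization is working properly.
Naftali Harris's excellent visualization page shows what I am trying to do: http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
"I'll Choose" -> "Packed Circles" -> run kmeans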
import numpy as np
import sklearn.cluster as sk  # assumption: sk is an alias for sklearn.cluster

# numpy array of initial centroids, one row per cluster
startpts = np.array([[-0.12, 0.939, 0.321, 0.011], [0.0, 0.874, -0.486, 0.862], [0.0, 1.0, 0.0, 0.033], [0.12, 0.939, 0.321, -0.7], [0.0, 1.0, 0.0, -0.203], [0.12, 0.939, -0.321, 0.25], [0.0, 0.874, 0.486, -0.575], [-0.12, 0.939, -0.321, 0.961]], np.float64)
centroids = sk.KMeans(n_clusters=8, init=startpts, n_init=1)
centroids.fit(actual_data_points)
# get the array of fitted cluster centers
centroids_array = centroids.cluster_centers_
Yes, setting the initial centroids via init should work. Here is a quote from the scikit-learn documentation:
init : {‘k-means++’, ‘random’ or an ndarray}
Method for initialization, defaults to ‘k-means++’:
If an ndarray is passed, it should be of shape (n_clusters, n_features)
and gives the initial centers.
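One way to convince yourself that the array is really being used as the starting point is to try it on data where the answer is known. The sketch below is only an illustration: the three 2D blobs, the `true_centers` array and the `+ 0.5` offset are assumptions made up for this example, not part of the question's data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical, well-separated 2D blobs standing in for actual_data_points.
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data = np.vstack([c + 0.1 * rng.standard_normal((50, 2)) for c in true_centers])

# Initial centroids: one row per cluster, one column per feature.
# Each row is deliberately placed near one blob, in a known order.
startpts = true_centers + 0.5

km = KMeans(n_clusters=3, init=startpts, n_init=1).fit(data)

# Row k of cluster_centers_ is refined from row k of startpts, so the
# fitted centers should line up with true_centers in the same order.
order_preserved = np.allclose(km.cluster_centers_, true_centers, atol=0.1)
print(order_preserved)
```

n_init=1 is passed explicitly, as the question suggests, since repeating a run from the same fixed centroids would add nothing.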
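What does the shape (n_clusters, n_features) mean?
The shape requirement means that init must have exactly n_clusters rows, and the number of elements in each row must match the dimensionality of actual_data_points: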
>>> init = np.array([[-0.12, 0.939, 0.321, 0.011],
...                  [0.0, 0.874, -0.486, 0.862],
...                  [0.0, 1.0, 0.0, 0.033],
...                  [0.12, 0.939, 0.321, -0.7],
...                  [0.0, 1.0, 0.0, -0.203],
...                  [0.12, 0.939, -0.321, 0.25],
...                  [0.0, 0.874, 0.486, -0.575],
...                  [-0.12, 0.939, -0.321, 0.961]],
...                 np.float64)
>>> init.shape[0] == 8
True # n_clusters
>>> init.shape[1] == actual_data_points.shape[1]
True # n_features
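If you want this check to fail loudly before fitting, the two comparisons above can be wrapped in a small function. This is a minimal sketch: `check_init_shape` is a hypothetical helper invented here, not a scikit-learn function, and the zero-filled arrays are placeholders for real data.

```python
import numpy as np

def check_init_shape(init, data, n_clusters):
    """Hypothetical helper: raise ValueError unless init is (n_clusters, n_features)."""
    init = np.asarray(init)
    data = np.asarray(data)
    expected = (n_clusters, data.shape[1])
    if init.shape != expected:
        raise ValueError(f"init has shape {init.shape}, expected {expected}")
    return init

data = np.zeros((100, 4))  # placeholder for actual_data_points: 100 samples, 4 features
good = np.zeros((8, 4))    # 8 centroids with 4 features each
bad = np.zeros((8, 3))     # wrong number of features

check_init_shape(good, data, n_clusters=8)  # passes silently

try:
    check_init_shape(bad, data, n_clusters=8)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

scikit-learn validates the shape of init itself when fitting, so a helper like this is only useful for catching the problem earlier with a message of your own.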
What is n_features?
n_features is the dimensionality of your samples. For example, if you are clustering points in a 2D plane, n_features is 2.
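In numpy terms, n_features is simply the second entry of the data array's shape. The sketch below uses three made-up 2D points (an assumption for illustration) and, as one possible choice, takes the first two samples as initial centroids for two clusters.

```python
import numpy as np

# Three hypothetical points in the 2D plane: one row per sample, one column per coordinate.
points = np.array([[0.0, 0.0],
                   [1.0, 1.0],
                   [4.0, 0.5]])

n_samples, n_features = points.shape
print(n_samples, n_features)  # 3 2

# An init array for n_clusters=2 needs shape (2, n_features); here the
# first two samples are reused as starting centroids.
init = points[:2].copy()
print(init.shape)  # (2, 2)
```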