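I'm new to sklearn (and to Python in general), but I need to work on a project that involves clustering more than 10k samples. With the code below, clustering works as expected on a test dataset of fewer than 100 samples with k = 4. However, once I use more than 100 samples, 6 of the 8 centroids appear to be duplicated at the origin (0, 0), i.e. it fails to produce clusters. Any suggestions as to what might be going wrong?
Code: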
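import numpy as np
import pandas as pd
import matplotlib.pyplot as plot
from sklearn.cluster import KMeans

k = 8  # number of clusters (k = 4 for the small test set)
# Two tab-separated columns, no header row
data = pd.read_csv('parsed.txt', sep="\t", header=None)
data.columns = ["x", "y"]
# Arguments as accepted by older scikit-learn releases (n_jobs and precompute_distances were removed in 1.0)
kmeans = KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=1000,
                n_clusters=k, n_init=10, n_jobs=1, precompute_distances='auto',
                random_state=None, tol=0.0001, verbose=0)
kmeans.fit(data)
labels = kmeans.predict(data)
centroids = kmeans.cluster_centers_
fig = plot.figure(figsize=(5, 5))
# Rainbow colormap: one RGB triple per cluster, keyed 1..k
colmap = {(x+1): [(np.sin(0.3*x + 0)*127+128)/255,(np.sin(0.3*x + 2)*127+128)/255,(np.sin(0.3*x + 4)*127+128)/255] for x in range(k)}
colors = [colmap[label + 1] for label in labels]  # color for each label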
plot.scatter(data['x'], data['y'], color=colors, …
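
To help narrow this down, here is a minimal diagnostic sketch. It is not a fix and does not claim to identify the cause. It fits k-means and reports each centroid with the number of points assigned to it, plus how many input rows sit exactly at (0, 0). The helper name `summarize_clusters`, the fixed seed, and the synthetic stand-in for `parsed.txt` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def summarize_clusters(data, k, seed=0):
    """Fit k-means; return centroids with cluster sizes and the count of rows at (0, 0).

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    max_iter=1000, random_state=seed)
    labels = kmeans.fit_predict(data[["x", "y"]])
    centroids = pd.DataFrame(kmeans.cluster_centers_, columns=["x", "y"])
    # Number of points assigned to each cluster, in centroid order
    centroids["size"] = np.bincount(labels, minlength=k)
    # Rows whose coordinates are exactly zero in both columns
    at_origin = int(((data["x"] == 0) & (data["y"] == 0)).sum())
    return centroids, at_origin


# Synthetic stand-in for parsed.txt (assumption): four well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
data = pd.DataFrame(points, columns=["x", "y"])

centroids, at_origin = summarize_clusters(data, k=4)
print(centroids)
print("rows exactly at (0, 0):", at_origin)
```

Running the same helper on the real file (loaded with `pd.read_csv` as above) would show whether the centroids at the origin have any points assigned to them, and whether the input itself contains many rows at (0, 0).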