Max*_*ell 8 python cluster-analysis k-means
How can I plot the output of k-means clustering in Python? I am using the PyCluster package. allUserVector is an n × m array, i.e. n users with m features each.
import Pycluster as pc
import numpy as np
clusterid,error,nfound = pc.kcluster(allUserVector, nclusters=3, transpose=0,npass=1,method='a',dist='e')
clustermap, _, _ = pc.kcluster( allUserVector, nclusters=3, transpose=0,npass=1,method='a',dist='e', )
centroids, _ = pc.clustercentroids( allUserVector, clusterid=clustermap )
print centroids
print clusterid
print nfound
I would like to show the clusters in a plot that makes it clear which users belong to which cluster. Each user is an m-dimensional vector. Any input?
Dou*_*gal 15
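Plotting m-dimensional data directly is hard. One approach is to map it into a two-dimensional space with principal component analysis (PCA). Once that is done, the points can be drawn on a plot with matplotlib (based on this answer).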
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import mlab
import Pycluster as pc
# make fake user data
users = np.random.normal(0, 10, (20, 5))
# cluster
clusterid, error, nfound = pc.kcluster(users, nclusters=3, transpose=0,
                                       npass=10, method='a', dist='e')
centroids, _ = pc.clustercentroids(users, clusterid=clusterid)
# reduce dimensionality
users_pca = mlab.PCA(users)
cutoff = users_pca.fracs[1]
users_2d = users_pca.project(users, minfrac=cutoff)
centroids_2d = users_pca.project(centroids, minfrac=cutoff)
# make a plot
colors = ['red', 'green', 'blue']
plt.figure()
plt.xlim([users_2d[:,0].min() - .5, users_2d[:,0].max() + .5])
plt.ylim([users_2d[:,1].min() - .5, users_2d[:,1].max() + .5])
plt.xticks([], []); plt.yticks([], []) # numbers aren't meaningful
# show the centroids
plt.scatter(centroids_2d[:,0], centroids_2d[:,1], marker='o', c=colors, s=100)
# show user numbers, colored by their cluster id
for i, ((x,y), kls) in enumerate(zip(users_2d, clusterid)):
    plt.annotate(str(i), xy=(x,y), xytext=(0,0), textcoords='offset points',
                 color=colors[kls])

If you want to plot something other than the numbers, just change the first argument to annotate. For example, you could use usernames.
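Note that the clusters may look slightly "wrong" in this space (for example, 15 appears closer to red than to green in the resulting plot), because it is not the space in which the clustering actually happened. In this case, the first two principal components retain 61% of the variance: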
>>> np.cumsum(users_pca.fracs)
array([ 0.36920636, 0.61313708, 0.81661401, 0.95360623, 1. ])
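The mlab.PCA class used above has been removed from recent matplotlib releases, so here is a rough NumPy-only sketch of the same projection step. The function names pca_project and nearest, the fixed random seed, and the stand-in labels array are all assumptions made for illustration. In real use the labels would come from pc.kcluster. The sketch only centers the data and does not try to reproduce mlab.PCA's exact preprocessing, so its variance fractions will not match the numbers shown above.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Center `data` and project its rows onto the top principal axes."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    fracs = s ** 2 / np.sum(s ** 2)   # fraction of variance per component
    components = vt[:n_components]
    return centered @ components.T, fracs, mean, components

def nearest(points, centers):
    """Index of the closest center (Euclidean distance) for every point."""
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Fake user data, as in the answer (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
users = rng.normal(0, 10, (20, 5))

# Stand-in cluster ids (assumption); replace with the output of pc.kcluster.
labels = np.arange(len(users)) % 3
centroids = np.array([users[labels == k].mean(axis=0) for k in range(3)])

users_2d, fracs, mean, components = pca_project(users)
# Centroids must be projected with the same mean and axes as the users.
centroids_2d = (centroids - mean) @ components.T

print(np.cumsum(fracs))  # cumulative variance retained by 1, 2, ... components

# How many points have a different nearest centroid once projected to 2-D?
full = nearest(users, centroids)
flat = nearest(users_2d, centroids_2d)
print("nearest centroid changes for", int((full != flat).sum()), "points")
```

The arrays users_2d and centroids_2d can be passed to the plotting code unchanged. The final count gives a rough sense of how much the two-dimensional picture can disagree with distances in the full feature space.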