Can't get SciPy hierarchical clustering to work

Question


moo*_*eep 7 python cluster-analysis hierarchical-clustering scipy

I wrote a simple script that is meant to run hierarchical clustering on a simple test data set. [Figure: the test data used.]

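I found that the function fclusterdata is a candidate for clustering my data into two clusters. It takes two mandatory arguments: the data set and a threshold. The problem is that I can't find a threshold that yields the expected two clusters.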

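I would be happy if someone could tell me what I am doing wrong. I would also be happy if someone could point out other approaches that are better suited to my clustering (I explicitly want to avoid having to specify the number of clusters beforehand).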

Here is my code:

import time
import scipy.cluster.hierarchy as hcluster
import numpy.random as random
import numpy

import pylab
pylab.ion()

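# 200 two-dimensional points stored as a 2 x 200 array (row 0 = x, row 1 = y)
data = random.randn(2,200)

# shift the first 100 points by (10, 10) so they form a second blob
data[:, :100] += 10

for i in range(5,15):
    thresh = i/10.
    # fclusterdata expects one observation per row, hence the transpose
    clusters = hcluster.fclusterdata(numpy.transpose(data), thresh)
    pylab.scatter(*data[:,:], c=clusters)
    pylab.axis("equal")
    title = "threshold: %f, number of clusters: %d" % (thresh, len(set(clusters)))
    print(title)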
    pylab.title(title)
    pylab.draw()
    time.sleep(0.5)
    pylab.clf()


Here is the output:

threshold: 0.500000, number of clusters: 129
threshold: 0.600000, number of clusters: 129
threshold: 0.700000, number of clusters: 129
threshold: 0.800000, number of clusters: 75
threshold: 0.900000, number of clusters: 75
threshold: 1.000000, number of clusters: 73
threshold: 1.100000, number of clusters: 58
threshold: 1.200000, number of clusters: 1
threshold: 1.300000, number of clusters: 1
threshold: 1.400000, number of clusters: 1


Answer 1

Die*_*ego 6

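Note that the function reference contains an error. The correct definition of the t parameter is: "The cut-off threshold for the cluster function or the maximum number of clusters (criterion='maxclust')".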
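To illustrate that the meaning of t depends on criterion, here is a minimal sketch on synthetic data shaped like the question's. With the default criterion='inconsistent', t is compared against inconsistency coefficients, not distances; with criterion='distance', t acts as a distance cut-off, which may suit cases where the number of clusters should not be fixed in advance. The variable names, the seed, and the value t=5.0 are assumptions picked for this particular data, not recommended settings.

```python
import numpy as np
import scipy.cluster.hierarchy as hcluster

# Synthetic data resembling the question's: two blobs of 100 points each.
# The seed is an arbitrary choice for reproducibility.
rng = np.random.default_rng(0)
points = rng.standard_normal((200, 2))
points[:100] += 10  # shift the first blob to around (10, 10)

# Default criterion ('inconsistent'): t is a threshold on the
# inconsistency coefficient, so it is not a distance.
labels_default = hcluster.fclusterdata(points, 5.0)

# criterion='distance': t is a cut-off on the linkage distance.
# t=5.0 is an assumed value that sits between the within-blob and
# between-blob distances of this synthetic data.
labels_distance = hcluster.fclusterdata(points, 5.0, criterion='distance')

print("inconsistent:", len(set(labels_default)), "clusters")
print("distance:    ", len(set(labels_distance)), "clusters")
```

A distance cut-off still has to be chosen to match the scale of the data, so this trades one parameter (the number of clusters) for another.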

Try this:

clusters = hcluster.fclusterdata(numpy.transpose(data), 2, criterion='maxclust', metric='euclidean', depth=1, method='centroid')
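For a version that runs on its own, here is the same call applied to synthetic data like the question's. This is only a sketch: the seed and the names points and clusters are assumptions for illustration. Note that, according to the SciPy documentation, depth only matters for the inconsistency calculation, so it should have no effect with criterion='maxclust'.

```python
import numpy as np
import scipy.cluster.hierarchy as hcluster

# Two blobs of 100 points each, one observation per row.
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
points = rng.standard_normal((200, 2))
points[:100] += 10

# With criterion='maxclust', t=2 asks for at most two flat clusters.
clusters = hcluster.fclusterdata(
    points, 2,
    criterion='maxclust',
    metric='euclidean',
    depth=1,
    method='centroid',
)

print("number of clusters:", len(set(clusters)))
```

Because the points are already stored one per row here, no transpose is needed, unlike in the question's 2 x 200 layout.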

