标签: hierarchical-clustering

警告:python 中的未压缩距离矩阵

我尝试制作与凝聚层次聚类关联的树状图,并且我需要距离矩阵。我开始于:

import numpy as np 
import pandas as pd
from scipy import ndimage 
from scipy.cluster import hierarchy 
from scipy.spatial import distance_matrix 
from matplotlib import pyplot as plt 
from sklearn import manifold, datasets 
from sklearn.cluster import AgglomerativeClustering 
from sklearn.datasets.samples_generator import make_blobs 
%matplotlib inline
X1, y1 = make_blobs(n_samples=50, centers=[[4,4], [-2, -1], [1, 1], [10,4]], cluster_std=0.9)
plt.scatter(X1[:, 0], X1[:, 1], marker='o') 
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'average')
agglom.fit(X1,y1)
# Create a figure of size 6 inches by 4 inches.
plt.figure(figsize=(6,4))

# …
Run Code Online (Sandbox Code Playgroud)

python machine-learning hierarchical-clustering scikit-learn

4
推荐指数
1
解决办法
9150
查看次数

如何使用plotly绘制截断的树状图?

我想使用plotly 绘制层次聚类的树状图,并显示该图的一小部分子集,因为对于大量样本,该图在底部可能非常密集。

我使用绘图包装函数 create_dendrogram 和以下代码绘制了该图:

from scipy.cluster.hierarchy import linkage
import plotly.figure_factory as ff
fig = ff.create_dendrogram(test_df, linkagefun=lambda x: linkage(test_df, 'average', metric='euclidean'))
fig.update_layout(autosize=True, hovermode='closest')
fig.update_xaxes(mirror=False, showgrid=True, showline=False, showticklabels=False)
fig.update_yaxes(mirror=False, showgrid=True, showline=True)
fig.show()
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

下面是使用 matplotlib 绘制的图,scipy 库默认使用该图,为了便于解释,该图被截断为 4 个级别:

from scipy.cluster.hierarchy import dendrogram,linkage
x = linkage(test_df,method='average')
dendrogram(x,truncate_mode='level',p=4)
plt.show()
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

正如您所看到的,截断对于解释大量样本非常有用,我如何在绘图中实现这一点?

python hierarchical-clustering dendrogram plotly

4
推荐指数
1
解决办法
1649
查看次数

如何在R中打印hclust对象的行?

我使用R来聚类一个我称之为'tissuedata'的矩阵.我有一个使用以下代码生成的hclust对象:

    TissueDist<-dist(tissuedata, method="euclidean")
    TissueClust<-hclust(TissueDist, method='complete')
Run Code Online (Sandbox Code Playgroud)

现在我想打印出TissueClust中的行名,同时保留聚集行的顺序.有什么建议?

以下是'tissuedata'矩阵可以包含的示例:

                                          Brain      Bone    Breast      Lung Ovary  Pancreas      HeLa
    17271422_17271984_ENSG00000026025 -3.266758  0.000000 -3.215719 -5.248721     0 -2.891329 -3.718194
    17272608_17272709_ENSG00000026025 -4.304518 -4.560667 -3.359868  0.000000     0 -3.108627 -4.227678
    17272632_17272709_ENSG00000026025 -4.188425 -4.444906 -3.243362  0.000000     0 -2.992122 -4.111259
    17272649_17272709_ENSG00000026025 -3.984628 -4.338187 -3.104413  0.000000     0 -2.791452 -3.828157
    17275586_17275681_ENSG00000026025 -3.278478 -3.932706 -2.903414 -4.480172     0 -2.781268 -3.423038
    17276692_17276817_ENSG00000026025 -3.355184 -4.351640 -3.009279  0.000000     0 -3.231431 -4.194499
    17276692_17276850_ENSG00000026025 -3.456211 -4.453457 -3.110306  0.000000     0 -3.332458 -4.294992
    17277845_17277888_ENSG00000026025 -3.842749 -4.195861 -2.661506  0.000000     0 -2.373369 -3.436403
    17277845_17277908_ENSG00000026025 -4.005683 …
Run Code Online (Sandbox Code Playgroud)

r hierarchical-clustering hclust

3
推荐指数
1
解决办法
3111
查看次数

给定距离矩阵256x256的聚类

所以,我有256个物体,并计算了它们之间的距离矩阵(成对距离).我的距离矩阵的子集如下:

> dm[1:10, 1:10]
      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
 [1,]  0  1  1  1  1  2  2  2  1   2
 [2,]  1  0  1  1  2  1  2  2  2   1
 [3,]  1  1  0  1  2  2  1  2  2   2
 [4,]  1  1  1  0  2  2  2  1  2   2
 [5,]  1  2  2  2  0  1  1  1  1   2
 [6,]  2  1  2  2  1  0  1  1  2   1 …
Run Code Online (Sandbox Code Playgroud)

r cluster-analysis hierarchical-clustering matrix hclust

3
推荐指数
1
解决办法
3625
查看次数

使用相似性函数来聚类scikit-learn

我使用函数来计算一对文档之间的相似性,并且想要使用这种相似性度量来执行聚类.
代码到目前为止

Sim=np.zeros((n, n)) # create a numpy arrary  
i=0  
j=0       
for i in range(0,n):      
   for j in range(i,n):  
    if i==j:  
        Sim[i][j]=1
     else:    
         Sim[i][j]=simfunction(list_doc[i],list_doc[j]) # calculate similarity between documents i and j using simfunction
Sim=Sim+ Sim.T - np.diag(Sim.diagonal()) # complete the symmetric matrix

AggClusterDistObj=AgglomerativeClustering(n_clusters=num_cluster,linkage='average',affinity="precomputed") 
Res_Labels=AggClusterDistObj.fit_predict(Sim)
Run Code Online (Sandbox Code Playgroud)

我担心的是,我在这里使用了相似度函数,我认为根据文档它应该是一个不相似矩阵,我怎样才能将它改为不相似矩阵.还有什么是更有效的方法来做到这一点.

python hierarchical-clustering scikit-learn

3
推荐指数
1
解决办法
2150
查看次数

如何绘制群集内的簇内平方和的图?

我有一个R的聚类图,而我想用wss图来优化聚类的"肘部标准",但我不知道如何绘制给定聚类的wss图,任何人都会帮助我?

这是我的数据:

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)
Run Code Online (Sandbox Code Playgroud)

这是我的聚类代码:

cor <- cor (data)
dist<-dist(cor)
hclust<-hclust(dist)
plot(hclust)
Run Code Online (Sandbox Code Playgroud)

运行上面的代码后我会得到一个树形图,而如何绘制这样的图:

在此输入图像描述

plot r cluster-analysis hierarchical-clustering

3
推荐指数
1
解决办法
3302
查看次数

如何计算聚类熵?一个工作示例或软件代码

我想计算这个示例方案的熵

http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

在此输入图像描述

任何人都可以用真实的价值观逐步解释吗?我知道有很多公式,但我真的很难理解公式:)

例如,在给定的图像中,清楚且很好地解释了如何计算纯度

问题很清楚.我需要一个例子来计算这个聚类方案的熵.它可以是一步一步的解释.它可以是C#代码或Phyton代码来计算这样的方案

这里的熵公式

我将用C#编写代码

非常感谢您的帮助

在此输入图像描述

我需要这里给出的答案:https://stats.stackexchange.com/questions/95731/how-to-calculate-purity

c# math cluster-analysis hierarchical-clustering entropy

3
推荐指数
1
解决办法
7878
查看次数

解释SciPy的分层聚类树状图的输出?(也许发现了一个错误...)

我试图找出输出的scipy.cluster.hierarchy.dendrogram工作方式...我以为我知道它是如何工作的,并且能够使用输出来重建树状图,但是似乎我不再理解它,或者似乎有错误Python 3模块的版本。

这个答案,如何获得由scipy.cluster.hierarchy制作的树状图的子树,意味着dendrogram输出字典给出的dict_keys(['icoord', 'ivl', 'color_list', 'leaves', 'dcoord'])w /大小都相同,因此您可以将zip它们和plt.plot它们重建树状图。

看起来很简单,使用时确实可以恢复,Python 2.7.11但是一旦升级到Python 3.5.1旧脚本,就无法获得相同的结果。

我开始通过一个非常简单的可重复示例对集群进行返工,并认为我可能在Python 3.5.1版本的中发现了一个错误SciPy version 0.17.1-np110py35_1。要使用Scikit-learn数据集b / c,大多数人都从conda发行版中获得了该模块。

这些为什么不排成一列,为什么我不能以这种方式重建树状图?

# Init
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

# Load data
from sklearn.datasets import load_diabetes

# Clustering
from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list
from scipy.spatial import distance
from fastcluster …
Run Code Online (Sandbox Code Playgroud)

python machine-learning hierarchical-clustering dendrogram scipy

3
推荐指数
1
解决办法
2412
查看次数

获取从 Python 中的树状图形成的集群列表

我有一个单词列表,我对其执行了 TF-IDF 算法以获得前 100 个单词的列表。之后我应该执行聚类。现在我能够完成这两项任务(我正在共享代码和输入文件的相关部分,输出屏幕截图)。

我的查询是我想要在输出树状图中形成的集群列表,我该怎么做?Dendrogram 函数返回一个ax包含一些坐标和节点列表的元组。我如何操作它们以获取完整的集群列表。

以下是从输入文件中提取的内容。

"recommended stories dylan scott stat advertisement kate sheridan dylan scott dylan scott",
"email touting former representative mike fergusons genuine connection",
"email touting former representative mike ferguson \u2019",
"facebook donald trump fda hhs privacy policy",
"president trump appoints dr scott gottlieb",
"trade groups including novartis ag",
"bush alumni coalition supporting trump",
"online presidential transition analysis center",
"tennessee republican representative marsha blackburn",
"nonprofit global health care company",
"paula stannard ,\u201d …
Run Code Online (Sandbox Code Playgroud)

python hierarchical-clustering dendrogram

3
推荐指数
1
解决办法
3575
查看次数

使用相似矩阵的sklearn分层聚集聚类

给定一个距离矩阵,各个教授之间具有相似性:

              prof1     prof2     prof3
       prof1     0        0.8     0.9
       prof2     0.8      0       0.2
       prof3     0.9      0.2     0
Run Code Online (Sandbox Code Playgroud)

我需要对此数据执行分层聚类,其中上面的数据是二维矩阵的形式

       data_matrix=[[0,0.8,0.9],[0.8,0,0.2],[0.9,0.2,0]]
Run Code Online (Sandbox Code Playgroud)

我尝试检查是否可以使用sklearn.cluster AgglomerativeClustering来实现它,但它正在将所有3行视为3个单独的向量,而不是距离矩阵。可以使用this或scipy.cluster.hierarchy完成吗?

python hierarchical-clustering pandas scikit-learn

3
推荐指数
1
解决办法
3073
查看次数