O.r*_*rka 21 python hierarchical-clustering matplotlib heatmap seaborn
通常当我做树形图和热图时,我使用距离矩阵并做一堆SciPy东西.我想尝试Seaborn但Seaborn想要我的数据是矩形的(行=样本,cols =属性,而不是距离矩阵)?
我本质上想seaborn用作后端来计算我的树形图并将其粘贴到我的热图上.这可能吗?如果没有,这可能是未来的特色.
也许有我可以调整的参数,所以它可以采用距离矩阵而不是矩形矩阵?
这是用法:
seaborn.clustermap¶
seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean',
z_score=None, standard_scale=None, figsize=None, cbar_kws=None, row_cluster=True,
col_cluster=True, row_linkage=None, col_linkage=None, row_colors=None,
col_colors=None, mask=None, **kwargs)
Run Code Online (Sandbox Code Playgroud)
我的代码如下:
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
DF = pd.DataFrame(X, index = ["iris_%d" % (i) for i in range(X.shape[0])], columns = iris.feature_names)
Run Code Online (Sandbox Code Playgroud)
我不认为我的方法在下面是正确的,因为我给它一个预先计算的距离矩阵,而不是它要求的矩形数据矩阵.没有关于如何使用相关/距离矩阵的示例,clustermap但有https://stanford.edu/~mwaskom/software/seaborn/examples/network_correlations.html,但排序没有与普通sns.heatmap功能集群.
DF_corr = DF.T.corr()
DF_dism = 1 - DF_corr
sns.clustermap(DF_dism)
Run Code Online (Sandbox Code Playgroud)
Ulr*_*ern 22
您可以将预先计算的距离矩阵作为链接传递给clustermap():
import pandas as pd, seaborn as sns
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
from sklearn.datasets import load_iris
sns.set(font="monospace")
iris = load_iris()
X, y = iris.data, iris.target
DF = pd.DataFrame(X, index = ["iris_%d" % (i) for i in range(X.shape[0])], columns = iris.feature_names)
DF_corr = DF.T.corr()
DF_dism = 1 - DF_corr # distance matrix
linkage = hc.linkage(sp.distance.squareform(DF_dism), method='average')
sns.clustermap(DF_dism, row_linkage=linkage, col_linkage=linkage)
Run Code Online (Sandbox Code Playgroud)
对于clustermap(distance_matrix)(即没有链接通过),链接是根据距离矩阵中行和列的成对距离在内部计算的(请参阅下面的注释以获取完整的详细信息),而不是直接使用距离矩阵的元素(正确的解决方案) .结果,输出与问题中的输出略有不同:

注意:如果没有row_linkage传递给clustermap(),则通过将每一行视为"点"(观察)并计算点之间的成对距离来在内部确定行链接.因此行树形图反映了行相似性.类似于col_linkage,每列被视为一个点.这种解释应该可以添加到文档中.这里修改了文档的第一个示例,以使内部链接计算显式化:
import seaborn as sns; sns.set()
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
flights = sns.load_dataset("flights")
flights = flights.pivot("month", "year", "passengers")
row_linkage, col_linkage = (hc.linkage(sp.distance.pdist(x), method='average')
for x in (flights.values, flights.values.T))
g = sns.clustermap(flights, row_linkage=row_linkage, col_linkage=col_linkage)
# note: this produces the same plot as "sns.clustermap(flights)", where
# clustermap() calculates the row and column linkages internally
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
6442 次 |
| 最近记录: |