O.r*_*rka 20 python plot hierarchical-clustering matplotlib dendrogram
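Below is the plot I'm getting, but I'd like it to look like the truncated dendrograms that astrodendro produces, shown below:
There's also a really cool-looking dendrogram in this paper that I'd like to recreate in matplotlib.
Below is the code that builds the iris dataset with added noise variables and plots the dendrogram in matplotlib.
Does anyone know how to (1) truncate the branches as in the example figures, and/or (2) use astrodendro with a custom linkage matrix and labels?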
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# import astrodendro  # only relevant to question (2); not used below
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import distance

def iris_data(noise=None):
    # Iris dataset
    iris = load_iris()
    X = pd.DataFrame(iris.data,
                     index=[f"iris_{i}" for i in range(150)],
                     columns=[name.split(" (cm)")[0].replace(" ", "_") for name in iris.feature_names])
    y = pd.Series(iris.target,
                  index=X.index,
                  name="Species")
    # Stand-in for the map_colors() helper, which isn't shown in the question
    c = y.map({0: "red", 1: "green", 2: "blue"})
    if noise is not None:
        X_noise = pd.DataFrame(
            np.random.RandomState(0).normal(size=(X.shape[0], noise)),
            index=X.index,  # was X_iris.index, which is undefined here
            columns=[f"noise_{i}" for i in range(noise)]
        )
        X = pd.concat([X, X_noise], axis=1)
    return (X, y, c)

def dism2linkage(DF_dism, method="ward"):
    """
    Input: A (m x m) dissimilarity pandas DataFrame whose diagonal is 0
    Output: Hierarchical clustering encoded as a linkage matrix
    Further reading:
    http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
    https://pypi.python.org/pypi/fastcluster
    """
    # Linkage matrix. DataFrame.as_matrix() was removed in pandas 1.0, so use to_numpy().
    # checks=False tolerates tiny floating-point asymmetries coming out of corr().
    Ar_dist = distance.squareform(DF_dism.to_numpy(), checks=False)
    return linkage(Ar_dist, method=method)

# Get data
X_iris_with_noise, y_iris, c_iris = iris_data(50)
# Get distance matrix (between the 4 + 50 = 54 columns)
df_dism = 1 - X_iris_with_noise.corr().abs()
# Get linkage matrix
Z = dism2linkage(df_dism)
# Create dendrogram ("seaborn-white" is called "seaborn-v0_8-white" since matplotlib 3.6)
with plt.style.context("seaborn-v0_8-white"):
    fig, ax = plt.subplots(figsize=(13, 3))
    D_dendro = dendrogram(
        Z,
        labels=df_dism.index.tolist(),
        color_threshold=3.5,
        count_sort="ascending",
        # link_color_func=lambda k: colors[k]
        ax=ax
    )
    ax.set_ylabel("Distance")
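I'm not sure this really counts as an answer, but it does let you produce dendrograms with truncated hanging lines. The trick is to generate the plot as normal, then manipulate the resulting matplotlib plot to recreate the lines.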
I couldn't get your example working locally, so I just created a dummy dataset.

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np

a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
X = np.concatenate((a, b),)

Z = linkage(X, 'ward')

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

dendrogram(Z, ax=ax)

The resulting plot is the usual long-armed dendrogram.
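Now for the more interesting part. The dendrogram is made up of a number of LineCollection objects (one for each colour). To update the lines, we iterate over these collections, extract the details of their constituent paths, shorten any lines that reach down to y = 0, and then recreate a LineCollection from the modified paths. The updated collection is then added to the axes, and the original one removed.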
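To see this structure for yourself, here is a small sketch that counts the LineCollections scipy's `dendrogram` adds and checks that each path is one 4-vertex U-shaped link. The random dummy data, the headless `Agg` backend and the variable names are my assumptions, and the one-collection-per-colour layout is how current scipy versions happen to draw the plot, an implementation detail rather than documented API.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical dummy data: 10 random 2-D points
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
Z = linkage(X, "ward")

fig, ax = plt.subplots()
dendrogram(Z, ax=ax)

# One LineCollection per colour; every path inside it is one U-shaped link
collections = [c for c in ax.collections if isinstance(c, LineCollection)]
n_links = sum(len(c.get_paths()) for c in collections)
print(len(collections), "LineCollection(s) holding", n_links, "links")

# Vertex order of one link: bottom-left, top-left, top-right, bottom-right
print(collections[0].get_paths()[0].vertices)
```

With n points there are n - 1 merges, so the total number of paths should match the number of rows in `Z`.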
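One tricky part is deciding what height to draw to instead of zero. Since we're iterating over each dendrogram path on its own, we don't know which point came before it; we basically have no idea where we are. However, we can exploit the fact that hanging lines hang vertically. Assuming no two lines share the same x, we can look up the other known y value at a given x and use it as the base for the new y. The downside is that, to be sure we have that number, we have to pre-scan the data.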
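The two passes described above can be pulled out of the plotting loop into a plain function that works on one path's list of (x, y) points, which makes the logic easy to check in isolation. This is a minimal sketch; `truncate_hanging_lines` and `drop` are hypothetical names (the answer hard-codes a drop of 0.5).

```python
def truncate_hanging_lines(segment, drop=0.5):
    """Shorten the legs of one U-shaped dendrogram link that reach y == 0.

    Hypothetical helper mirroring the answer's two passes for a single path.
    """
    # Pass 1 (pre-scan): lowest non-zero y seen at each x
    y_at_x = {}
    for x, y in segment:
        if 0 < y < y_at_x.get(x, float("inf")):
            y_at_x[x] = y

    # Pass 2: lift every y == 0 point to (known y - drop), never below 0
    out = []
    for x, y in segment:
        if y == 0:
            y = max(0.0, y_at_x.get(x, 0.0) - drop)
        out.append((x, y))
    return out


# A link joining two leaves at x = 5 and x = 15, merged at height 2
print(truncate_hanging_lines([(5.0, 0.0), (5.0, 2.0), (15.0, 2.0), (15.0, 0.0)]))
# [(5.0, 1.5), (5.0, 2.0), (15.0, 2.0), (15.0, 1.5)]
```

A leg whose bottom is already above zero (a link hanging from another cluster rather than a leaf) is left untouched, and a link lower than `drop` keeps its full-length legs because of the `max(0.0, ...)` clamp.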
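Note: if your dendrogram can have hanging lines at the same x, you would need to keep track of the y as well and search for the nearest y above it at that x.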
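One way to sidestep the x lookup entirely is to use the vertex order of each link: if every path is the 4-point U `[bottom-left, top-left, top-right, bottom-right]` (what scipy produces for `orientation="top"`, which is an assumption about its internals), then each leg's top is simply the adjacent vertex. `truncate_by_neighbour` is a hypothetical helper name.

```python
def truncate_by_neighbour(segment, drop=0.5):
    """Shorten zero-height legs using the adjacent vertex instead of an x lookup.

    Assumes the 4-vertex order [bottom-left, top-left, top-right, bottom-right],
    so legs that share an x value cannot be confused with each other.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = segment
    if y0 == 0:
        y0 = max(0.0, y1 - drop)  # left leg hangs from (x1, y1)
    if y3 == 0:
        y3 = max(0.0, y2 - drop)  # right leg hangs from (x2, y2)
    return [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]


print(truncate_by_neighbour([(5.0, 0.0), (5.0, 2.0), (15.0, 2.0), (15.0, 0.0)]))
# [(5.0, 1.5), (5.0, 2.0), (15.0, 2.0), (15.0, 1.5)]
```

Inside the answer's loop, the two inner passes could then be replaced by `paths.append(truncate_by_neighbour(path.vertices.tolist()))`.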
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
from scipy.cluster.hierarchy import dendrogram

# Z is the linkage matrix from the previous snippet
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

dendrogram(Z, ax=ax)

for c in ax.collections[:]:  # use [:] to get a copy, since we're adding to the same list
    paths = []
    for path in c.get_paths():
        segments = []
        y_at_x = {}
        # Pre-pass over all elements, to find the lowest y value at each x value.
        # We can use this to calculate where to cut our lines.
        for seg in path.iter_segments():
            x, y = seg[0]
            # Don't store if the y is zero, or if it's higher than the current low.
            if y > 0 and y < y_at_x.get(x, np.inf):
                y_at_x[x] = y

        for seg in path.iter_segments():
            x, y = seg[0]

            if y == 0:
                # If we know the last y at this x, use it - 0.5, limit > 0
                y = max(0, y_at_x.get(x, 0) - 0.5)

            segments.append([x, y])

        paths.append(segments)

    lc = LineCollection(paths, colors=c.get_colors())  # Recreate a LineCollection with the same params
    ax.add_collection(lc)
    c.remove()  # Remove the original LineCollection (ax.collections.remove(c) fails on matplotlib >= 3.7)

The resulting dendrogram looks like this:
[Image: the resulting dendrogram with truncated hanging lines]
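A different route, offered as an alternative rather than what the answer above does: `dendrogram(..., no_plot=True)` returns the link coordinates (`icoord`, `dcoord`), the link colours (`color_list`) and the leaf labels (`ivl`) without drawing anything, so you can draw each link yourself and shorten the leaf legs as you go, with no artist surgery afterwards. This sketch assumes `orientation="top"`; `plot_truncated_dendrogram` and `drop` are hypothetical names, and the leaf x positions (5, 15, 25, ...) reflect how scipy lays out its dendrogram coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for the example
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage


def plot_truncated_dendrogram(Z, ax, drop=0.5, **kwargs):
    """Draw a top-oriented dendrogram whose leaf legs are only `drop` units long.

    Hypothetical helper: scipy computes the layout (no_plot=True) and each
    U-shaped link is drawn with ax.plot. Extra kwargs (labels, color_threshold,
    count_sort, ...) are passed through to scipy's dendrogram().
    """
    d = dendrogram(Z, no_plot=True, **kwargs)
    for xs, ys, color in zip(d["icoord"], d["dcoord"], d["color_list"]):
        ys = list(ys)  # [bottom-left, top, top, bottom-right]
        if ys[0] == 0:  # left leg starts at a leaf
            ys[0] = max(0.0, ys[1] - drop)
        if ys[3] == 0:  # right leg starts at a leaf
            ys[3] = max(0.0, ys[2] - drop)
        ax.plot(xs, ys, color=color)
    # scipy places leaf i at x = 10 * i + 5
    n_leaves = len(d["ivl"])
    ax.set_xticks(np.arange(n_leaves) * 10 + 5)
    ax.set_xticklabels(d["ivl"])
    ax.set_ylim(bottom=0)
    return d


# Usage with a tiny 1-D example: two tight pairs far apart
X = np.array([[0.0], [3.0], [20.0], [23.0]])
Z = linkage(X, "single")
fig, ax = plt.subplots()
d = plot_truncated_dendrogram(Z, ax, drop=0.5)
ax.set_ylabel("Distance")
```

Because the labels and colours still come from scipy, arguments such as `labels=df_dism.index.tolist()` or `color_threshold=3.5` from the question should carry over unchanged through `**kwargs`.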