How to adjust branch lengths of a dendrogram in matplotlib (as in astrodendro)? [python]

O.r*_*rka 20 python plot hierarchical-clustering matplotlib dendrogram

This is the plot I get from the code below, but I would like it to look like the truncated dendrograms in astrodendro, such as this one:

[image]

There is also a really cool dendrogram in this paper that I would like to recreate in matplotlib.

[image]

Below is the code that generates the iris dataset with noise variables and plots the dendrogram in matplotlib.

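Does anyone know how to either: (1) truncate the branches as in the example figures; and/or (2) use astrodendro with a custom linkage matrix and labels?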

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import astrodendro
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import distance

def iris_data(noise=None, palette="hls", desat=1):
    # Iris dataset
    X = pd.DataFrame(load_iris().data,
                     index = [*map(lambda x:f"iris_{x}", range(150))],
                     columns = [*map(lambda x: x.split(" (cm)")[0].replace(" ","_"), load_iris().feature_names)])

    y = pd.Series(load_iris().target,
                           index = X.index,
                           name = "Species")
    # map_colors is not defined in this snippet, so use the simple species-to-color mapping instead
    c = y.map(lambda x: {0: "red", 1: "green", 2: "blue"}[x])

    if noise is not None:
        X_noise = pd.DataFrame(
            np.random.RandomState(0).normal(size=(X.shape[0], noise)),
            index=X.index,
            columns=[*map(lambda x:f"noise_{x}", range(noise))]
        )
        X = pd.concat([X, X_noise], axis=1)
    return (X, y, c)

def dism2linkage(DF_dism, method="ward"):
    """
    Input: A (m x m) dissimilarity Pandas DataFrame object where the diagonal is 0
    Output: Hierarchical clustering encoded as a linkage matrix

    Further reading:
    http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
    https://pypi.python.org/pypi/fastcluster
    """
    #Linkage Matrix
    Ar_dist = distance.squareform(DF_dism.values)  # as_matrix() was removed in pandas 1.0
    return linkage(Ar_dist,method=method)


# Get data
X_iris_with_noise, y_iris, c_iris = iris_data(50)
# Get distance matrix
df_dism = 1- X_iris_with_noise.corr().abs()
# Get linkage matrix
Z = dism2linkage(df_dism)

#Create dendrogram
# Note: on matplotlib >= 3.6 this style is named "seaborn-v0_8-white"
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots(figsize=(13,3))
    D_dendro = dendrogram(
             Z, 
             labels=df_dism.index,
             color_threshold=3.5,
             count_sort = "ascending",
             #link_color_func=lambda k: colors[k]
             ax=ax
    )
    ax.set_ylabel("Distance")

[image]

mfi*_*tzp 1

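I'm not sure this really constitutes a practical answer, but it does let you generate dendrograms with truncated hanging lines. The trick is to generate the plot as normal, then manipulate the resulting matplotlib plot to recreate the lines.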

I couldn't get your example to work locally, so I've just created a dummy dataset.

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np

a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
X = np.concatenate((a, b),)

Z = linkage(X, 'ward')

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

dendrogram(Z, ax=ax)

The resulting plot is the usual long-armed dendrogram.

[image: standard dendrogram, generated from random data]

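Now for the more interesting bit. The dendrogram is made up of a number of LineCollection objects (one per color). To update the lines we iterate over these collections, extract the details of their constituent paths, modify the paths so that no line reaches a y of zero, and then recreate a LineCollection from the modified paths.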
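
To make that structure concrete, here is a small sketch that only inspects what `dendrogram` draws, without changing anything. The random data, the seed and the `summary` list are assumptions made for illustration; the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so this sketch runs anywhere
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical dummy data: 10 random 2-D points
rng = np.random.RandomState(0)
X = rng.normal(size=(10, 2))
Z = linkage(X, "ward")

fig, ax = plt.subplots()
result = dendrogram(Z, ax=ax)

# Each LineCollection holds the links drawn in one color;
# each link is a path made of four (x, y) vertices forming a U shape.
summary = []
for coll in ax.collections:
    paths = coll.get_paths()
    vertex_counts = {len(p.vertices) for p in paths}
    summary.append((len(paths), vertex_counts))
    print(len(paths), "links, vertices per link:", vertex_counts)
```

The y value of the first and last vertex of a link is 0 when that leg hangs down to a leaf, which is what the rest of this answer rewrites.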

The updated paths are then added to the axes, and the original ones are removed.

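The one tricky part is determining what height to draw to instead of zero. Since we are iterating over each dendrogram path point by point, we don't know which point came before, so we basically don't know where we are. However, we can exploit the fact that the hanging lines hang vertically. Assuming there are no other lines at the same x, we can look up the other known y value for a given x and use it as the basis for calculating the new y. The downside is that, to make sure we have this number, we have to pre-scan the data.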
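
As a worked example of that lookup on a single link in isolation, the sketch below uses plain Python tuples instead of matplotlib paths. The function name `truncate_link`, the `stub` parameter and the sample coordinates are hypothetical; the 0.5 offset mirrors the value used in this answer.

```python
def truncate_link(vertices, stub=0.5):
    """Return a copy of one link's (x, y) vertices with y == 0 ends shortened."""
    # Pre-scan: remember the lowest non-zero y seen at each x.
    lowest_y_at_x = {}
    for x, y in vertices:
        if y > 0 and y < lowest_y_at_x.get(x, float("inf")):
            lowest_y_at_x[x] = y

    truncated = []
    for x, y in vertices:
        if y == 0:
            # Hang the leg `stub` below the known y at this x, never below 0.
            y = max(0, lowest_y_at_x.get(x, 0) - stub)
        truncated.append((x, y))
    return truncated

# A U-shaped link joining two leaves at height 2.0
leaf_link = [(5.0, 0.0), (5.0, 2.0), (15.0, 2.0), (15.0, 0.0)]
print(truncate_link(leaf_link))
# [(5.0, 1.5), (5.0, 2.0), (15.0, 2.0), (15.0, 1.5)]

# A link whose left leg ends on another cluster (y = 1.0) is only cut on the right
mixed_link = [(10.0, 1.0), (10.0, 3.0), (25.0, 3.0), (25.0, 0.0)]
print(truncate_link(mixed_link))
# [(10.0, 1.0), (10.0, 3.0), (25.0, 3.0), (25.0, 2.5)]
```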

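Note: if you can end up with dendrogram hanging lines at the same x, you would need to take y into account as well and search for the nearest y above the current point at that x.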
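
One possible way to do that search, sketched under the assumption that all y values are collected per x up front, is a sorted list plus `bisect`. The helper names `build_y_index` and `nearest_y_above` and the sample links are hypothetical and are not used by the code in this answer.

```python
from bisect import bisect_right
from collections import defaultdict

def build_y_index(links):
    """Map each x to the sorted, de-duplicated y values seen at that x."""
    ys_at_x = defaultdict(set)
    for link in links:
        for x, y in link:
            ys_at_x[x].add(y)
    return {x: sorted(ys) for x, ys in ys_at_x.items()}

def nearest_y_above(ys_at_x, x, y):
    """Return the smallest recorded y strictly above `y` at `x`, or None."""
    ys = ys_at_x.get(x, [])
    i = bisect_right(ys, y)
    return ys[i] if i < len(ys) else None

# Two made-up links that share x = 5.0
links = [
    [(5.0, 0.0), (5.0, 2.0), (15.0, 2.0), (15.0, 0.0)],
    [(5.0, 4.0), (5.0, 6.0), (30.0, 6.0), (30.0, 1.0)],
]
index = build_y_index(links)
print(nearest_y_above(index, 5.0, 0.0))  # 2.0
print(nearest_y_above(index, 5.0, 4.0))  # 6.0
```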

import numpy as np
from matplotlib.collections import LineCollection

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

dendrogram(Z, ax=ax);

for c in list(ax.collections): # iterate over a copy, since we're adding to the same list
    paths = []
    for path in c.get_paths():
        segments = []
        y_at_x = {}
        # Pre-pass over all elements, to find the lowest y value at each x value.
        # We can use this to calculate where to cut our lines.
        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            # Don't store if the y is zero, or if it's higher than the current low.
            if y > 0 and y < y_at_x.get(x, np.inf):
                y_at_x[x] = y

        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]

            if y == 0:
                # If we know the last y at this x, use it - 0.5, limit > 0
                y = max(0, y_at_x.get(x, 0) - 0.5)

            segments.append([x,y])

        paths.append(segments)

    lc = LineCollection(paths, colors=c.get_colors())  # Recreate a LineCollection with the same params
    ax.add_collection(lc)
    c.remove() # Remove the original LineCollection (ax.collections.remove(c) fails on recent matplotlib)

The resulting dendrogram looks like this:

[image: dendrogram with truncated hanging lines]
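
A different route, not the method used in this answer, is to skip the plot manipulation and draw the lines yourself from the coordinates that `dendrogram(..., no_plot=True)` returns in its `icoord`, `dcoord`, `color_list` and `ivl` entries. The sketch below assumes the default layout, where leaf i sits at x = 10*i + 5; the data, the seed and the `stub` value are made up for illustration, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so this sketch runs anywhere
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical dummy data: 10 random 2-D points
rng = np.random.RandomState(0)
X = rng.normal(size=(10, 2))
Z = linkage(X, "ward")

# no_plot=True computes the coordinates without drawing anything.
d = dendrogram(Z, no_plot=True)

stub = 0.5  # assumed length of the hanging leg, in distance units
segments = []
for xs, ys in zip(d["icoord"], d["dcoord"]):
    ys = list(ys)
    # ys is [left foot, top, top, right foot]; a foot at 0 is a leg down to a leaf.
    if ys[0] == 0:
        ys[0] = max(0, ys[1] - stub)
    if ys[3] == 0:
        ys[3] = max(0, ys[2] - stub)
    segments.append(list(zip(xs, ys)))

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=d["color_list"]))
# Leaf i is placed at x = 10*i + 5 in the default (top) orientation.
ax.set_xticks([10 * i + 5 for i in range(len(d["ivl"]))])
ax.set_xticklabels(d["ivl"])
ax.autoscale_view()
ax.set_ylabel("Distance")
```

Because nothing is drawn by scipy in this variant, the leaf labels and axis limits have to be set by hand, as in the last few lines.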