How can I plot a truncated dendrogram with plotly?

Alb*_*bin 4 python hierarchical-clustering dendrogram plotly

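I want to plot a dendrogram for hierarchical clustering with plotly and show only a small subset of the plot, because with a large number of samples the plot can become very dense at the bottom.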

I drew the plot with the plotly wrapper function create_dendrogram and the following code:

from scipy.cluster.hierarchy import linkage
import plotly.figure_factory as ff
fig = ff.create_dendrogram(test_df, linkagefun=lambda x: linkage(test_df, 'average', metric='euclidean'))
fig.update_layout(autosize=True, hovermode='closest')
fig.update_xaxes(mirror=False, showgrid=True, showline=False, showticklabels=False)
fig.update_yaxes(mirror=False, showgrid=True, showline=True)
fig.show()

Below is the same plot drawn with matplotlib, which the scipy library uses by default. It is truncated to 4 levels to make it easier to interpret:

import matplotlib.pyplot as plt  # needed for plt.show()
from scipy.cluster.hierarchy import dendrogram, linkage
x = linkage(test_df, method='average')
dendrogram(x, truncate_mode='level', p=4)
plt.show()

As you can see, truncation is very useful for interpreting a large number of samples. How can I achieve this in plotly?

ves*_*and 5

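There doesn't seem to be a direct way to do this with ff.create_dendrogram(). That doesn't mean it's impossible, though. I would at least consider the excellent functionality offered by Dash Clustergram. If you insist on sticking with ff.create_dendrogram(), this is going to get a bit messier than Plotly users are used to. You haven't provided a data sample, so let's use the Plotly Basic Dendrogram example instead: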

Plot 1

[image: full dendrogram of the 15 samples]

Code 1

import plotly.figure_factory as ff
import numpy as np
np.random.seed(1)

X = np.random.rand(15, 12) # 15 samples, with 12 dimensions each
fig = ff.create_dendrogram(X)
fig.update_layout(width=800, height=500)
f = fig.full_figure_for_development(warn=False)
fig.show()

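The good news is that the exact same snippet will produce the following truncated plot once we have taken a few steps, which I explain in the details below.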

Plot 2

[image: truncated dendrogram]

Details

If anyone knows a better way to do any of the following steps, please share.

1. ff.create_dendrogram() is a wrapper for scipy.cluster.hierarchy.dendrogram

You can call help(ff.create_dendrogram) to learn that:

[...] This is a thin wrapper around scipy.cluster.hierarchy.dendrogram.

From the available arguments, you can also see that none of them appears to handle anything related to truncation:

create_dendrogram(X, orientation='bottom', labels=None, colorscale=None, distfun=None, linkagefun=<function <lambda> at 0x0000016F09D4CEE0>, hovertext=None, color_threshold=None)

2. A closer look at scipy.cluster.hierarchy.dendrogram

Comparing the scipy signature with ff.create_dendrogram(X), we can see that some core elements were left out when the wrapper was implemented:

scipy.cluster.hierarchy.dendrogram(Z, p=30, truncate_mode=None, color_threshold=None, get_leaves=True, orientation='top', labels=None, count_sort=False, distance_sort=False, show_leaf_counts=True, no_plot=False, no_labels=False, leaf_font_size=None, leaf_rotation=None, leaf_label_func=None, show_contracted=False, link_color_func=None, ax=None, above_threshold_color='C0')

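truncate_mode should be exactly what we are looking for. So now we know that scipy probably already has everything needed to build the basis of a truncated dendrogram. But what is the next step?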
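As a quick check that scipy alone can do the truncation, here is a minimal sketch that calls dendrogram with no_plot=True, which is the same mode the plotly wrapper relies on. The random data, the seed, and the 'complete' linkage method are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(1)
X = rng.random((15, 12))  # 15 samples, 12 dimensions each (illustrative data)
Z = linkage(X, method="complete")

# no_plot=True returns the plotting data without drawing anything.
full = dendrogram(Z, no_plot=True)
truncated = dendrogram(Z, no_plot=True, truncate_mode="level", p=2)

print(len(full["leaves"]))       # one leaf per sample
print(len(truncated["leaves"]))  # fewer leaves: deeper subtrees are contracted
print(truncated["ivl"])          # tick labels of the truncated tree
```

The returned dictionary contains the icoord and dcoord arrays that any plotting backend needs, so the truncation happens before plotly ever sees the data.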

3. Find where scipy.cluster.hierarchy.dendrogram hides in ff.create_dendrogram(X)

ff.create_dendrogram.__code__ will show where the source code lives on your system. In my case, this is:

"C:\Users\vestland\Miniconda3\envs\dashy\lib\site-packages\plotly\figure_factory\_dendrogram.py"
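The lookup can also be done programmatically. The sketch below demonstrates it on a standard-library function so that it runs without plotly installed; source_file is a hypothetical helper name, and passing ff.create_dendrogram to it instead should return the path to _dendrogram.py on your system.

```python
import inspect
import json


def source_file(func):
    """Return the path of the file in which a pure-Python function is defined."""
    return func.__code__.co_filename


# Demonstrated on json.dumps; ff.create_dendrogram can be passed the same way.
path = source_file(json.dumps)
print(path)

# inspect.getsourcefile offers the same information through a public helper.
print(inspect.getsourcefile(json.dumps))
```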

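So, if you like, you can take a closer look at the complete source code in that folder. If you do, you will find a particularly interesting section that handles some of the arguments we listed above: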

def get_dendrogram_traces(
    self, X, colorscale, distfun, linkagefun, hovertext, color_threshold
):
    """
    Calculates all the elements needed for plotting a dendrogram.
.
.
.
P = sch.dendrogram(
        Z,
        orientation=self.orientation,
        labels=self.labels,
        no_plot=True,
        color_threshold=color_threshold,
    )

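This is the very core of the problem. The first step toward a complete answer to your question is simply to include truncate_mode and p in the call that produces P, like this: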

P = sch.dendrogram(
    Z,
    orientation=self.orientation,
    labels=self.labels,
    no_plot=True,
    color_threshold=color_threshold,
    truncate_mode='level',
    p=2
)

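Here's how to do it:

4. Monkey patching

In Python, the term monkey patch refers to a dynamic modification of a class or module at runtime. In other words, a monkey patch is a piece of Python code that extends or modifies other code while the program is running. In our case, the essence of how you can do that is: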

import plotly.figure_factory._dendrogram as original_dendrogram
original_dendrogram._Dendrogram.get_dendrogram_traces = modified_dendrogram_traces
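For readers new to the technique, here is a toy sketch of monkey patching that is unrelated to plotly. Greeter and shouting_greet are hypothetical names used only to show that rebinding an attribute on a class changes the behaviour of every instance, including ones that already exist.

```python
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"


def shouting_greet(self, name):
    # Replacement method; it takes `self` just like the original.
    return f"HELLO, {name.upper()}!"


g = Greeter()
before = g.greet("plotly")

# Rebinding the class attribute at runtime is the monkey patch.
Greeter.greet = shouting_greet
after = g.greet("plotly")

print(before)
print(after)
```

The plotly patch works the same way: the class is _Dendrogram and the rebound attribute is get_dendrogram_traces.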

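Here, modified_dendrogram_traces is the complete function definition of modified_dendrogram_traces() with the modifications I have already mentioned, plus a few imports that would otherwise be missing. Those imports normally run when you call import plotly.figure_factory as ff.

That's enough detail for now. Below is the whole thing. If this is something you can use, we could perhaps make the whole thing more dynamic than hardcoding truncate_mode = 'level' and p = 2.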
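One possible way to avoid the hard-coded values is to patch scipy's dendrogram function temporarily instead of copying plotly's method. This is only a sketch: truncated_dendrogram is a hypothetical helper, and it assumes that plotly reaches scipy through the shared scipy.cluster.hierarchy module object and passes everything except Z as keyword arguments, as the source excerpt suggests. The block below exercises only the scipy side.

```python
from contextlib import contextmanager
import functools

import numpy as np
import scipy.cluster.hierarchy as sch


@contextmanager
def truncated_dendrogram(truncate_mode="level", p=2):
    """Temporarily make sch.dendrogram truncate unless the caller overrides it."""
    original = sch.dendrogram

    @functools.wraps(original)
    def wrapper(Z, **kwargs):
        # Only fill in values the caller did not pass explicitly.
        kwargs.setdefault("truncate_mode", truncate_mode)
        kwargs.setdefault("p", p)
        return original(Z, **kwargs)

    sch.dendrogram = wrapper
    try:
        yield
    finally:
        sch.dendrogram = original  # always restore the real function


rng = np.random.default_rng(1)
X = rng.random((15, 12))  # illustrative data
Z = sch.linkage(X, method="complete")

with truncated_dendrogram(truncate_mode="level", p=2):
    # With plotly installed, ff.create_dendrogram(X) would be called here instead.
    inside = sch.dendrogram(Z, no_plot=True)

outside = sch.dendrogram(Z, no_plot=True)
print(len(inside["leaves"]), len(outside["leaves"]))
```

Because the patch is undone when the with block exits, other code that uses scipy.cluster.hierarchy.dendrogram is unaffected afterwards.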

Complete code:

import plotly.figure_factory as ff
import plotly.figure_factory._dendrogram as original_dendrogram
import numpy as np

def modified_dendrogram_traces(
    self, X, colorscale, distfun, linkagefun, hovertext, color_threshold
):
    """
    Calculates all the elements needed for plotting a dendrogram.

    :param (ndarray) X: Matrix of observations as array of arrays
    :param (list) colorscale: Color scale for dendrogram tree clusters
    :param (function) distfun: Function to compute the pairwise distance
                               from the observations
    :param (function) linkagefun: Function to compute the linkage matrix
                                  from the pairwise distances
    :param (list) hovertext: List of hovertext for constituent traces of dendrogram
    :rtype (tuple): Contains all the traces in the following order:
        (a) trace_list: List of Plotly trace objects for dendrogram tree
        (b) icoord: All X points of the dendrogram tree as array of arrays
            with length 4
        (c) dcoord: All Y points of the dendrogram tree as array of arrays
            with length 4
        (d) ordered_labels: leaf labels in the order they are going to
            appear on the plot
        (e) P['leaves']: left-to-right traversal of the leaves

    """
    # Imports that plotly's own module sets up when it is loaded
    from plotly import optional_imports
    np = optional_imports.get_module("numpy")
    sch = optional_imports.get_module("scipy.cluster.hierarchy")

    d = distfun(X)
    Z = linkagefun(d)
    # The only functional change: truncate_mode and p are passed on to scipy
    P = sch.dendrogram(
        Z,
        orientation=self.orientation,
        labels=self.labels,
        no_plot=True,
        color_threshold=color_threshold,
        truncate_mode='level',
        p=2,
    )

    # np.array is used here because scipy.array was only a deprecated alias for it
    icoord = np.array(P["icoord"])
    dcoord = np.array(P["dcoord"])
    ordered_labels = np.array(P["ivl"])
    color_list = np.array(P["color_list"])
    colors = self.get_color_dict(colorscale)

    trace_list = []

    for i in range(len(icoord)):
        # xs and ys are arrays of 4 points that make up the '∩' shapes
        # of the dendrogram tree
        if self.orientation in ["top", "bottom"]:
            xs = icoord[i]
        else:
            xs = dcoord[i]

        if self.orientation in ["top", "bottom"]:
            ys = dcoord[i]
        else:
            ys = icoord[i]
        color_key = color_list[i]
        hovertext_label = None
        if hovertext:
            hovertext_label = hovertext[i]
        trace = dict(
            type="scatter",
            x=np.multiply(self.sign[self.xaxis], xs),
            y=np.multiply(self.sign[self.yaxis], ys),
            mode="lines",
            marker=dict(color=colors[color_key]),
            text=hovertext_label,
            hoverinfo="text",
        )

        try:
            x_index = int(self.xaxis[-1])
        except ValueError:
            x_index = ""

        try:
            y_index = int(self.yaxis[-1])
        except ValueError:
            y_index = ""

        trace["xaxis"] = "x" + x_index
        trace["yaxis"] = "y" + y_index

        trace_list.append(trace)

    return trace_list, icoord, dcoord, ordered_labels, P["leaves"]

# Apply the monkey patch, then run the same snippet as in Code 1
original_dendrogram._Dendrogram.get_dendrogram_traces = modified_dendrogram_traces
X = np.random.rand(15, 12) # 15 samples, with 12 dimensions each
fig = ff.create_dendrogram(X)
fig.update_layout(width=800, height=500)
f = fig.full_figure_for_development(warn=False)
fig.show()