Alb*_*bin 4 python hierarchical-clustering dendrogram plotly
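I want to use plotly to plot a dendrogram for hierarchical clustering and show only a small subset of the plot, because with a large number of samples the plot can become very dense at the bottom.
I plotted the figure with the plotly wrapper function create_dendrogram and the following code: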
from scipy.cluster.hierarchy import linkage
import plotly.figure_factory as ff
fig = ff.create_dendrogram(test_df, linkagefun=lambda x: linkage(test_df, 'average', metric='euclidean'))
fig.update_layout(autosize=True, hovermode='closest')
fig.update_xaxes(mirror=False, showgrid=True, showline=False, showticklabels=False)
fig.update_yaxes(mirror=False, showgrid=True, showline=True)
fig.show()
Below is the plot produced with matplotlib, which the scipy library uses by default, truncated to 4 levels for ease of interpretation:
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
x = linkage(test_df, method='average')
dendrogram(x, truncate_mode='level', p=4)
plt.show()
As you can see, truncation is very useful for interpreting a large number of samples. How can I achieve this in plotly?
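There does not seem to be a straightforward way to do this with ff.create_dendrogram(). That does not mean it is impossible, but I would at least consider the functionality offered by Dash Clustergram. If you insist on sticking with ff.create_dendrogram(), this is going to get a bit messier than Plotly users are used to. You have not provided a data sample, so let's use the Plotly Basic Dendrogram example instead: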
import plotly.figure_factory as ff
import numpy as np
np.random.seed(1)

X = np.random.rand(15, 12)  # 15 samples, with 12 dimensions each
fig = ff.create_dendrogram(X)
fig.update_layout(width=800, height=500)
f = fig.full_figure_for_development(warn=False)
fig.show()

The good news is that the exact same snippet will produce a truncated plot once we have taken a few steps, which I explain in the details below.
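
If anyone knows a better way to do what follows than the one in my answer, please share.

ff.create_dendrogram() is a wrapper around scipy.cluster.hierarchy.dendrogram. You can call help(ff.create_dendrogram) to learn that:

    [...] This is a thin wrapper around scipy.cluster.hierarchy.dendrogram.
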
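From the available arguments, you can also see that none of them handles anything related to truncation:

create_dendrogram(X, orientation='bottom', labels=None, colorscale=None, distfun=None, linkagefun=<function <lambda> at 0x0000016F09D4CEE0>, hovertext=None, color_threshold=None)
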
Taking a closer look at scipy.cluster.hierarchy.dendrogram and comparing its signature with the one above, we can see that some central parameters were left out when the function was wrapped in ff.create_dendrogram(X):

scipy.cluster.hierarchy.dendrogram(Z, p=30, truncate_mode=None, color_threshold=None, get_leaves=True, orientation='top', labels=None, count_sort=False, distance_sort=False, show_leaf_counts=True, no_plot=False, no_labels=False, leaf_font_size=None, leaf_rotation=None, leaf_label_func=None, show_contracted=False, link_color_func=None, ax=None, above_threshold_color='C0')

truncate_mode should be exactly what we are looking for. So now we know that scipy probably has everything needed to build the basis for a truncated dendrogram, but what is the next step?
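To see what truncate_mode does on its own, before involving Plotly, here is a small sketch that calls scipy directly with no_plot=True, so only the computed layout is returned. The random data and the 'average' linkage are illustrative assumptions and not part of the question's dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Illustrative data: 15 samples with 12 dimensions each.
rng = np.random.default_rng(1)
X = rng.random((15, 12))
Z = linkage(X, method="average")

# no_plot=True returns the layout dictionary without drawing anything.
full = dendrogram(Z, no_plot=True)
truncated = dendrogram(Z, truncate_mode="level", p=2, no_plot=True)

# "ivl" holds the leaf labels; contracted clusters appear as "(n)".
print(len(full["ivl"]), "leaves without truncation")
print(len(truncated["ivl"]), "leaves with truncate_mode='level', p=2")
print(truncated["ivl"])
```

The truncated result contains fewer entries in "ivl", "icoord" and "dcoord", which is exactly the information Plotly would need to draw a smaller tree.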
Where is scipy.cluster.hierarchy.dendrogram hiding in ff.create_dendrogram(X)? ff.create_dendrogram.__code__ will reveal where the source code lives on your system. In my case, this is:

"C:\Users\vestland\Miniconda3\envs\dashy\lib\site-packages\plotly\figure_factory\_dendrogram.py"

So, if you like, you can take a closer look at the complete source code in that folder. If you do, you will see one particularly interesting section that handles some of the arguments listed above:

def get_dendrogram_traces(
    self, X, colorscale, distfun, linkagefun, hovertext, color_threshold
):
    """
    Calculates all the elements needed for plotting a dendrogram.
.
.
.
P = sch.dendrogram(
    Z,
    orientation=self.orientation,
    labels=self.labels,
    no_plot=True,
    color_threshold=color_threshold,
)

Here we are at the very core of the problem. The first step towards a complete answer to your question is simply to include truncate_mode and p in the call that produces P, like this:

P = sch.dendrogram(
    Z,
    orientation=self.orientation,
    labels=self.labels,
    no_plot=True,
    color_threshold=color_threshold,
    truncate_mode='level',
    p=2,
)

And here is how you do that:
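
In Python, the term monkey patch refers to dynamically modifying a class or module at runtime, which means that a monkey patch is a piece of Python code that extends or modifies other code at runtime. In our case, the essence of how to do that is this:

import plotly.figure_factory._dendrogram as original_dendrogram
original_dendrogram._Dendrogram.get_dendrogram_traces = modified_dendrogram_traces

Here modified_dendrogram_traces is the complete function definition of modified_dendrogram_traces(), with the amendments already mentioned plus a few imports that would otherwise be missing, because they normally run when you call import plotly.figure_factory as ff.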
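The mechanism is easier to see on a toy class first. The sketch below uses a hypothetical Greeter class, invented purely for illustration, to show that replacing a method on the class changes the behavior of instances that already exist, and that the original can be restored afterwards.

```python
class Greeter:
    """Hypothetical stand-in for a class defined in a third-party library."""

    def greet(self, name):
        return f"Hello, {name}"


def shouting_greet(self, name):
    """Replacement method; it takes the same arguments as the original."""
    return f"HELLO, {name.upper()}!"


greeter = Greeter()            # instance created before the patch
original_greet = Greeter.greet  # keep a reference so the patch can be undone

# The monkey patch: rebind the method on the class at runtime.
Greeter.greet = shouting_greet
patched = greeter.greet("plotly")

# Undo the patch by putting the original method back.
Greeter.greet = original_greet
restored = greeter.greet("plotly")

print(patched)
print(restored)
```

Keeping a reference to the original method, as done here, makes it possible to switch the patch off again within the same session.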
Enough details for now. Below is the whole thing. If this is something you can use, we can perhaps make it more dynamic than hardcoding truncate_mode = 'level' and p = 2.
from scipy.cluster.hierarchy import linkage
import plotly.figure_factory as ff
import plotly.figure_factory._dendrogram as original_dendrogram
import numpy as np

def modified_dendrogram_traces(
    self, X, colorscale, distfun, linkagefun, hovertext, color_threshold
):
    """
    Calculates all the elements needed for plotting a dendrogram.

    :param (ndarray) X: Matrix of observations as array of arrays
    :param (list) colorscale: Color scale for dendrogram tree clusters
    :param (function) distfun: Function to compute the pairwise distance
                               from the observations
    :param (function) linkagefun: Function to compute the linkage matrix
                                  from the pairwise distances
    :param (list) hovertext: List of hovertext for constituent traces of dendrogram
    :rtype (tuple): Contains all the traces in the following order:
        (a) trace_list: List of Plotly trace objects for dendrogram tree
        (b) icoord: All X points of the dendrogram tree as array of arrays
            with length 4
        (c) dcoord: All Y points of the dendrogram tree as array of arrays
            with length 4
        (d) ordered_labels: leaf labels in the order they are going to
            appear on the plot
        (e) P['leaves']: left-to-right traversal of the leaves

    """
    # Imports that the plotly module normally performs at import time
    from plotly import optional_imports
    np = optional_imports.get_module("numpy")
    sch = optional_imports.get_module("scipy.cluster.hierarchy")

    d = distfun(X)
    Z = linkagefun(d)
    # The only functional change: truncate_mode and p are passed on to scipy
    P = sch.dendrogram(
        Z,
        orientation=self.orientation,
        labels=self.labels,
        no_plot=True,
        color_threshold=color_threshold,
        truncate_mode='level',
        p=2,
    )

    # np.array rather than scp.array, which SciPy has deprecated
    icoord = np.array(P["icoord"])
    dcoord = np.array(P["dcoord"])
    ordered_labels = np.array(P["ivl"])
    color_list = np.array(P["color_list"])
    colors = self.get_color_dict(colorscale)

    trace_list = []

    for i in range(len(icoord)):
        # xs and ys are arrays of 4 points that make up the '∩' shapes
        # of the dendrogram tree
        if self.orientation in ["top", "bottom"]:
            xs = icoord[i]
        else:
            xs = dcoord[i]

        if self.orientation in ["top", "bottom"]:
            ys = dcoord[i]
        else:
            ys = icoord[i]
        color_key = color_list[i]
        hovertext_label = None
        if hovertext:
            hovertext_label = hovertext[i]
        trace = dict(
            type="scatter",
            x=np.multiply(self.sign[self.xaxis], xs),
            y=np.multiply(self.sign[self.yaxis], ys),
            mode="lines",
            marker=dict(color=colors[color_key]),
            text=hovertext_label,
            hoverinfo="text",
        )

        try:
            x_index = int(self.xaxis[-1])
        except ValueError:
            x_index = ""

        try:
            y_index = int(self.yaxis[-1])
        except ValueError:
            y_index = ""

        # f-strings also work when the index is an int
        trace["xaxis"] = f"x{x_index}"
        trace["yaxis"] = f"y{y_index}"

        trace_list.append(trace)

    return trace_list, icoord, dcoord, ordered_labels, P["leaves"]

# Apply the monkey patch, then build the figure exactly as before
original_dendrogram._Dendrogram.get_dendrogram_traces = modified_dendrogram_traces
X = np.random.rand(15, 12)  # 15 samples, with 12 dimensions each
fig = ff.create_dendrogram(X)
fig.update_layout(width=800, height=500)
f = fig.full_figure_for_development(warn=False)
fig.show()