Plotting a Sankey diagram from a time-series dataframe

gee*_*mer 5 python plotly sankey-diagram plotly-python

I have a dataframe A:

date        Cluster    count Users 
01/01/2021  ClusterA    10
01/01/2021  ClusterB    10
01/01/2021  ClusterC    9
02/01/2021  ClusterA    14
02/01/2021  ClusterB    10
02/01/2021  ClusterC    5

I want to visualize how users migrate between clusters. To do that, I first generate the following dataframeB:

date        Source     Target    Value 
02/01/2021  ClusterA   ClusterA   8
02/01/2021  ClusterA   ClusterB   2
02/01/2021  ClusterB   ClusterB   8
02/01/2021  ClusterB   ClusterA   2
02/01/2021  ClusterC   ClusterA   4
02/01/2021  ClusterC   ClusterC   5

Then I plot the Sankey diagram:

import plotly.graph_objects as go
label = ["ClusterA01/01/2021","ClusterB01/01/2021","ClusterC01/01/2021","ClusterA02/01/2021","ClusterB02/01/2021","ClusterC02/01/2021"]
source = [0, 0, 1, 1, 2,2]
target = [3, 4, 3, 4, 3,5]
value = [8, 2, 2, 8, 4,5]
# data to dict, dict to sankey
link = dict(source = source, target = target, value = value)
node = dict(label = label, pad=50, thickness=5)
data = go.Sankey(link = link, node=node)
# plot
fig = go.Figure(data)
fig.show()

[Image: Sankey diagram]

The problem I am facing is that I have the same kind of records for consecutive dates:

date        Source     Target    Value 
02/01/2021  ClusterA   ClusterA   8
02/01/2021  ClusterA   ClusterB   2
02/01/2021  ClusterB   ClusterB   8
02/01/2021  ClusterB   ClusterA   2
02/01/2021  ClusterC   ClusterA   4
02/01/2021  ClusterC   ClusterC   5
03/01/2021  ClusterA   ClusterA   7
03/01/2021  ClusterA   ClusterB   2
......
12/09/2021  ClusterA   ClusterB   5

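I want to visualize the day-by-day migration of users between clusters. The idea is to get a Sankey diagram close to the one in the reference image, with dates instead of months. [Image not shown]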

小智 0

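I don't think I can reproduce this from what you have provided. However, I think what you want is to concatenate the source and target names with the dates in dataframeB. If you treat "02/01/2021 Cluster A" and "03/01/2021 Cluster A" as completely different nodes from the start, you should end up with what I think you are looking for.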

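Essentially, in the plotly function you only name the sources and targets. You do not get the option of using other data (such as the date associated with a flow in the dataframe) to specify which group of nodes they belong to.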