gee*_*mer 5 python plotly sankey-diagram plotly-python
I have a dataframe A:
date Cluster count Users
01/01/2021 ClusterA 10
01/01/2021 ClusterB 10
01/01/2021 ClusterB 9
02/01/2021 ClusterA 14
02/01/2021 ClusterB 10
02/01/2021 ClusterB 5
I want to visualize the migration of users between clusters. To do that, I first generate the following dataframeB:
date Source Target Value
02/01/2021 ClusterA ClusterA 8
02/01/2021 ClusterA ClusterB 2
02/01/2021 ClusterB ClusterB 8
02/01/2021 ClusterB ClusterA 2
02/01/2021 ClusterC ClusterA 4
02/01/2021 ClusterC ClusterC 5
I then draw the Sankey diagram:
import plotly.graph_objects as go

# one node per (cluster, date) pair
label = [
    "ClusterA01/01/2021", "ClusterB01/01/2021", "ClusterC01/01/2021",
    "ClusterA02/01/2021", "ClusterB02/01/2021", "ClusterC02/01/2021",
]
# source/target are indices into label; value is the size of each flow
source = [0, 0, 1, 1, 2, 2]
target = [3, 4, 3, 4, 3, 5]
value = [8, 2, 2, 8, 4, 5]
# data to dict, dict to sankey
link = dict(source=source, target=target, value=value)
node = dict(label=label, pad=50, thickness=5)
data = go.Sankey(link=link, node=node)
# plot
fig = go.Figure(data)
fig.show()
The problem I am facing is that I have the same records for consecutive dates:
date Source Target Value
02/01/2021 ClusterA ClusterA 8
02/01/2021 ClusterA ClusterB 2
02/01/2021 ClusterB ClusterB 8
02/01/2021 ClusterB ClusterA 2
02/01/2021 ClusterC ClusterA 4
02/01/2021 ClusterC ClusterC 5
03/01/2021 ClusterA ClusterA 7
03/01/2021 ClusterA ClusterB 2
......
12/09/2021 ClusterA ClusterB 5
小智 0
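I don't think I can reproduce this from what you have provided. However, I think what you want is to concatenate the source and target names with the dates in dataframeB. If you treat "02/01/2021 Cluster A" and "03/01/2021 Cluster A" as completely different nodes from the start, you should end up with what I think you are looking for.
Essentially, in the plotly function you only name the sources and targets. You have no option to use other data (such as the date associated with a flow in the dataframe) to specify which group of nodes they belong to.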
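To make the idea concrete, here is a minimal sketch of building the node labels by joining the cluster name with the date. It rests on assumptions that are not confirmed by the question: that the dates are in day/month/year format, that the data is daily, and that the date on each row of dataframeB is the date of the target, so the source belongs to the previous day. The helper names build_sankey_inputs and make_figure are hypothetical, and the rows are copied from the sample in the question.

```python
from datetime import datetime, timedelta

# Rows of dataframeB as (date, source, target, value), copied from the question.
rows = [
    ("02/01/2021", "ClusterA", "ClusterA", 8),
    ("02/01/2021", "ClusterA", "ClusterB", 2),
    ("02/01/2021", "ClusterB", "ClusterB", 8),
    ("02/01/2021", "ClusterB", "ClusterA", 2),
    ("02/01/2021", "ClusterC", "ClusterA", 4),
    ("02/01/2021", "ClusterC", "ClusterC", 5),
    ("03/01/2021", "ClusterA", "ClusterA", 7),
    ("03/01/2021", "ClusterA", "ClusterB", 2),
]


def build_sankey_inputs(rows, date_format="%d/%m/%Y"):
    """Return (labels, source, target, value) with one node per cluster and date.

    Assumption: the row date is the target's date and the source is one day earlier.
    """
    labels = []
    index = {}

    def node_id(cluster, day):
        # The label combines cluster and date, so the same cluster on
        # different days becomes a different node.
        name = f"{cluster} {day.strftime(date_format)}"
        if name not in index:
            index[name] = len(labels)
            labels.append(name)
        return index[name]

    source, target, value = [], [], []
    for date_str, src, tgt, val in rows:
        day = datetime.strptime(date_str, date_format)
        source.append(node_id(src, day - timedelta(days=1)))
        target.append(node_id(tgt, day))
        value.append(val)
    return labels, source, target, value


def make_figure(labels, source, target, value):
    # plotly is imported here so the helper above works without it installed.
    import plotly.graph_objects as go

    node = dict(label=labels, pad=50, thickness=5)
    link = dict(source=source, target=target, value=value)
    return go.Figure(go.Sankey(node=node, link=link))


labels, source, target, value = build_sankey_inputs(rows)
# make_figure(labels, source, target, value).show()
```

Because "ClusterA 02/01/2021" is the target of the first day's flows and also the source of the next day's flows, the links chain from one date to the next instead of collapsing onto a single ClusterA node. If your dates are not daily, the timedelta step would need to be replaced with whatever "previous period" means in your data.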