Create a hierarchy column in pandas

Shu*_*rma 5 python networkx python-3.x pandas

I have a DataFrame like this:

    part part_parent
0  part1         NaN
1  part2       part1
2  part3       part2
3  part4       part3
4  part5       part2

I need to add an extra `hierarchy` column like this:

    part part_parent                hierarchy
0  part1         NaN                    part1
1  part2       part1             part1/part2/
2  part3       part2       part1/part2/part3/
3  part4       part3  part1/part2/part3/part4
4  part5       part2        part1/part2/part5

Dicts to create the input/output DataFrames:

import pandas as pd
from numpy import nan

df1 = pd.DataFrame({'part': {0: 'part1', 1: 'part2', 2: 'part3', 3: 'part4', 4: 'part5'},
 'part_parent': {0: nan, 1: 'part1', 2: 'part2', 3: 'part3', 4: 'part2'}})


df2 = pd.DataFrame({'part': {0: 'part1', 1: 'part2', 2: 'part3', 3: 'part4', 4: 'part5'},
 'part_parent': {0: nan, 1: 'part1', 2: 'part2', 3: 'part3', 4: 'part2'},
 'hierarchy': {0: 'part1',
  1: 'part1/part2/',
  2: 'part1/part2/part3/',
  3: 'part1/part2/part3/part4',
  4: 'part1/part2/part5'}})

Note: I have seen several threads about solving this with NetworkX, but I couldn't get it to work.

Any help is appreciated.

use*_*203 4

Here is a solution using networkx. It treats `nan` as the root node and finds the shortest path from that root to each node.

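import networkx as nx
from numpy import nan  # the same NaN object that marks the root's missing parent in df1

def find_path(net, source, target):
    # Adjust this as needed (in case multiple paths are present)
    # or add error handling in case a path doesn't exist
    path = nx.shortest_path(net, source, target)
    # The path starts with the NaN root node, so drop it before joining
    return "/".join(path[1:])

# Undirected graph with one edge per (part, part_parent) row; NaN becomes the root node
net = nx.from_pandas_edgelist(df1, "part", "part_parent")
df1["hierarchy"] = [find_path(net, nan, node) for node in df1["part"]]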
    part part_parent                hierarchy
0  part1         NaN                    part1
1  part2       part1              part1/part2
2  part3       part2        part1/part2/part3
3  part4       part3  part1/part2/part3/part4
4  part5       part2        part1/part2/part5
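A possible variation, sketched here under a couple of assumptions: instead of relying on `nan` as a graph node, the missing parent can be replaced by an explicit sentinel root (the name `__root__` below is hypothetical and assumed not to clash with a real part), the graph can be directed from parent to child, and `nx.shortest_path` can be called once with only a `source`, which returns a dict of paths to every reachable node rather than computing one path per row.

```python
import networkx as nx
import pandas as pd
from numpy import nan

df1 = pd.DataFrame({'part': ['part1', 'part2', 'part3', 'part4', 'part5'],
                    'part_parent': [nan, 'part1', 'part2', 'part3', 'part2']})

ROOT = "__root__"  # hypothetical sentinel; assumed not to be a real part name

# Replace the missing parent with the sentinel so NaN never becomes a node.
edges = df1.assign(part_parent=df1["part_parent"].fillna(ROOT))

# Directed edges parent -> child, so paths can only run from the root downward.
net = nx.from_pandas_edgelist(edges, source="part_parent", target="part",
                              create_using=nx.DiGraph)

# With only a source given, this returns {target: [ROOT, ..., target]}
# for every node reachable from ROOT.
paths = nx.shortest_path(net, source=ROOT)

# Drop the sentinel and join; parts not reachable from ROOT get NaN.
df1["hierarchy"] = df1["part"].map(
    lambda p: "/".join(paths[p][1:]) if p in paths else nan
)
print(df1)
```

Because the graph is unweighted, the single-source call is one breadth-first search over the whole tree, and parts whose ancestry never reaches a row with a missing parent end up as NaN instead of raising an error.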

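The path formatting is tailored to this example. If you need more robust error handling, or formatting for parts that have more than one path, you will have to adapt the path finder.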
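One way the path finder could be adapted, as a rough sketch rather than a drop-in replacement: on a directed parent -> child graph, `nx.all_simple_paths` enumerates every root-to-part path when a part has several parents, and checking `nx.has_path` first turns a missing or unreachable part into an empty list instead of an exception. The edge list, the `__root__` sentinel and the `find_paths` helper are all hypothetical and only illustrate the idea.

```python
import networkx as nx

# Hypothetical data in which part5 has two parents (part2 and part3),
# so there are two distinct root-to-part5 paths.
edges = [("__root__", "part1"), ("part1", "part2"), ("part1", "part3"),
         ("part2", "part5"), ("part3", "part5")]
net = nx.DiGraph(edges)

def find_paths(net, source, target, sep="/"):
    """Return every source-to-target path as a string, without the source node.

    Returns an empty list if the target is not in the graph or cannot be
    reached from the source.
    """
    if target not in net or not nx.has_path(net, source, target):
        return []
    return [sep.join(path[1:]) for path in nx.all_simple_paths(net, source, target)]

print(find_paths(net, "__root__", "part5"))
print(find_paths(net, "__root__", "unknown_part"))
```

The order in which `all_simple_paths` yields paths follows the graph's traversal order, so sort the result if you need a stable ordering, or join the list into one string if a single column value per part is required.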