How to use recursion to record all routes in a parent-child hierarchy?

Den*_*zyl 3 python recursion hierarchy dataframe pandas

I am trying to walk through a hierarchy dataframe and record every possible route in another dataframe. The routes can have variable depth.

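Original dataframe (df). The highest column indicates that the value in the parent column is not a child of any other row:

parent  child  highest
a       b      1
b       c      0
b       d      0
d       e      0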

Desired final dataframe:

level 3  level 2  level 1  level 0
a        b        c
a        b        d        e

This is what I have so far:

def search(parent):
    for i in range(df.shape[0]):
        if(df.iloc[i,0] == parent):
            search(df.iloc[i,1])

for i in range(df.shape[0]):
    if(df.iloc[i,2] == 1):
        search(df.iloc[i,0])

I am able to traverse the hierarchy, but I don't know how to save the result in the format I want.

Cor*_*ien 5

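You can use networkx to solve this. Note that with networkx you don't need the highest column, because the roots can be derived from the graph itself. The main function for finding all paths is all_simple_paths: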

# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
import pandas as pd

# Create network from your dataframe
#G = nx.from_pandas_edgelist(df, source='parent', target='child',
#                            create_using=nx.DiGraph)

# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
    G.add_edge(source, target)

# Find roots of your graph (a root is a node with no input)
roots = [node for node, degree in G.in_degree() if degree == 0]

# Find leaves of your graph (a leaf is a node with no output)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all paths
paths = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(G, root, leaf):
            paths.append(path)

# Create a new dataframe
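# Shorter paths are padded with empty strings
out = pd.DataFrame(paths).fillna('')
# Label the columns from the highest level down to level 0
out.columns = out.add_prefix('level ').columns[::-1]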

Output:

>>> out
  level 3 level 2 level 1 level 0
0       a       b       c        
1       a       b       d       e
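
If you would rather stay with plain recursion, as the question asks, the same result can probably be reached without networkx by passing the path walked so far down each recursive call. The following is only a minimal sketch: the names collect_paths and children are made up for illustration, the sample df is rebuilt from the example data, and it assumes the hierarchy contains no cycles.

```python
import pandas as pd

# Sample data rebuilt from the question (edges a->b, b->c, b->d, d->e).
df = pd.DataFrame({
    "parent": ["a", "b", "b", "d"],
    "child": ["b", "c", "d", "e"],
    "highest": [1, 0, 0, 0],
})

# Hypothetical helper structure: map each parent to the list of its children.
children = df.groupby("parent")["child"].apply(list).to_dict()


def collect_paths(node, path, paths):
    """Depth-first walk that appends every root-to-leaf path to `paths`.

    Assumes the hierarchy is acyclic; a cycle would recurse forever.
    """
    path = path + [node]  # build a new list so sibling branches share no state
    kids = children.get(node, [])
    if not kids:  # a node without children is a leaf: the route is complete
        paths.append(path)
        return
    for kid in kids:
        collect_paths(kid, path, paths)


paths = []
# Start from every parent flagged as a top-level node in the highest column.
for root in df.loc[df["highest"] == 1, "parent"].unique():
    collect_paths(root, [], paths)

# Pad shorter routes with empty strings and label columns from high to low.
out = pd.DataFrame(paths).fillna("")
out.columns = [f"level {i}" for i in range(out.shape[1] - 1, -1, -1)]
print(out)
```

The key difference from the search function in the question is the extra path argument: each call extends a copy of it, and the copy is stored only when a leaf is reached. Unlike the networkx version, this sketch relies on the highest column to find the starting nodes.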