Get all direct, intermediate, and ultimate parents of a child node in a pandas DataFrame

Muh*_*Zia 4 numpy python-3.x pandas

I have a DataFrame of parent-child relationships, as shown below:

child                Parent              relationship
A1x2                 bc11                direct_parent
bc11                 Aw00                direct_parent
bc11                 Aw00                ultimate_parent
Aee1                 Aee0                direct_parent
Aee1                 Aee0                ultimate_parent
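For reproducibility, the table above could be built as a DataFrame roughly like this. This is only a sketch: the variable name `df` is an assumption, and the column names are copied from the table as shown.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})

print(df)
```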

I want to get all ancestors of every child node in a new DataFrame. The result should look like this:

node                   ancestry_tree

A1x2                    [A1x2,bc11,Aw00]   
Aee1                    [Aee1,Aee0]

Note: in the real dataset there may be many intermediate direct parents between a child node and its ultimate parent.

Chr*_*s A 5

Another approach, using `from_pandas_edgelist` and `ancestors` from the `networkx` package:

import networkx as nx
import pandas as pd

# Create the Directed Graph
G = nx.from_pandas_edgelist(df,
                            source='Parent',
                            target='child',
                            create_using=nx.DiGraph())

# Create dict of nodes and ancestors
ancestors = {n: {n} | nx.ancestors(G, n) for n in df['child'].unique()}

# Convert dict back to DataFrame if necessary
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

print(df_ancestors)

[out]

   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  bc11        [bc11, Aw00]
2  Aee1        [Aee1, Aee0]
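Note that `nx.ancestors` returns a set, so the order of the names inside each list above is not guaranteed. If the chain should run from the child up to the ultimate parent, as in the question, one possible sketch is to sort each node's ancestors by their distance from it. The helper name `ordered_ancestry` is hypothetical, and the sketch assumes edges point from `Parent` to `child` as in the code above; nodes at the same distance are ordered by name.

```python
import networkx as nx
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})

# Directed graph with edges pointing Parent -> child.
G = nx.from_pandas_edgelist(df, source="Parent", target="child",
                            create_using=nx.DiGraph())


def ordered_ancestry(graph, node):
    """Return node followed by its ancestors, nearest first (hypothetical helper)."""
    # In the reversed graph, edges point child -> Parent, so the shortest
    # path length from `node` is the number of steps up to each ancestor.
    dist = nx.shortest_path_length(graph.reverse(), source=node)
    return sorted(dist, key=lambda n: (dist[n], n))


# Keep only nodes that are nobody's parent.
leaves = [n for n, d in G.out_degree() if d == 0]

df_ordered = pd.DataFrame({
    "node": leaves,
    "ancestry_tree": [ordered_ancestry(G, n) for n in leaves],
})

print(df_ordered)
```

With the sample data this yields `[A1x2, bc11, Aw00]` for `A1x2` and `[Aee1, Aee0]` for `Aee1`.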

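To filter the "intermediate children" out of the output table, you can use the `out_degree` method to keep only the last children; a last child has no children of its own, so its `out_degree == 0`.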

last_children = [n for n, d in G.out_degree() if d == 0]

ancestors = {n: {n} | nx.ancestors(G, n) for n in last_children}

df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

[out]

   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  Aee1        [Aee1, Aee0]