Muh*_*Zia 4 numpy python-3.x pandas
I have a DataFrame of parent-child relationships, as shown below:
child  Parent  relationship
A1x2   bc11    direct_parent
bc11   Aw00    direct_parent
bc11   Aw00    ultimate_parent
Aee1   Aee0    direct_parent
Aee1   Aee0    ultimate_parent
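To run the examples on this page, the table above can be rebuilt as a pandas DataFrame. This is a minimal sketch: the variable name df and the column names child, Parent and relationship are taken from the table header and the answer's code, and the rows are copied from the table as shown.

```python
import pandas as pd

# The parent-child table from the question, one row per relationship
df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})
print(df)
```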
I want to get all ancestors of every child node in a new DataFrame. The result should look like this:
node   ancestry_tree
A1x2   [A1x2, bc11, Aw00]
Aee1   [Aee1, Aee0]
Note: in the real dataset there may be many direct parents (intermediate nodes) between a child and its ultimate parent.
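Because the chain between a child and its ultimate parent can be arbitrarily long, a lookup that only joins one or two levels is not enough; the walk has to repeat until it reaches a node with no parent. The sketch below does that in plain Python without networkx. It assumes each child has at most one direct_parent row (so the data forms a tree), and the helper name ancestry is hypothetical, introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})

# Map each child to its direct parent (assumes one direct parent per child)
direct = df[df["relationship"] == "direct_parent"]
parent_of = dict(zip(direct["child"], direct["Parent"]))

def ancestry(node):
    """Hypothetical helper: return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    seen = {node}
    while chain[-1] in parent_of:
        nxt = parent_of[chain[-1]]
        if nxt in seen:  # stop if the data accidentally contains a cycle
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

# Last children: nodes that never appear as anyone's parent
parents = set(df["Parent"])
leaves = [c for c in df["child"].unique() if c not in parents]
result = pd.DataFrame({"node": leaves,
                       "ancestry_tree": [ancestry(n) for n in leaves]})
print(result)
```

The loop keeps the ancestors in order from the node up to its root, which matches the layout of the desired output; the cost per node is proportional to the depth of its chain.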
Another approach, using from_pandas_edgelist and ancestors from the networkx package:
import networkx as nx
import pandas as pd
# df is the parent-child DataFrame from the question (columns child, Parent, relationship)
# Create the directed graph, with edges pointing Parent -> child
G = nx.from_pandas_edgelist(df,
                            source='Parent',
                            target='child',
                            create_using=nx.DiGraph())
# Create a dict mapping each child to itself plus all of its ancestors
ancestors = {n: {n} | nx.ancestors(G, n) for n in df['child'].unique()}
# Convert the dict back to a DataFrame if necessary
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])
print(df_ancestors)
[Out]
node ancestry_tree
0 A1x2 [A1x2, Aw00, bc11]
1 bc11 [bc11, Aw00]
2 Aee1 [Aee1, Aee0]
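nx.ancestors returns a set, so the lists above come out in arbitrary order (for example [A1x2, Aw00, bc11] rather than the child-to-root order asked for in the question). If the order matters, one option is to reverse the edges and run a depth-first traversal from each node. This sketch assumes the relationships form a tree (one direct parent per node) and builds the graph from the direct_parent rows only, on the assumption that ultimate_parent rows would add shortcut edges that break the ordering.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})

# Keep only direct links so each node has a single path up to its root
direct = df[df["relationship"] == "direct_parent"]
G = nx.from_pandas_edgelist(direct, source="Parent", target="child",
                            create_using=nx.DiGraph())

# Reverse the edges (child -> Parent) so a DFS from a node walks upward
R = G.reverse(copy=True)
ordered = {n: list(nx.dfs_preorder_nodes(R, n)) for n in direct["child"].unique()}

df_ancestors = pd.DataFrame(list(ordered.items()), columns=["node", "ancestry_tree"])
print(df_ancestors)
```

In a tree the reversed graph from any node is a single path, so the preorder is exactly node, parent, grandparent, and so on. With multiple parents per node the traversal would still visit every ancestor, but the list would no longer be one linear chain.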
To filter the "intermediate children" out of the output table, you can use the out_degree method to keep only the last children, i.e. the nodes with out_degree == 0:
# Keep only the last children: nodes with no outgoing Parent -> child edge
last_children = [n for n, d in G.out_degree() if d == 0]
ancestors = {n: {n} | nx.ancestors(G, n) for n in last_children}
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])
print(df_ancestors)
[Out]
node ancestry_tree
0 A1x2 [A1x2, Aw00, bc11]
1 Aee1 [Aee1, Aee0]
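Since every edge points Parent -> child, a node has out_degree == 0 exactly when it never appears in the Parent column. The sketch below checks that the graph-based filter and a plain pandas set difference pick the same last children on the question's data; it is an illustration of the equivalence, not a replacement for the answer's code.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})

G = nx.from_pandas_edgelist(df, source="Parent", target="child",
                            create_using=nx.DiGraph())

# Graph version: nodes with no outgoing Parent -> child edge
graph_leaves = {n for n, d in G.out_degree() if d == 0}
# Pandas version: children that never appear in the Parent column
pandas_leaves = set(df["child"]) - set(df["Parent"])

print(graph_leaves == pandas_leaves)  # True for this data
```

The pandas version avoids building a graph when only the set of last children is needed; the graph is still required for nx.ancestors.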
Views: 581