Recursively calculating ratios between parents and children in a pandas DataFrame

Question

I have searched everywhere I could for a solution. The closest I could find is this one, but it is not quite what I am after.

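I am trying to model the relationship between a value and its parent value, specifically by computing their ratio. I would also like to track the level of lineage, for example how many levels deep an item sits below the root.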

For example, I would like to start from a pandas df that looks like this:

id  parent_id   score
1   0           50
2   1           40
3   1           30
4   2           20
5   4           10


And get this:

id  parent_id   score   parent_child_ratio  level
1   0           50      NA                  1
2   1           40      1.25                2
3   1           30      1.67                2
4   2           20      2                   3
5   4           10      2                   4


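So for each row, we look up the score of its parent, compute (parent_score/child_score), and store that as the value of a new column. Then some kind of counting step assigns each child its level, one more than its parent's.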

This has been stumping me for a while, so any help is appreciated!

Answer 1

Ami*_*ory 3

The first part is just a merge:

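import pandas as pd

# Self-merge: match each row's parent_id (left side, suffix _x) to the parent's id (right side, suffix _y).
with_parent = pd.merge(df, df, left_on='parent_id', right_on='id', how='left')
# score_y is the parent's score and score_x the child's, so this is parent / child.
with_parent['child_parent_ratio'] = with_parent.score_y / with_parent.score_x
# Keep only the child's own columns plus the new ratio.
with_parent = with_parent.rename(columns={'id_x': 'id', 'parent_id_x': 'parent_id', 'score_x': 'score'})[['id', 'parent_id', 'score', 'child_parent_ratio']]
>>> with_parent
   id  parent_id  score  child_parent_ratio
0   1          0     50                 NaN
1   2          1     40            1.250000
2   3          1     30            1.666667
3   4          2     20            2.000000
4   5          4     10            2.000000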
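
The merge above assumes that `df` already exists. As a self-contained sketch, the frame from the question can be built by hand and the same ratio computed with a lookup through `Series.map` rather than a merge. The names `score_by_id` and `parent_score` are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

# The example input from the question.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'parent_id': [0, 1, 1, 2, 4],
    'score': [50, 40, 30, 20, 10],
})

# Look up each row's parent score by id; a parent_id with no row (here 0) maps to NaN.
score_by_id = df.set_index('id')['score']
parent_score = df['parent_id'].map(score_by_id)

# Parent score divided by child score, matching the merge-based ratio.
ratio = parent_score / df['score']
print(ratio)
```

This avoids the suffixed `_x`/`_y` columns and the rename step, at the cost of assuming that `id` values are unique.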


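For the second part, you can run a breadth-first search. The parent links form a forest, and the level of a node is its distance from the root.

For example, using networkx: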

import networkx as nx

# Directed graph with one edge from each parent to its child.
# parent_id 0 has no row of its own and acts as the root.
G = nx.DiGraph()
G.add_nodes_from(set(with_parent['id'].unique()).union(set(with_parent.parent_id.unique())))
G.add_edges_from([(int(r[1]['parent_id']), int(r[1]['id'])) for r in with_parent.iterrows()])
# Level = number of edges on the path from the root 0 to each id.
with_parent['level'] = with_parent['id'].map(nx.shortest_path_length(G, 0))
>>> with_parent
   id  parent_id  score  child_parent_ratio  level
0   1          0     50                 NaN      1
1   2          1     40            1.250000      2
2   3          1     30            1.666667      2
3   4          2     20            2.000000      3
4   5          4     10            2.000000      4
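
If networkx is not available, the same levels can probably be obtained by walking the parent links directly. The sketch below is an alternative to the breadth-first search, not the answer's own method. It assumes the data has no cycles and that every chain ends at a parent_id with no row of its own (0 in the example). `parent_of` and `level_of` are hypothetical helper names.

```python
import pandas as pd

# The example input from the question.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'parent_id': [0, 1, 1, 2, 4],
    'score': [50, 40, 30, 20, 10],
})

# Map each id to its parent id, using plain Python ints as keys.
parent_of = dict(zip(df['id'].tolist(), df['parent_id'].tolist()))

def level_of(node):
    """Count the steps from node up to an id that has no row (the root)."""
    steps = 0
    while node in parent_of:  # assumes no cycles, otherwise this never ends
        node = parent_of[node]
        steps += 1
    return steps

df['level'] = df['id'].map(level_of)
print(df[['id', 'parent_id', 'level']])
```

Each lookup walks the full chain to the root, so for very deep hierarchies the graph-based search above, or caching the computed levels, is likely the better choice.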

