Python pandas groupby: add a column by dividing to get the average

Joh*_*ohn 1 python pandas pandas-groupby

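I have a large dataset (15k+ rows) and I'm trying to show each investor's proportional share of an investment based on the number of investors (not actual ownership). This is a well-known flaw, but we're trying to solve a presentation problem. Right now I can remove the duplicates in SQL (if I have 3 customers who invested 600, I drop the duplicates and keep 1 customer with 600), but that is all-or-nothing; instead I want to show each of those three customers with 200.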

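I need the groupby to take CustomerID, ParentID and the investment amount into account. Then I need to create another column that gives each customer (each SponsorName/row) their average investment, i.e. the investment divided by the number of customers with that particular amount for that CustomerID/ParentID combination. Finally, I need to group by SponsorName, summing the investment and counting the customer IDs.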

Dataset:

CustomerID   ParentID    SponsorName    Investment
1            55          Bob            600
1            55          Jack           600
1            55          Mary           600
5            65          Bill           1200
5            65          Jim            1200
5            65          Jill           1200
1            55          Bob            1000
1            55          Jack           1000
1            55          Mary           1000
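
To run the snippets in the answer below, the sample table first has to be loaded into a DataFrame named `df`. A minimal sketch, assuming the whitespace-separated table is pasted in as a string (the variable name `raw` is just for illustration):

```python
import io

import pandas as pd

# The question's sample table, pasted as whitespace-separated text
raw = """CustomerID   ParentID    SponsorName    Investment
1            55          Bob            600
1            55          Jack           600
1            55          Mary           600
5            65          Bill           1200
5            65          Jim            1200
5            65          Jill           1200
1            55          Bob            1000
1            55          Jack           1000
1            55          Mary           1000
"""

# sep=r'\s+' treats any run of spaces as a single column separator
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')
print(df)
```

The three ID/amount columns are parsed as integers and `SponsorName` as strings, so the division in the answer works without any extra type conversion.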

Desired output:

CustomerID   ParentID    SponsorName    Investment   Avg Investment
1            55          Bob            600          200
1            55          Jack           600          200
1            55          Mary           600          200
5            65          Bill           1200         400
5            65          Jim            1200         400
5            65          Jill           1200         400
1            55          Bob            1000         333.33
1            55          Jack           1000         333.33
1            55          Mary           1000         333.33 

Thanks!

jpp*_*jpp 5

You can use GroupBy + transform with size:

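# Row count of each (CustomerID, ParentID) group, repeated on every row of that group
counts = df.groupby(['CustomerID', 'ParentID'])['SponsorName'].transform('size')
# Split each row's Investment evenly across the rows in its group
df['Avg Investment'] = df['Investment'] / counts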

Output (for the original, six-row version of the data):

   CustomerID  ParentID SponsorName  Investment  Avg Investment
0           1        55         Bob         600           200.0
1           1        55        Jack         600           200.0
2           1        55        Mary         600           200.0
3           5        65        Bill        1200           400.0
4           5        65         Jim        1200           400.0
5           5        65        Jill        1200           400.0
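
`transform('size')` broadcasts each group's row count back onto every original row, whereas `groupby(...).size()` returns one value per group. A minimal sketch on the revised nine-row data also shows why grouping on only `CustomerID` and `ParentID` stops working once the same customer has rows with different amounts: the 600 and 1000 rows of customer 1 land in one group of six. The column name `Avg (2 keys)` is hypothetical, used only for the comparison.

```python
import pandas as pd

# Revised nine-row sample data from the question
df = pd.DataFrame({
    'CustomerID': [1, 1, 1, 5, 5, 5, 1, 1, 1],
    'ParentID': [55, 55, 55, 65, 65, 65, 55, 55, 55],
    'SponsorName': ['Bob', 'Jack', 'Mary', 'Bill', 'Jim', 'Jill', 'Bob', 'Jack', 'Mary'],
    'Investment': [600, 600, 600, 1200, 1200, 1200, 1000, 1000, 1000],
})

keys = ['CustomerID', 'ParentID']

# size(): one count per group, indexed by the group keys
per_group = df.groupby(keys).size()

# transform('size'): the same counts repeated on every row, aligned with df.index
per_row = df.groupby(keys)['SponsorName'].transform('size')

# With only two keys, 600 / 6 = 100 instead of the desired 200
df['Avg (2 keys)'] = df['Investment'] / per_row

print(per_group)
print(df)
```

Because `per_row` shares `df`'s index, the division lines up row by row with no merge step.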

Updated for the revised question, courtesy of @ScottBoston:

# Adding Investment to the keys keeps the 600 and 1000 rows of customer 1 in separate groups
group_keys = ['CustomerID', 'ParentID', 'Investment']
counts = df.groupby(group_keys)['SponsorName'].transform('size')
df['Avg Investment'] = df['Investment'] / counts

Output:

   CustomerID  ParentID SponsorName  Investment  Avg Investment
0           1        55         Bob         600      200.000000
1           1        55        Jack         600      200.000000
2           1        55        Mary         600      200.000000
3           5        65        Bill        1200      400.000000
4           5        65         Jim        1200      400.000000
5           5        65        Jill        1200      400.000000
6           1        55         Bob        1000      333.333333
7           1        55        Jack        1000      333.333333
8           1        55        Mary        1000      333.333333
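
The question also asks for a final step, grouping by `SponsorName`, summing the investment and counting customer IDs, which the answer does not show. A minimal sketch using named aggregation (available since pandas 0.25), assuming it is the proportional `Avg Investment` that should be summed. The output column names `total_avg_investment` and `customer_rows` are hypothetical. `'count'` counts rows per sponsor; if "count the customer IDs" means distinct customers, `'nunique'` would be the alternative.

```python
import pandas as pd

# Revised nine-row sample data from the question
df = pd.DataFrame({
    'CustomerID': [1, 1, 1, 5, 5, 5, 1, 1, 1],
    'ParentID': [55, 55, 55, 65, 65, 65, 55, 55, 55],
    'SponsorName': ['Bob', 'Jack', 'Mary', 'Bill', 'Jim', 'Jill', 'Bob', 'Jack', 'Mary'],
    'Investment': [600, 600, 600, 1200, 1200, 1200, 1000, 1000, 1000],
})

# Per-row share, as in the updated answer
group_keys = ['CustomerID', 'ParentID', 'Investment']
counts = df.groupby(group_keys)['SponsorName'].transform('size')
df['Avg Investment'] = df['Investment'] / counts

# Sanity check: within each group, the shares add back up to the original amount
totals = df.groupby(group_keys)['Avg Investment'].sum()
original = totals.index.get_level_values('Investment').to_numpy()
assert abs(totals.to_numpy() - original).max() < 1e-9

# Final step: per SponsorName, total the proportional shares and count the CustomerID rows
summary = df.groupby('SponsorName').agg(
    total_avg_investment=('Avg Investment', 'sum'),
    customer_rows=('CustomerID', 'count'),
)
print(summary)
```

Bob, for example, ends up with 200 + 333.33 = 533.33 across his two rows, while Bill has a single row of 400.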