I have a pandas DataFrame like this:
>>> df = pd.DataFrame({'MONTREGL':[10,10,2222,35,200,56,5555],'SINID':['aaa','aaa','aaa','bbb','bbb','ccc','ccc'],'EXTRA':[400,400,400,500,500,333,333]})
>>> df
   MONTREGL SINID  EXTRA
0        10   aaa    400
1        10   aaa    400
2      2222   aaa    400
3        35   bbb    500
4       200   bbb    500
5        56   ccc    333
6      5555   ccc    333
I want to sum the MONTREGL column for each SINID group, so that I get 2242 for aaa, and so on. I also want to keep the value of the EXTRA column.
This is the expected result:
   MONTREGL SINID  EXTRA
0      2242   aaa    400
1       235   bbb    500
2      5611   ccc    333
Thanks in advance for your help!
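One possible approach, sketched here under the assumption that EXTRA is constant within each SINID group (as it is in the sample data): group by SINID, sum MONTREGL, and take the first EXTRA value of each group. The final column selection only restores the column order of the expected result.

```python
import pandas as pd

df = pd.DataFrame({
    'MONTREGL': [10, 10, 2222, 35, 200, 56, 5555],
    'SINID': ['aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'ccc', 'ccc'],
    'EXTRA': [400, 400, 400, 500, 500, 333, 333],
})

# Sum MONTREGL per SINID; keep the first EXTRA of each group.
# Assumption: EXTRA never varies inside a SINID group.
result = (
    df.groupby('SINID', as_index=False)
      .agg({'MONTREGL': 'sum', 'EXTRA': 'first'})
      [['MONTREGL', 'SINID', 'EXTRA']]  # restore the original column order
)

print(result)
```

If EXTRA could differ within a group, grouping on both keys with `df.groupby(['SINID', 'EXTRA'], as_index=False)['MONTREGL'].sum()` would keep each distinct (SINID, EXTRA) pair as its own row instead.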
Can you show me a way to optimize this code? Because the dataset is huge, it takes tens of minutes to finish...
df['sinistre'] = 0
for index_sin, row_sin in sinistre1.iterrows():
    date_surv = row_sin['DATESURV']
    # Rows of df that belong to the same policy as this claim
    quit_sin = df.loc[df['id_police'] == row_sin['id_police']]
    for index, row in quit_sin.iterrows():
        if row['DATEEFFE'] < date_surv < row['DATE_FIN']:
            # .loc avoids chained assignment, which may silently write to a copy
            df.loc[index, 'sinistre'] = 1
Here are sample datasets for the DataFrames sinistre1 and df:
>>> sinistre1
  id_police id_sinistre    DATESURV
0      p123        s123  30/05/2017
1      p123        s124  30/11/2017
2      p123        s125  29/02/2018
3      b123        s126  28/02/2018
4      b123        s127  30/05/2018
>>> df
  id_police    DATEEFFE    DATE_FIN  prime  prime2
0      p123  24/01/2017  24/02/2017      0       0
1      p123  24/11/2017  24/12/2017      0      30
…
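A vectorized alternative to the nested loops could look like the sketch below. It is not a definitive answer and makes several assumptions: the dates are day-first strings and are parsed to datetimes (unparseable ones such as 29/02/2018 become NaT and never match), df has a unique index, and the helper column name row_id is hypothetical. The third row of df is invented for illustration, because the sample above is truncated.

```python
import pandas as pd

sinistre1 = pd.DataFrame({
    'id_police': ['p123', 'p123', 'p123', 'b123', 'b123'],
    'id_sinistre': ['s123', 's124', 's125', 's126', 's127'],
    'DATESURV': ['30/05/2017', '30/11/2017', '29/02/2018',
                 '28/02/2018', '30/05/2018'],
})
df = pd.DataFrame({
    # The last row is hypothetical; the original sample is cut off.
    'id_police': ['p123', 'p123', 'b123'],
    'DATEEFFE': ['24/01/2017', '24/11/2017', '24/02/2018'],
    'DATE_FIN': ['24/02/2017', '24/12/2017', '24/03/2018'],
})

# Parse day-first date strings; invalid dates become NaT.
fmt = '%d/%m/%Y'
sinistre1['DATESURV'] = pd.to_datetime(sinistre1['DATESURV'], format=fmt, errors='coerce')
for col in ('DATEEFFE', 'DATE_FIN'):
    df[col] = pd.to_datetime(df[col], format=fmt, errors='coerce')

# Pair every policy row with every claim on the same policy id.
# row_id (hypothetical name) remembers which df row each pair came from.
pairs = df.assign(row_id=df.index).merge(
    sinistre1[['id_police', 'DATESURV']], on='id_police', how='inner'
)

# Same strict condition as the loop: DATEEFFE < DATESURV < DATE_FIN.
hit = (pairs['DATEEFFE'] < pairs['DATESURV']) & (pairs['DATESURV'] < pairs['DATE_FIN'])

df['sinistre'] = 0
df.loc[pairs.loc[hit, 'row_id'].unique(), 'sinistre'] = 1

print(df)
```

The merge produces one row per (policy row, claim) pair, so memory grows with the number of claims per policy. When that product is too large, processing the policies in chunks is one way to bound it.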