use*_*260 3 python pandas pandas-groupby
I have a DataFrame with the following structure:
Debtor ID | AccountRating | AccountRatingDate | AmountOutstanding |AmountPastDue
John Snow Closed 2017-03-01 0 0
John Snow Delayed 2017-04-22 2000 500
John Snow Closed 2017-05-23 0 0
John Snow Delayed 2017-07-15 6000 300
Sarah Parker Closed 2017-02-01 0 0
Edward Hall Closed 2017-05-01 0 0
Douglas Core Delayed 2017-01-01 1000 200
Douglas Core Delayed 2017-06-01 1000 400
What I want to achieve is:
Debtor ID | Incidents of delay | TheMostRecentOutstanding | TheMostRecentPastDue
John Snow 2 6000 300
Sarah Parker 0 0 0
Edward Hall 0 0 0
Douglas Core 2 1000 400
Counting the incidents of delay is straightforward:
df_account["pastDuebool"] = df_account['AmountPastDue'] > 0
new_df = pd.DataFrame(index = df_account.groupby("Debtor ID").groups.keys())
new_df['Incidents of delay'] = df_account.groupby("Debtor ID")["pastDuebool"].sum()
I am struggling to extract the most recent outstanding and past-due amounts. My code looks like this:
new_df["TheMostRecentOutstanding"] = df_account.loc[df_account[df_account["AccountRating"]=='Delayed'].groupby('Debtor ID')["AccountRatingDate"].idxmax(),"AmountOutstanding"]
new_df["TheMostRecentPastDue"] = df_account.loc[df_account[df_account["AccountRating"]=='Delayed'].groupby('Debtor ID')["AccountRatingDate"].idxmax(),"AmountPastDue"]
However, both assignments produce Series in which every value is NaN. What am I doing wrong here?
You can try this:
df.sort_values('AccountRatingDate')\
.query('AccountRating == "Delayed"')\
.groupby('Debtor ID')[['AccountRating','AmountOutstanding','AmountPastDue']]\
.agg({'AccountRating':'count',
'AmountOutstanding':'last',
'AmountPastDue':'last'})\
.reindex(df['Debtor ID'].unique(), fill_value=0)\
.reset_index()
Output:
Debtor ID AccountRating AmountOutstanding AmountPastDue
0 John Snow 2 6000 300
1 Sarah Parker 0 0 0
2 Edward Hall 0 0 0
3 Douglas Core 2 1000 400
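To try the pipeline end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question; the variable names `df` and `result`, and the assumption that `AccountRatingDate` is parsed as a datetime column, are mine rather than part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; dates are parsed so sorting is chronological.
df = pd.DataFrame({
    "Debtor ID": ["John Snow"] * 4 + ["Sarah Parker", "Edward Hall"] + ["Douglas Core"] * 2,
    "AccountRating": ["Closed", "Delayed", "Closed", "Delayed",
                      "Closed", "Closed", "Delayed", "Delayed"],
    "AccountRatingDate": pd.to_datetime([
        "2017-03-01", "2017-04-22", "2017-05-23", "2017-07-15",
        "2017-02-01", "2017-05-01", "2017-01-01", "2017-06-01",
    ]),
    "AmountOutstanding": [0, 2000, 0, 6000, 0, 0, 1000, 1000],
    "AmountPastDue": [0, 500, 0, 300, 0, 0, 200, 400],
})

result = (
    df.sort_values("AccountRatingDate")            # oldest first, so 'last' is the most recent
      .query('AccountRating == "Delayed"')         # keep only delayed records
      .groupby("Debtor ID")[["AccountRating", "AmountOutstanding", "AmountPastDue"]]
      .agg({"AccountRating": "count",              # number of delayed records per debtor
            "AmountOutstanding": "last",
            "AmountPastDue": "last"})
      .reindex(df["Debtor ID"].unique(), fill_value=0)  # restore debtors with no delays
      .reset_index()
)
print(result)
```

Wrapping the chain in parentheses instead of using trailing backslashes is only a style choice; the calls are the same as above.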
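Details: sort by AccountRatingDate so that the last row in each group is the most recent one; keep only the rows rated Delayed; group by Debtor ID; count the delayed rows and take the last outstanding and past-due amounts; reindex on the unique debtor IDs with fill_value=0 so that debtors with no delays show up with zeros; finally, reset_index turns Debtor ID back into a column.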
You can also rename the columns with rename and a dictionary:
df.sort_values('AccountRatingDate')\
.query('AccountRating == "Delayed"')\
.groupby('Debtor ID')[['AccountRating','AmountOutstanding','AmountPastDue']]\
.agg({'AccountRating':'count',
'AmountOutstanding':'last',
'AmountPastDue':'last'})\
.reindex(df['Debtor ID'].unique(), fill_value=0)\
.rename(columns={'AccountRating':'Incidents of delay',
'AmountOutstanding':'TheMostRecentOutstanding',
'AmountPastDue':'TheMostRecentPastDue'})\
.reset_index()
Output:
Debtor ID Incidents of delay TheMostRecentOutstanding TheMostRecentPastDue
0 John Snow 2 6000 300
1 Sarah Parker 0 0 0
2 Edward Hall 0 0 0
3 Douglas Core 2 1000 400
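As for why the code in the question yields NaN: judging from the snippet shown, the likely cause is index alignment. new_df is indexed by debtor name, whereas df_account.loc[..., "AmountOutstanding"] returns a Series indexed by the original row labels (integers, assuming a default RangeIndex). Column assignment aligns on index labels, and none of them match. The sketch below uses a reduced copy of the sample data; the names `delayed`, `latest_idx`, `misaligned` and `latest` are illustrative only.

```python
import pandas as pd

# Reduced copy of the question's data (default integer row labels 0..4).
df_account = pd.DataFrame({
    "Debtor ID": ["John Snow", "John Snow", "Sarah Parker", "Douglas Core", "Douglas Core"],
    "AccountRating": ["Delayed", "Delayed", "Closed", "Delayed", "Delayed"],
    "AccountRatingDate": pd.to_datetime(
        ["2017-04-22", "2017-07-15", "2017-02-01", "2017-01-01", "2017-06-01"]),
    "AmountOutstanding": [2000, 6000, 0, 1000, 1000],
    "AmountPastDue": [500, 300, 0, 200, 400],
})

new_df = pd.DataFrame(index=df_account["Debtor ID"].unique())

# Row label of the most recent delayed record for each debtor.
delayed = df_account[df_account["AccountRating"] == "Delayed"]
latest_idx = delayed.groupby("Debtor ID")["AccountRatingDate"].idxmax()

# The question's lookup: the result is indexed by integer row labels, not by debtor.
misaligned = df_account.loc[latest_idx, "AmountOutstanding"]
# Assigning it to a column of new_df aligns on labels, like this reindex: all NaN.
print(misaligned.reindex(new_df.index))

# One possible fix: re-key the selected rows by debtor before assigning.
latest = df_account.loc[latest_idx].set_index("Debtor ID")
new_df["TheMostRecentOutstanding"] = latest["AmountOutstanding"]
new_df["TheMostRecentPastDue"] = latest["AmountPastDue"]

# Debtors with no delayed record are still NaN at this point; fill them with 0.
new_df = new_df.fillna(0).astype(int)
print(new_df)
```

Re-keying with set_index keeps the alignment explicit. Assigning the raw values positionally would also silence the NaN, but it would break as soon as some debtor has no delayed record, because the lengths and order would no longer match.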