Given the following DataFrame:
+------------+--------+
| Date | Amount |
+------------+--------+
| 01/05/2019 | 15 |
| 27/05/2019 | 20 |
| 27/05/2019 | 15 |
| 15/06/2019 | 10 |
| 29/06/2019 | 25 |
| 01/07/2019 | 50 |
+------------+--------+
I need a rolling sum of Amount over all previous dates within a 28-day window, like this:
+------------+--------+
| Date | Amount |
+------------+--------+
| 01/05/2019 | NaN |
| 27/05/2019 | 15 |
| 27/05/2019 | 15 |
| 15/06/2019 | 35 |
| 29/06/2019 | 10 |
| 01/07/2019 | 35 |
+------------+--------+
Using:
import datetime
import pandas as pd

df = pd.DataFrame(
{
'Date': {
0: datetime.datetime(2019, 5, 1),
1: datetime.datetime(2019, 5, 27),
2: datetime.datetime(2019, 5, 27),
3: datetime.datetime(2019, 6, 15),
4: datetime.datetime(2019, 6, 29),
5: datetime.datetime(2019, 7, 1),
},
'Amount': {0: 15, 1: 20, 2: 15, 3: 10, 4: 25, 5: 50}
}
)
# Time-based rolling windows need the 'on' column sorted in ascending order
df.sort_values("Date", inplace=True)
# 28-day window ending just before each row's date (closed="left")
df_roll = df.rolling("28d", on="Date", closed="left").sum()
gives me:
+------------+--------+
| Date | Amount |
+------------+--------+
| 01/05/2019 | NaN |
| 27/05/2019 | 15 |
| 27/05/2019 | 35 | <-- Should be 15
| 15/06/2019 | 35 |
| 29/06/2019 | 10 |
| 01/07/2019 | 35 |
+------------+--------+
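This is not quite right: the second 27/05/2019 row also counts the first row with the same date.
How can I get the sum over all previous dates rather than all previous rows?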
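You can sum Amount per date first, take the rolling sum of those per-date totals, and map the result back onto the rows: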
df['new'] = df.Date.map(df.groupby('Date').Amount.sum().rolling("28d", closed="left").sum())
df
Date Amount new
0 2019-05-01 15 NaN
1 2019-05-27 20 15.0
2 2019-05-27 15 15.0
3 2019-06-15 10 35.0
4 2019-06-29 25 10.0
5 2019-07-01 50 35.0
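
The one-liner above can be unpacked into three steps, which may make it easier to see why rows sharing a date get the same value. This is only a sketch of the same approach: the names `daily` and `prior` are made up for illustration, and the frame is rebuilt here with `pd.to_datetime` so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the question's data (same dates and amounts as above)
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-05-01", "2019-05-27", "2019-05-27",
        "2019-06-15", "2019-06-29", "2019-07-01",
    ]),
    "Amount": [15, 20, 15, 10, 25, 50],
})

# Step 1: collapse duplicate dates into one total per date.
# The result is a Series with a sorted, unique DatetimeIndex.
daily = df.groupby("Date")["Amount"].sum()

# Step 2: rolling 28-day sum over the per-date totals.
# closed="left" leaves the current date itself out of the window.
prior = daily.rolling("28D", closed="left").sum()

# Step 3: look up each row's date in the per-date result,
# so rows that share a date receive the same value.
df["new"] = df["Date"].map(prior)
print(df)
```

Because the rolling window runs over one entry per date, an earlier row with the same date can no longer leak into the sum for a later row.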