R o*_*low 2 python python-3.x pandas rolling-computation
I have a DataFrame like this:
Product_ID Quantity Year Quarter
1 100 2021 1
1 100 2021 2
1 50 2021 3
1 100 2021 4
1 100 2022 1
2 100 2021 1
2 100 2021 2
3 100 2021 1
3 100 2021 2
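To reproduce the problem, the table above can be built as a DataFrame along these lines. This is only a sketch: the integer dtypes and the default 0..8 index are assumptions, since only the printed table is given.

```python
import pandas as pd

# Sample data copied from the table above (dtypes are assumed to be integers).
df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity":   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    "Year":       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    "Quarter":    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})
print(df)
```

The rows are already ordered by Product_ID, Year and Quarter, which any row-based rolling window relies on.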
I want to get, for each Product_ID, the sum of Quantity over the previous three quarters (not counting the current one).
So I tried this:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity'].shift(1,fill_value=0)
                         .rolling(3).sum().reset_index(0,drop=True)
                      )
# Shifting 1, because I want to exclude the current row.
# Rolling 3, because I want to have the 3 'rows' before
# Grouping by, because I want to have the calculation PER product
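My code fails because it does not restrict the calculation to each product: it also picks up numbers from other products (for example, for product 2, quarter 1 it gives me the 3 rows from product 1).
The result I am after: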
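A minimal sketch of why the window leaks across products, assuming the sample data from the question (only the two relevant columns are rebuilt here). The grouped shift is group-aware, but it hands back an ordinary Series, so the rolling call that follows no longer knows about the groups.

```python
import pandas as pd

df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity":   [100, 100, 50, 100, 100, 100, 100, 100, 100],
})

# The shift is done per product, but the result is a plain Series
# indexed like df, with no grouping attached to it.
shifted = df.groupby("Product_ID")["Quantity"].shift(1, fill_value=0)

# rolling() on that Series slides over all rows in order, so a window
# can include rows that belong to the previous product.
leaky = shifted.rolling(3).sum()

print(pd.DataFrame({
    "Product_ID": df["Product_ID"],
    "shifted": shifted,
    "rolling": leaky,
}))
```

In this sketch the first row of product 2 (index 5) gets 150.0 instead of 0, because its window still covers two shifted values from product 1. The first two rows overall are NaN because rolling(3) needs three values by default.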
Product_ID Quantity Year Quarter Qty_Sum_3qrts
1 100 2021 1 0 # because we don't have historical data for this id
1 100 2021 2 100 # sum of the previous quarter of this product
1 50 2021 3 200 # sum of the previous 2 quarters of this product
1 100 2021 4 250 # sum of the previous 3 quarters of this product
1 100 2022 1 250 # sum of the previous 3 quarters of this product
2 100 2021 1 0 # because we don't have historical data for this id
2 100 2021 2 100 # sum of the previous quarter of this product
3 100 2021 1 0 # etc
3 100 2021 2 100 # etc
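You need to compute the rolling sum within each group. In your attempt only the shift is grouped; it returns a plain Series, so the rolling that follows runs over the whole column. You can keep both steps inside the group with apply, as follows: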
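# group_keys=False keeps the original row index on the result,
# so the assignment aligns with df on newer pandas versions too.
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID', group_keys=False)['Quantity']
                         .apply(lambda s: s.shift(1, fill_value=0)
                                           .rolling(3, min_periods=1).sum())
                      )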
Output:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
0 1 100 2021 1 0.0
1 1 100 2021 2 100.0
2 1 50 2021 3 200.0
3 1 100 2021 4 250.0
4 1 100 2022 1 250.0
5 2 100 2021 1 0.0
6 2 100 2021 2 100.0
7 3 100 2021 1 0.0
8 3 100 2021 2 100.0
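As a variation on the same idea, the per-group calculation can also be written with transform, which always returns a result indexed like the input. This is a sketch, not part of the original answer: the helper name prev_three_sum is hypothetical, and the rows are assumed to be sorted chronologically within each product.

```python
import pandas as pd

df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity":   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    "Year":       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    "Quarter":    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

def prev_three_sum(s: pd.Series) -> pd.Series:
    """Sum of up to three preceding rows of s, excluding the current row."""
    # Shift by one row to drop the current value, then sum a window of 3.
    # min_periods=1 allows partial windows at the start of each group.
    return s.shift(1, fill_value=0).rolling(3, min_periods=1).sum()

# transform calls the helper once per product and stitches the pieces
# back together on df's original index.
df["Qty_Sum_3qrts"] = df.groupby("Product_ID")["Quantity"].transform(prev_three_sum)
print(df)
```

Because the window counts rows rather than calendar periods, a product with a missing quarter would silently reach further back in time; that case does not occur in the sample data.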