Get the rolling sum per group

R o*_*low 2 python python-3.x pandas rolling-computation

I have a dataframe like this:

Product_ID    Quantity    Year    Quarter   
  1             100       2021      1          
  1             100       2021      2         
  1              50       2021      3          
  1             100       2021      4          
  1             100       2022      1         
  2             100       2021      1          
  2             100       2021      2          
  3             100       2021      1          
  3             100       2021      2         

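For each Product_ID, I want the sum of the previous three months (excluding the current one), i.e. the three rows before the current row.

So I tried this: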

df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity'].shift(1,fill_value=0)
                         .rolling(3).sum().reset_index(0,drop=True)
                       )

# Shifting 1, because I want to exclude the current row. 
# Rolling 3, because I want to have the 3 'rows' before 
# Grouping by, because I want to have the calculation PER product 

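My code fails because it does not compute the sum per product only: it also pulls in numbers from other products (for example, product 2, quarter 1 gives me a sum over 3 rows of product 1).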
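To make the failure concrete, here is a minimal sketch, assuming the sample data above (Year and Quarter are left out for brevity, and the names `shifted` and `leaky` are only illustrative). The grouped `shift` does work per product, but it returns a plain Series indexed like `df`, so the following `.rolling(3)` is no longer grouped and slides across product boundaries.

```python
import pandas as pd

# Sample data from the question (Year/Quarter omitted for brevity).
df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity":   [100, 100, 50, 100, 100, 100, 100, 100, 100],
})

# The shift is applied per group, but the result is an ordinary Series
# with the same index as df; the grouping information is gone.
shifted = df.groupby("Product_ID")["Quantity"].shift(1, fill_value=0)

# This rolling window therefore runs over the whole column,
# mixing the last rows of one product into the first rows of the next.
leaky = shifted.rolling(3).sum()

# Row 5 is product 2, quarter 1: it should be 0, but the window
# still contains two shifted values from product 1 (50 + 100 + 0).
print(leaky.iloc[5])  # 150.0
```

The first two rows also come out as NaN, because `rolling(3)` requires three observations by default.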

The result I am after:

Product_ID    Quantity    Year    Quarter   Qty_Sum_3qrts
  1             100       2021      1          0 # because we don't have historical data for this id
  1             100       2021      2          100 # sum of last month of this product 
  1              50       2021      3          200 # sum of last 2 months of this product
  1             100       2021      4          250 # sum of last 3 months of this product
  1             100       2022      1          250 # sum of last 3 months of this product
  2             100       2021      1          0  # because we don't have hist data for this id
  2             100       2021      2          100 # sum of last month of this product
  3             100       2021      1          0   # etc
  3             100       2021      2          100  # etc 

moz*_*way 5

You need to compute the rolling sum within each group, which you can do with apply:

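# group_keys=False keeps the original row index so the result aligns with df
# (recent pandas versions would otherwise prepend Product_ID to the index)
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID', group_keys=False)['Quantity']
                         .apply(lambda s: s.shift(1,fill_value=0)
                                           .rolling(3, min_periods=1).sum())
                       )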

Output:

   Product_ID  Quantity  Year  Quarter  Qty_Sum_3qrts
0           1       100  2021        1            0.0
1           1       100  2021        2          100.0
2           1        50  2021        3          200.0
3           1       100  2021        4          250.0
4           1       100  2022        1          250.0
5           2       100  2021        1            0.0
6           2       100  2021        2          100.0
7           3       100  2021        1            0.0
8           3       100  2021        2          100.0
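For reference, the same idea can be sketched as a self-contained script using `transform` instead of `apply`. This is a swapped-in variant rather than the code above: `transform` always returns a result aligned with the original index, so no index handling is needed. The helper name `prev_window_sum` is hypothetical, and the sort step assumes rows should be ordered by Year and Quarter within each product.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity":   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    "Year":       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    "Quarter":    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# Assumption: the window is over chronological rows, so sort first.
df = df.sort_values(["Product_ID", "Year", "Quarter"])


def prev_window_sum(s: pd.Series, window: int = 3) -> pd.Series:
    """Sum of up to `window` previous rows, excluding the current one.

    Hypothetical helper: shift by one row to drop the current value,
    then roll; min_periods=1 lets short histories produce partial sums.
    """
    return s.shift(1, fill_value=0).rolling(window, min_periods=1).sum()


# transform calls the helper once per product and aligns the result with df.
df["Qty_Sum_3qrts"] = df.groupby("Product_ID")["Quantity"].transform(prev_window_sum)

print(df)
```

Because both the shift and the rolling window happen inside the per-group function, no values cross from one product into the next.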