How to group a pandas DataFrame by varying dates?

use*_*979 5 python numpy pandas

I am trying to roll up daily data into fiscal quarter data. For example, I have a table of fiscal quarter end dates:

Company Period Quarter_End
M       2016Q1 05/02/2015
M       2016Q2 08/01/2015
M       2016Q3 10/31/2015
M       2016Q4 01/30/2016
WFM     2015Q2 04/12/2015
WFM     2015Q3 07/05/2015 
WFM     2015Q4 09/27/2015
WFM     2016Q1 01/17/2016

and a table of daily data:

Company Date       Price
M       06/20/2015 1.05
M       06/22/2015 4.05
M       07/10/2015 3.45
M       07/29/2015 1.86
M       08/24/2015 1.58
M       09/02/2015 8.64
M       09/22/2015 2.56
M       10/20/2015 5.42
M       11/02/2015 1.58
M       11/24/2015 4.58
M       12/03/2015 6.48
M       12/05/2015 4.56
M       01/03/2016 7.14
M       01/30/2016 6.34
WFM     06/20/2015 1.05
WFM     06/22/2015 4.05
WFM     07/10/2015 3.45
WFM     07/29/2015 1.86
WFM     08/24/2015 1.58
WFM     09/02/2015 8.64
WFM     09/22/2015 2.56
WFM     10/20/2015 5.42
WFM     11/02/2015 1.58
WFM     11/24/2015 4.58
WFM     12/03/2015 6.48
WFM     12/05/2015 4.56
WFM     01/03/2016 7.14
WFM     01/17/2016 6.34

I would like to create the table below.

Company Period  Quarter_end Sum(Price)
M       2016Q2  8/1/2015    10.41
M       2016Q3  10/31/2015  18.2
M       2016Q4  1/30/2016   30.68
WFM     2015Q3  7/5/2015    5.1
WFM     2015Q4  9/27/2015   18.09
WFM     2016Q1  1/17/2016   36.1

However, I don't know how to group by varying dates without looping through each record. Any help is greatly appreciated.

Thanks!

jez*_*ael 6

I think you can use merge_ordered:

#first convert columns to datetime
df1.Quarter_End = pd.to_datetime(df1.Quarter_End)
df2.Date = pd.to_datetime(df2.Date)


df = pd.merge_ordered(df1, 
                      df2, 
                      left_on=['Company','Quarter_End'], 
                      right_on=['Company','Date'], 
                      how='outer')
print (df)
   Company  Period Quarter_End       Date  Price
0        M  2016Q1  2015-05-02        NaT    NaN
1        M     NaN         NaT 2015-06-20   1.05
2        M     NaN         NaT 2015-06-22   4.05
3        M     NaN         NaT 2015-07-10   3.45
4        M     NaN         NaT 2015-07-29   1.86
5        M  2016Q2  2015-08-01        NaT    NaN
6        M     NaN         NaT 2015-08-24   1.58
7        M     NaN         NaT 2015-09-02   8.64
8        M     NaN         NaT 2015-09-22   2.56
9        M     NaN         NaT 2015-10-20   5.42
10       M  2016Q3  2015-10-31        NaT    NaN
11       M     NaN         NaT 2015-11-02   1.58
12       M     NaN         NaT 2015-11-24   4.58
13       M     NaN         NaT 2015-12-03   6.48
14       M     NaN         NaT 2015-12-05   4.56
15       M     NaN         NaT 2016-01-03   7.14
16       M  2016Q4  2016-01-30 2016-01-30   6.34
17     WFM  2015Q2  2015-04-12        NaT    NaN
18     WFM     NaN         NaT 2015-06-20   1.05
19     WFM     NaN         NaT 2015-06-22   4.05
20     WFM  2015Q3  2015-07-05        NaT    NaN
21     WFM     NaN         NaT 2015-07-10   3.45
22     WFM     NaN         NaT 2015-07-29   1.86
23     WFM     NaN         NaT 2015-08-24   1.58
24     WFM     NaN         NaT 2015-09-02   8.64
25     WFM     NaN         NaT 2015-09-22   2.56
26     WFM  2015Q4  2015-09-27        NaT    NaN
27     WFM     NaN         NaT 2015-10-20   5.42
28     WFM     NaN         NaT 2015-11-02   1.58
29     WFM     NaN         NaT 2015-11-24   4.58
30     WFM     NaN         NaT 2015-12-03   6.48
31     WFM     NaN         NaT 2015-12-05   4.56
32     WFM     NaN         NaT 2016-01-03   7.14
33     WFM  2016Q1  2016-01-17 2016-01-17   6.34
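To see the interleaving on a smaller scale, here is a minimal, self-contained sketch that rebuilds only the company M rows from the question and runs the same `merge_ordered` call. The variable names `dates`, `prices` and `merged` are chosen for illustration, and the explicit `format="%m/%d/%Y"` is an assumption about how the dates in the question are written.

```python
import pandas as pd

# Company M only, copied from the two tables in the question.
df1 = pd.DataFrame({
    "Company": ["M"] * 4,
    "Period": ["2016Q1", "2016Q2", "2016Q3", "2016Q4"],
    "Quarter_End": pd.to_datetime(
        ["05/02/2015", "08/01/2015", "10/31/2015", "01/30/2016"],
        format="%m/%d/%Y",
    ),
})
dates = ["06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015", "08/24/2015",
         "09/02/2015", "09/22/2015", "10/20/2015", "11/02/2015", "11/24/2015",
         "12/03/2015", "12/05/2015", "01/03/2016", "01/30/2016"]
prices = [1.05, 4.05, 3.45, 1.86, 1.58, 8.64, 2.56,
          5.42, 1.58, 4.58, 6.48, 4.56, 7.14, 6.34]
df2 = pd.DataFrame({
    "Company": ["M"] * len(dates),
    "Date": pd.to_datetime(dates, format="%m/%d/%Y"),
    "Price": prices,
})

# Outer merge ordered by the key pair (Company, date): quarter-end rows
# and daily rows end up interleaved in chronological order.
merged = pd.merge_ordered(
    df1, df2,
    left_on=["Company", "Quarter_End"],
    right_on=["Company", "Date"],
    how="outer",
)

# Rows with a Period are quarter-end markers; the rest are daily prices.
print(merged[["Period", "Quarter_End", "Date", "Price"]])
```

Each daily row sits above the quarter-end row that closes its quarter, which is why a backward fill of `Period` and `Quarter_End` assigns it to the right quarter.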

Then back-fill the NaN values in Period and Quarter_End with bfill and aggregate with sum. If you need to remove all NaN values, add Series.dropna, and finish with reset_index:

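# Back-fill so every daily row carries the Period and Quarter_End of the next quarter end
df.Period = df.Period.bfill()
df.Quarter_End = df.Quarter_End.bfill()

# min_count=1 keeps all-NaN groups as NaN (pandas >= 0.22 returns 0 otherwise), so dropna can remove them
print (df.groupby(['Company','Period','Quarter_End'])['Price'].sum(min_count=1).dropna().reset_index())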

  Company  Period Quarter_End  Price
0       M  2016Q2  2015-08-01  10.41
1       M  2016Q3  2015-10-31  18.20
2       M  2016Q4  2016-01-30  30.68
3     WFM  2015Q3  2015-07-05   5.10
4     WFM  2015Q4  2015-09-27  18.09
5     WFM  2016Q1  2016-01-17  36.10
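As a possible alternative (not part of the answer above), `pd.merge_asof` with `direction="forward"` can attach to each daily row the first quarter end on or after its date, matched per company through `by="Company"`. The sketch below is self-contained and uses the data from the question; the names `quarters_txt`, `daily_txt`, `matched` and `result` are illustrative, and the date format is assumed to be month/day/year.

```python
import io
import pandas as pd

quarters_txt = """Company Period Quarter_End
M 2016Q1 05/02/2015
M 2016Q2 08/01/2015
M 2016Q3 10/31/2015
M 2016Q4 01/30/2016
WFM 2015Q2 04/12/2015
WFM 2015Q3 07/05/2015
WFM 2015Q4 09/27/2015
WFM 2016Q1 01/17/2016
"""
daily_txt = """Company Date Price
M 06/20/2015 1.05
M 06/22/2015 4.05
M 07/10/2015 3.45
M 07/29/2015 1.86
M 08/24/2015 1.58
M 09/02/2015 8.64
M 09/22/2015 2.56
M 10/20/2015 5.42
M 11/02/2015 1.58
M 11/24/2015 4.58
M 12/03/2015 6.48
M 12/05/2015 4.56
M 01/03/2016 7.14
M 01/30/2016 6.34
WFM 06/20/2015 1.05
WFM 06/22/2015 4.05
WFM 07/10/2015 3.45
WFM 07/29/2015 1.86
WFM 08/24/2015 1.58
WFM 09/02/2015 8.64
WFM 09/22/2015 2.56
WFM 10/20/2015 5.42
WFM 11/02/2015 1.58
WFM 11/24/2015 4.58
WFM 12/03/2015 6.48
WFM 12/05/2015 4.56
WFM 01/03/2016 7.14
WFM 01/17/2016 6.34
"""
df1 = pd.read_csv(io.StringIO(quarters_txt), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(daily_txt), sep=r"\s+")
df1["Quarter_End"] = pd.to_datetime(df1["Quarter_End"], format="%m/%d/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by their date key.
# direction="forward" picks the first Quarter_End >= Date within each Company.
matched = pd.merge_asof(
    df2.sort_values("Date"),
    df1.sort_values("Quarter_End"),
    left_on="Date",
    right_on="Quarter_End",
    by="Company",
    direction="forward",
)

# Daily rows with no later quarter end get a NaN Period and are dropped by groupby.
result = (
    matched.groupby(["Company", "Period", "Quarter_End"], as_index=False)["Price"]
    .sum()
    .round({"Price": 2})
)
print(result)
```

Because the match is done per company, a daily row dated after its company's last quarter end cannot pick up the first quarter of the next company, which a plain `bfill` over the merged frame could do in that situation.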