Pandas: SQL SelfJoin With Date Criteria

Question

B_M*_*ner 7 python pandas pandas-groupby

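One query I often write in SQL against relational databases is to join a table back onto itself and aggregate, for each row, the records with the same ID that lie backward or forward in time.

For example, suppose table1 has the columns 'ID', 'Date', 'Var1'.

In SQL, I can sum Var1 over the past 3 months for each record like this: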

Select a.ID, a.Date, sum(b.Var1) as sum_var1
from table1 a
left outer join table1 b
on a.ID = b.ID
and months_between(a.date,b.date) > 0
and months_between(a.date,b.date) < 3
group by a.ID, a.Date

Is there a way to do this in Pandas?

Answer 1

jpp*_*jpp 2

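It looks like you need GroupBy + rolling. Implementing the logic exactly as it is written in SQL could be expensive, since a self-join compares every row with every other row of the same ID, which amounts to repeated loops. Let's look at an example dataframe: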

        Date  ID  Var1
0 2015-01-01   1     0
1 2015-02-01   1     1
2 2015-03-01   1     2
3 2015-04-01   1     3
4 2015-05-01   1     4
5 2015-01-01   2     5
6 2015-02-01   2     6
7 2015-03-01   2     7
8 2015-04-01   2     8
9 2015-05-01   2     9
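To reproduce the example, the frame above can be rebuilt as follows. This is a sketch: the answer only shows the printout, so building Date with pd.date_range at month-start frequency ("MS") is an assumption that happens to match the values shown.

```python
import pandas as pd

# Two IDs, each observed on the first day of five consecutive months.
dates = pd.date_range("2015-01-01", periods=5, freq="MS")
df = pd.DataFrame({
    "Date": list(dates) * 2,          # the same five dates repeated for each ID
    "ID": [1] * 5 + [2] * 5,
    "Var1": range(10),
})
print(df)
```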

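You can add a column that looks back within each group and sums the variable over a fixed time period. An offset window such as '70D' needs a datetime-like index, which is why Date is set as the index below. First, define a function using pd.Series.rolling: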

def lookbacker(x):
    """Sum over past 70 days"""
    return x.rolling('70D').sum().astype(int)

Then apply it to the GroupBy object and extract the values for assignment:

df['Lookback_Sum'] = df.set_index('Date').groupby('ID')['Var1'].apply(lookbacker).values

print(df)

        Date  ID  Var1  Lookback_Sum
0 2015-01-01   1     0             0
1 2015-02-01   1     1             1
2 2015-03-01   1     2             3
3 2015-04-01   1     3             6
4 2015-05-01   1     4             9
5 2015-01-01   2     5             5
6 2015-02-01   2     6            11
7 2015-03-01   2     7            18
8 2015-04-01   2     8            21
9 2015-05-01   2     9            24
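The rolling window above includes the current row: for an offset such as '70D' the default window is (t - 70 days, t]. The strict inequalities in the SQL, by contrast, exclude rows with the same date. If you want the current row excluded, a minimal sketch, assuming the same example frame, is to pass closed='left', which makes the window [t - 70 days, t). The function name lookbacker_excl and the column Prior_Sum are hypothetical.

```python
import pandas as pd

# Same example frame as in the answer.
dates = pd.date_range("2015-01-01", periods=5, freq="MS")
df = pd.DataFrame({"Date": list(dates) * 2, "ID": [1] * 5 + [2] * 5, "Var1": range(10)})

def lookbacker_excl(x):
    """Sum over the previous 70 days, excluding the current row (hypothetical helper)."""
    # closed='left' turns the window into [t - 70D, t), so the row at t is dropped.
    # The first row of each group then has an empty window (NaN), filled with 0 here.
    return x.rolling("70D", closed="left").sum().fillna(0).astype(int)

# Positional assignment, as in the answer: relies on df being ordered by ID, then Date.
df["Prior_Sum"] = df.set_index("Date").groupby("ID")["Var1"].apply(lookbacker_excl).values
print(df)
```

The fillna(0) step is needed because astype(int) cannot convert the NaN that rolling reports for an empty window.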

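It seems pd.Series.rolling does not work with months: for example, using '2M' (2 months) instead of '70D' (70 days) raises ValueError: <2 * MonthEnds> is a non-fixed frequency. This makes sense, since months have different numbers of days, so a "month" is ambiguous as a window length.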
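If you really need calendar months rather than a fixed number of days, one option is to perform the self-join literally with DataFrame.merge and shift dates with pd.DateOffset(months=3), which respects month lengths. This is a sketch, not an exact port: pd.DateOffset is not Oracle's months_between (which returns fractional months), the window here is taken as strictly between a.Date minus 3 months and a.Date, and the names pairs, in_window, sums and sum_var1 are illustrative. It also builds every pair of rows per ID, which is exactly the cost this answer warns about, so it only suits modest group sizes.

```python
import pandas as pd

# Same example frame as in the answer.
dates = pd.date_range("2015-01-01", periods=5, freq="MS")
df = pd.DataFrame({"Date": list(dates) * 2, "ID": [1] * 5 + [2] * 5, "Var1": range(10)})

# Self-join on ID, like "table1 a JOIN table1 b ON a.ID = b.ID".
pairs = df.merge(df, on="ID", suffixes=("_a", "_b"))

# Keep b-rows strictly inside the 3 calendar months before a.Date.
lower = pairs["Date_a"] - pd.DateOffset(months=3)
in_window = (pairs["Date_b"] > lower) & (pairs["Date_b"] < pairs["Date_a"])

# Zero out rows outside the window instead of dropping them, so every a-row
# survives (left-join behaviour) and gets a sum of 0 when nothing matches.
sums = (
    pairs.assign(Var1_b=pairs["Var1_b"].where(in_window, 0))
    .groupby(["ID", "Date_a"], as_index=False)["Var1_b"].sum()
    .rename(columns={"Date_a": "Date", "Var1_b": "sum_var1"})
)
result = df.merge(sums, on=["ID", "Date"], how="left")
print(result)
```

Note that where the SQL left join would return NULL for a row with no earlier records, this sketch returns 0.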

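Another point worth mentioning: you can use GroupBy + rolling directly, which may be more efficient because it bypasses apply, but this requires the index to be monotonic. As with the apply version, assigning through .values is positional, so it relies on df already being ordered by ID and then by Date, as in the example. For instance, sorting with sort_index: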

df['Lookback_Sum'] = df.set_index('Date').sort_index()\
                       .groupby('ID')['Var1'].rolling('70D').sum()\
                       .astype(int).values
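If df is not already ordered by ID and Date, the positional .values assignment can attach sums to the wrong rows. A sketch that avoids this, assuming each (ID, Date) pair is unique, keeps the (ID, Date) MultiIndex that groupby(...).rolling(...) returns and merges the result back on those keys instead. The shuffle with df.sample is only there to show that the original row order no longer matters.

```python
import pandas as pd

# Same example frame as in the answer, deliberately shuffled.
dates = pd.date_range("2015-01-01", periods=5, freq="MS")
df = pd.DataFrame({"Date": list(dates) * 2, "ID": [1] * 5 + [2] * 5, "Var1": range(10)})
df = df.sample(frac=1, random_state=0)

rolled = (
    df.sort_values("Date")          # dates sorted, so each ID's dates are monotonic
    .set_index("Date")
    .groupby("ID")["Var1"]
    .rolling("70D")
    .sum()                          # Series indexed by (ID, Date)
    .astype(int)
    .rename("Lookback_Sum")
    .reset_index()                  # columns: ID, Date, Lookback_Sum
)

# Join on the keys, not on row position; how="left" keeps df's row order.
df = df.merge(rolled, on=["ID", "Date"], how="left")
print(df.sort_values(["ID", "Date"]))
```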

Archived:	7 years ago
Views:	198
Last activity:	7 years ago