How to downsample time series data in pandas?

Question

I have a time series in pandas that looks like this (sorted by id):

id    time    value
 1       0        2
 1       1        4
 1       2        5
 1       3       10
 1       4       15
 1       5       16
 1       6       18
 1       7       20
 2      15        3
 2      16        5
 2      17        8
 2      18       10
 4       6        5
 4       7        6

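For each id, I want to downsample the time from 1-minute to 3-minute intervals. The value should be the maximum within each group (id and 3-minute bin).

The output should look like this: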

id    time    value
 1       0        5
 1       1       16
 1       2       20
 2       0        8
 2       1       10
 4       0        6

I tried looping over the rows, but that takes a very long time.

Any idea how to solve this for a large DataFrame?

Thanks!
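
For anyone who wants to reproduce the example, here is a minimal sketch that builds the sample table from the question. The variable name df is an assumption, chosen because the answers below refer to a frame by that name.

```python
import pandas as pd

# Sample data copied from the question: time is in whole minutes.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4],
    "time":  [0, 1, 2, 3, 4, 5, 6, 7, 15, 16, 17, 18, 6, 7],
    "value": [2, 4, 5, 10, 15, 16, 18, 20, 3, 5, 8, 10, 5, 6],
})
print(df)
```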

Answer 1

use*_*203 6

You can convert the time column to an actual timedelta, then use resample for a vectorized solution:

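# Interpret the integer time column as minutes ('min' replaces the deprecated 'T' alias)
t = pd.to_timedelta(df.time, unit='min')
# The question asks for the maximum per 3-minute bin, so aggregate with max() rather than last()
s = df.set_index(t).groupby('id').resample('3min').max().reset_index(drop=True)
# Renumber the bins 0, 1, 2, ... within each id
s.assign(time=s.groupby('id').cumcount())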

   id  time  value
0   1     0      5
1   1     1     16
2   1     2     20
3   2     0      8
4   2     1     10
5   4     0      6
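
A self-contained sketch of the same resample idea, for readers who want something they can paste and run. It differs from the snippet in this answer in a few ways, all of them assumptions on my part: it rebuilds the sample frame, it selects only the value column before resampling (so the result does not depend on whether the grouping column is carried through), and it aggregates with max, which is what the question asks for.

```python
import pandas as pd

# Sample data copied from the question: time is in whole minutes.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4],
    "time":  [0, 1, 2, 3, 4, 5, 6, 7, 15, 16, 17, 18, 6, 7],
    "value": [2, 4, 5, 10, 15, 16, 18, 20, 3, 5, 8, 10, 5, 6],
})

# Treat the integer minutes as timedeltas so resample can bin them.
minutes = pd.to_timedelta(df["time"], unit="min")

out = (
    df.set_index(minutes)        # index is now a TimedeltaIndex named "time"
      .groupby("id")["value"]    # resample each id separately, value column only
      .resample("3min")
      .max()                     # maximum within each (id, 3-minute bin)
      .reset_index()             # back to columns: id, time, value
)

# Replace the timedelta labels with a running bin number per id.
out["time"] = out.groupby("id").cumcount()
print(out)
```

On the sample data this should give the six rows requested in the question. If an id has gaps longer than three minutes, resample will also emit the empty bins in between (with a missing value), so those rows may need to be dropped first.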

Answer 2

Sco*_*ton 4

Use np.r_ and .iloc with groupby:

import numpy as np
df.groupby('id')['value'].apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]])

Output:

id    
1   2      5
    5     16
    7     20
2   10     8
    11    10
4   13     6
Name: value, dtype: int64

Going a step further to tidy up the column names, etc.:

df_out = df.groupby('id')['value']\
           .apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]]).reset_index()
df_out.assign(time=df_out.groupby('id').cumcount()).drop('level_1', axis=1)

Output:

   id  value  time
0   1      5     0
1   1     16     1
2   1     20     2
3   2      8     0
4   2     10     1
5   4      6     0
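
One caveat about the positional approach: picking every third row plus the last one returns the maximum only when value never decreases within an id, and it selects the final row twice when a group's length is a multiple of 3. As a hedged alternative (a different technique, not the one used in this answer), the sketch below computes an explicit bin number with integer division and takes a real max per bin. It assumes that bins are counted from each id's first timestamp, which matches the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question: time is in whole minutes.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4],
    "time":  [0, 1, 2, 3, 4, 5, 6, 7, 15, 16, 17, 18, 6, 7],
    "value": [2, 4, 5, 10, 15, 16, 18, 20, 3, 5, 8, 10, 5, 6],
})

# First timestamp of each id, broadcast back to every row.
start = df.groupby("id")["time"].transform("min")

out = (
    # Overwrite time (in a copy) with the 3-minute bin number since the id's start.
    df.assign(time=(df["time"] - start) // 3)
      .groupby(["id", "time"], as_index=False)["value"]
      .max()
)
print(out)
```

Because the bin number comes from the timestamps rather than from row positions, this does not require the values to be sorted, and an id with missing minutes simply skips the corresponding bin numbers.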

Asked:	7 years, 2 months ago
Viewed:	4040 times
Last active:	7 years, 2 months ago