pandas: select rows separated by a given timestamp interval

Question


I have a large DataFrame of the following form:

timestamp | col1 | col2 ...


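I want to select rows that are at least x minutes apart, where x can be 5, 10, 30, and so on. The problem is that the timestamps are not equally spaced, so I can't use a simple "take every n-th row" trick.

Example: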

timestamp | col1 | col2

'2019-01-15 17:52:29.955000', x, b
'2019-01-15 17:58:29.531000', x, b
'2019-01-16 03:21:48.255000', x, b
'2019-01-16 03:27:46.324000', x, b
'2019-01-16 03:33:09.984000', x, b
'2019-01-16 07:22:08.170000', x, b
'2019-01-16 07:28:27.406000', x, b
'2019-01-16 07:34:35.194000', x, b
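
To experiment with the example, the rows above can be loaded into a DataFrame along these lines. This is only a minimal sketch: the variable name `df` is chosen here for illustration, and treating the placeholders `x` and `b` as literal strings is an assumption.

```python
import pandas as pd

# Timestamps copied from the example above. col1/col2 hold the
# placeholder values "x" and "b" as plain strings (an assumption).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-15 17:52:29.955000",
        "2019-01-15 17:58:29.531000",
        "2019-01-16 03:21:48.255000",
        "2019-01-16 03:27:46.324000",
        "2019-01-16 03:33:09.984000",
        "2019-01-16 07:22:08.170000",
        "2019-01-16 07:28:27.406000",
        "2019-01-16 07:34:35.194000",
    ]),
    "col1": "x",
    "col2": "b",
})

# The gaps between consecutive rows are irregular, which is why
# taking every n-th row does not give evenly spaced samples here.
print(df["timestamp"].diff())
```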


If interval = 10:

Result:

'2019-01-15 17:52:29.955000', x, b
'2019-01-16 03:21:48.255000', x, b
'2019-01-16 03:33:09.984000', x, b
'2019-01-16 07:22:08.170000', x, b
'2019-01-16 07:34:35.194000', x, b


If interval = 30:

Result:

'2019-01-15 17:52:29.955000', x, b
'2019-01-16 03:21:48.255000', x, b
'2019-01-16 07:22:08.170000', x, b


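I could use a brute-force n^2 approach, but I'm sure there is a pandas way that I'm missing.

Thank you! :)

Edit: just to clarify, this is not a duplicate of "Calculate time difference between Pandas Dataframe indices". I need to subset the DataFrame based on a given interval.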

Answer 1

Qua*_*ang 5

As noted in the comments, it looks like you need a for loop. That's not too bad, since it is a single O(n) pass:

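import pandas as pd

def sampling(df, thresh):
    # Minimum gap between kept rows, e.g. '10T' for 10 minutes
    thresh = pd.to_timedelta(thresh)
    # Time elapsed since the previous row (0 for the first row)
    time_diff = df.timestamp.diff().fillna(pd.Timedelta(seconds=0))
    # Always keep the first row (by label, so the index need not start at 0)
    ret = [df.index[0]]
    running_total = pd.to_timedelta(0)
    for i in df.index:
        running_total += time_diff[i]
        # Keep the row once enough time has accumulated, then restart the count
        if running_total >= thresh:
            ret.append(i)
            running_total = pd.to_timedelta(0)

    return df.loc[ret].copy()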
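
Because the running total is reset each time a row is kept, it always equals the time elapsed since the last kept row. Read that way, the same selection can be sketched by comparing each timestamp with the last kept one on a plain NumPy array, which avoids a per-row label lookup. This is a hypothetical variant, not the answerer's code: the name `sample_min_gap` and the `column` parameter are introduced here for illustration, and it assumes the timestamp column is sorted ascending and has a timezone-naive datetime64 dtype.

```python
import numpy as np
import pandas as pd


def sample_min_gap(df, thresh, column="timestamp"):
    """Keep the first row, then every row at least `thresh` after the last kept row.

    Assumes df[column] is sorted ascending and has a datetime64 dtype.
    """
    if df.empty:
        return df.copy()
    min_gap = pd.to_timedelta(thresh).to_timedelta64()
    times = df[column].to_numpy()           # datetime64 array
    keep = np.zeros(len(df), dtype=bool)    # positional mask of kept rows
    keep[0] = True
    last_kept = times[0]
    for pos in range(1, len(times)):
        # Elapsed time since the last kept row, not since the previous row
        if times[pos] - last_kept >= min_gap:
            keep[pos] = True
            last_kept = times[pos]
    return df[keep].copy()
```

The interval is written as `'10min'` or `'30min'` in this sketch because recent pandas releases deprecate the `'T'` alias for minutes.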


Then sampling(df, '10T') gives

                timestamp col1 col2
0 2019-01-15 17:52:29.955    x    b
2 2019-01-16 03:21:48.255    x    b
4 2019-01-16 03:33:09.984    x    b
5 2019-01-16 07:22:08.170    x    b
7 2019-01-16 07:34:35.194    x    b


And sampling(df, '30T') gives:

                timestamp col1 col2
0 2019-01-15 17:52:29.955    x    b
2 2019-01-16 03:21:48.255    x    b
5 2019-01-16 07:22:08.170    x    b
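
One way to sanity-check a result like this is to confirm that every pair of consecutive kept rows is at least the requested interval apart. The helper below is a small sketch for that purpose. `gaps_at_least` is a hypothetical name, not a pandas function, and the timestamps are copied from the 30-minute output above.

```python
import pandas as pd


def gaps_at_least(timestamps, thresh):
    """Return True if every consecutive gap in `timestamps` is >= thresh."""
    gaps = pd.Series(pd.to_datetime(timestamps)).diff().dropna()
    return bool((gaps >= pd.to_timedelta(thresh)).all())


# Timestamps of the three rows kept for the 30-minute interval.
kept_30 = [
    "2019-01-15 17:52:29.955",
    "2019-01-16 03:21:48.255",
    "2019-01-16 07:22:08.170",
]
print(gaps_at_least(kept_30, "30min"))
```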


Archived:	6 years, 7 months ago
Views:	1003
Last activity:	6 years, 7 months ago