Pandas: select dates with gaps of more than 1 day in a time series

Zan*_*nam 1 python pandas

I have a DataFrame:

df.ix[1:5]
  Date         A  
1 2010-07-26   3.15  
2 2010-07-27   5  
3 2010-07-30   3  
4 2010-07-31   105  
5 2010-08-01   0.05  
6 2010-08-02   0.05  
7 2010-08-05   0.05  

I want to select only the rows where consecutive dates differ by more than 2 days, i.e. the final result should be

  Date         A  
1 2010-07-27   5  
2 2010-07-30   3  
3 2010-08-02   0.05  
4 2010-08-05   0.05   

Any idea how to do this?

Edit: the result starts at 2010-07-27 because 2010-07-30 is the first date after 2010-07-27 with a gap of more than 2 days.

Shi*_*eng 5

import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])

# compute the time interval between each row and the previous row
time_interval = (df['date'] - df['date'].shift(1)).to_frame()
# give the first time interval (NaT) a meaningful value;
# .loc avoids chained assignment, which may not write through to the frame
time_interval.loc[0, 'date'] = pandas.Timedelta('0 days')
# Define the gap
gap = pandas.Timedelta('2 days')

# get the index which satisfies the criteria
result = list(df[time_interval['date'] > gap].index)
new_result = result[:]

# insert the index of the row just before each gap
# (index - 1 assumes the default 0..n-1 integer index)
for i in range(len(result)):
    index = result[i]
    prev_index = index - 1
    if (prev_index >= 0) and (prev_index not in result):
        new_result.insert(new_result.index(index), prev_index)

# get desired rows by the index list
result = df.loc[new_result]
print(result)

Output

        date  value
1 2010-07-27   5.00
2 2010-07-30   3.00
5 2010-08-02   0.05
6 2010-08-05   0.05
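The snippets in this answer read test.csv, which is not shown. A minimal sketch for reproducing the input, assuming the file holds the seven rows from the question under the column names date and value seen in the output above (the in-memory CSV text is an assumption, not the original file):

```python
import io

import pandas as pd

# Assumed contents of test.csv: the question's rows, with the
# column names `date` and `value` taken from the printed output.
csv_text = """date,value
2010-07-26,3.15
2010-07-27,5
2010-07-30,3
2010-07-31,105
2010-08-01,0.05
2010-08-02,0.05
2010-08-05,0.05
"""

# parse_dates converts the column to datetime64 while reading,
# so a separate to_datetime call is not needed here.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df)
```

With df built this way, the read_csv and to_datetime lines in the snippets can be skipped.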

Update:

Inspired by Scott Boston

import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])

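# True for every row that comes more than 2 days after the previous row
index = (df['date'] - df['date'].shift(1)).dt.days > 2

# also mark the row just before each gap
for i in range(len(index)):
    if (i > 0) and index[i]:
        index[i - 1] = True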

print(df.loc[index])

Update again

import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])

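# rows that come more than 2 days after the previous row
index = (df['date'] - df['date'].shift(1)).dt.days > 2
# rows that come more than 2 days before the next row
index_prev = (df['date'] - df['date'].shift(-1)).dt.days < -2

# keep rows on either side of a gap
index = (index | index_prev)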

print(df.loc[index])
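The same idea can also be written with Series.diff() and a shifted boolean mask. This is only a sketch: the function name select_gap_rows and its parameters are made up for illustration, and the frame is built inline from the question's data instead of being read from test.csv.

```python
import pandas as pd


def select_gap_rows(df, date_col="date", gap=pd.Timedelta(days=2)):
    """Return the rows on either side of every gap larger than `gap`.

    Hypothetical helper; assumes `df` is already sorted by `date_col`.
    """
    # Interval between each row and the previous one; NaT for the first row.
    delta = df[date_col].diff()
    # True for rows that come right after a large gap (NaT compares as False).
    after_gap = delta > gap
    # Shift the mask up by one to mark the row just before each gap.
    before_gap = after_gap.shift(-1, fill_value=False)
    return df[after_gap | before_gap]


# The question's data, with the column names used in this answer.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2010-07-26", "2010-07-27", "2010-07-30", "2010-07-31",
        "2010-08-01", "2010-08-02", "2010-08-05",
    ]),
    "value": [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
})

result = select_gap_rows(df)
print(result)
```

Comparing against a Timedelta rather than .dt.days also handles timestamps that carry a time of day, and shifting the mask works by position, so it does not rely on a 0..n-1 integer index.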