I have a data frame:
df.loc[1:7]
Date A
1 2010-07-26 3.15
2 2010-07-27 5
3 2010-07-30 3
4 2010-07-31 105
5 2010-08-01 0.05
6 2010-08-02 0.05
7 2010-08-05 0.05
I only want to select the rows where consecutive dates differ by more than 2 days. That is, the final result should be:
Date A
1 2010-07-27 5
2 2010-07-30 3
3 2010-08-02 0.05
4 2010-08-05 0.05
Any idea how to do this?
Edit: the result starts at row 2010-07-27 because 2010-07-30 is the first date after 2010-07-27 that is more than 2 days later.
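As a minimal sketch of the rule described in the edit, the sample frame can be rebuilt inline and the gap to the previous row computed with `Series.diff`. The variable name `gap_to_prev` is illustrative only, and the sketch assumes the dates are already sorted in ascending order.

```python
import pandas as pd

# Sample data copied from the question; column names follow its printout.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime([
            "2010-07-26", "2010-07-27", "2010-07-30", "2010-07-31",
            "2010-08-01", "2010-08-02", "2010-08-05",
        ]),
        "A": [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
    },
    index=range(1, 8),
)

# Gap between each row and the row before it (NaT for the first row).
gap_to_prev = df["Date"].diff()
print(df.assign(gap_to_prev=gap_to_prev))
```

Only rows 3 and 7 (2010-07-30 and 2010-08-05) come more than 2 days after their predecessor, so the desired result is those rows plus the row just before each of them.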
import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])
# compute the time interval between each row and the previous row
time_interval = df['date'].diff()
# give the first interval (NaT) a meaningful value
time_interval = time_interval.fillna(pandas.Timedelta('0 days'))
# define the gap
gap = pandas.Timedelta('2 days')
# get the index labels of the rows that satisfy the criterion
result = list(df[time_interval > gap].index)
new_result = result[:]
# insert the previous index in front of each match
for index in result:
    prev_index = index - 1
    if (prev_index >= 0) and (prev_index not in result):
        new_result.insert(new_result.index(index), prev_index)
# get the desired rows by the index list
result = df.loc[new_result]
print(result)
date value
1 2010-07-27 5.00
2 2010-07-30 3.00
5 2010-08-02 0.05
6 2010-08-05 0.05
Inspired by Scott Boston:
import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])
index = (df['date'] - df['date'].shift(1)).dt.days > 2
for i in range(len(index)):
    if (i > 0) and index[i]:
        index[i - 1] = True
print(df.loc[index])
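import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])
# rows that come more than 2 days after the previous row
index = (df['date'] - df['date'].shift(1)).dt.days > 2
# rows that come more than 2 days before the next row
index_prev = (df['date'] - df['date'].shift(-1)).dt.days < -2
# keep a row if either condition holds
index = (index | index_prev)
print(df.loc[index])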
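For readers without `test.csv`, here is a self-contained sketch of the same mask-based idea. The helper name `rows_around_gaps`, the inline sample data, and the column names `date` and `value` (taken from the printed output above) are assumptions for illustration. It also assumes the dates are already sorted in ascending order.

```python
import pandas as pd


def rows_around_gaps(df, column, min_gap):
    """Return the rows on either side of any gap larger than min_gap.

    Hypothetical helper, not part of pandas.
    """
    gap_to_prev = df[column] - df[column].shift(1)   # this row minus the previous row
    gap_to_next = df[column].shift(-1) - df[column]  # next row minus this row
    # Comparisons against NaT (first and last row) evaluate to False.
    mask = (gap_to_prev > min_gap) | (gap_to_next > min_gap)
    return df[mask]


# Inline copy of the sample data, standing in for test.csv.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2010-07-26", "2010-07-27", "2010-07-30", "2010-07-31",
        "2010-08-01", "2010-08-02", "2010-08-05",
    ]),
    "value": [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
})

result = rows_around_gaps(df, "date", pd.Timedelta(days=2))
print(result)
```

Comparing `Timedelta` values directly, rather than going through `.dt.days`, keeps the threshold exact if the column ever carries a time of day as well as a date.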