sg9*_*g91 6 python python-2.7 pandas
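I have a Pandas DataFrame that looks like this:
I want to filter the DF using a locally defined int parameter, `days`. For example, when days = 10, my filtered DF should contain only the data for the most recent 10 available dates.
So far I have tried the following: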
days=10
cutoff_date = df["SeriesDate"][-1:] - datetime.timedelta(days=days)
However, when I then try to output the filtered DF with:
df[df['SeriesDate'] > cutoff_date]
I get the following error:
ValueError: Can only compare identically-labeled Series objects
I am still learning Python, so I would appreciate any help I can get.
I think you need to select the last value of the SeriesDate column with iloc:
import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=15, freq='20H')
df = pd.DataFrame({'SeriesDate': rng, 'Value_1': np.random.random(15)})
print (df)
SeriesDate Value_1
0 2015-02-24 00:00:00 0.849160
1 2015-02-24 20:00:00 0.332487
2 2015-02-25 16:00:00 0.687638
3 2015-02-26 12:00:00 0.310326
4 2015-02-27 08:00:00 0.660795
5 2015-02-28 04:00:00 0.354475
6 2015-03-01 00:00:00 0.061312
7 2015-03-01 20:00:00 0.443908
8 2015-03-02 16:00:00 0.708326
9 2015-03-03 12:00:00 0.257419
10 2015-03-04 08:00:00 0.618363
11 2015-03-05 04:00:00 0.121625
12 2015-03-06 00:00:00 0.637324
13 2015-03-06 20:00:00 0.058292
14 2015-03-07 16:00:00 0.047624
days=10
cutoff_date = df["SeriesDate"].iloc[-1] - pd.Timedelta(days=days)
print (cutoff_date)
2015-02-25 16:00:00
df1 = df[df['SeriesDate'] > cutoff_date]
print (df1)
SeriesDate Value_1
3 2015-02-26 12:00:00 0.310326
4 2015-02-27 08:00:00 0.660795
5 2015-02-28 04:00:00 0.354475
6 2015-03-01 00:00:00 0.061312
7 2015-03-01 20:00:00 0.443908
8 2015-03-02 16:00:00 0.708326
9 2015-03-03 12:00:00 0.257419
10 2015-03-04 08:00:00 0.618363
11 2015-03-05 04:00:00 0.121625
12 2015-03-06 00:00:00 0.637324
13 2015-03-06 20:00:00 0.058292
14 2015-03-07 16:00:00 0.047624
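To see why the original attempt fails, here is a minimal sketch comparing the two ways of taking the last value. The sample frame is made up for illustration and is not the asker's data. Slicing with `[-1:]` returns a one-row Series that keeps its index label, so the comparison tries to align two Series with different labels and raises the ValueError, while `.iloc[-1]` returns a single Timestamp that compares against every row.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({
    "SeriesDate": pd.date_range("2015-02-24", periods=15, freq="D"),
    "Value_1": range(15),
})

days = 10

# [-1:] is a slice, so the result is still a Series (one row, label 14).
as_series = df["SeriesDate"][-1:] - pd.Timedelta(days=days)

# .iloc[-1] picks one element, so the result is a scalar Timestamp.
as_scalar = df["SeriesDate"].iloc[-1] - pd.Timedelta(days=days)

# Comparing the full column with the one-row Series fails on label alignment.
try:
    df[df["SeriesDate"] > as_series]
    raised = False
except ValueError:
    raised = True

# Comparing with the scalar is applied to every row of the column.
filtered = df[df["SeriesDate"] > as_scalar]

print(type(as_series).__name__, type(as_scalar).__name__, raised, len(filtered))
```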
Another option is to use max, thanks Pocin:
cutoff_date = df["SeriesDate"].max() - pd.Timedelta(days=days)
print (cutoff_date)
2015-02-25 16:00:00
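The two options give the same cutoff only when the frame is sorted by date. A small sketch with a deliberately unsorted, made-up frame shows the difference: `.iloc[-1]` takes whatever date sits in the last row, while `.max()` takes the latest date regardless of row order.

```python
import pandas as pd

# Hypothetical frame whose rows are NOT in date order.
df = pd.DataFrame({
    "SeriesDate": pd.to_datetime(["2015-03-07", "2015-02-24", "2015-03-01"]),
    "Value_1": [1.0, 2.0, 3.0],
})

days = 5

# Based on the last row (2015-03-01), which is not the latest date here.
cutoff_last = df["SeriesDate"].iloc[-1] - pd.Timedelta(days=days)

# Based on the latest date in the column (2015-03-07).
cutoff_max = df["SeriesDate"].max() - pd.Timedelta(days=days)

recent = df[df["SeriesDate"] > cutoff_max]
print(cutoff_last, cutoff_max, len(recent))
```

If the data might arrive unsorted, `max()` is probably the safer choice. Sorting first with `df.sort_values("SeriesDate")` would also make `.iloc[-1]` reliable.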
If you only want to filter on dates:
days=10
cutoff_date = df["SeriesDate"].dt.date.iloc[-1] - pd.Timedelta(days=days)
print (cutoff_date)
2015-02-25
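One caveat: `.dt.date` produces plain `datetime.date` objects, and depending on the pandas version, comparing a datetime64 column against a `datetime.date` may raise a TypeError. A possible alternative, sketched below on made-up data, is `Timestamp.normalize()`, which drops the time of day but keeps the cutoff a Timestamp so it can be compared with the column directly.

```python
import pandas as pd

# Hypothetical frame with timestamps 20 hours apart, like the example above.
df = pd.DataFrame({
    "SeriesDate": pd.date_range("2015-02-24", periods=15,
                                freq=pd.Timedelta(hours=20)),
    "Value_1": range(15),
})

days = 10

# normalize() resets the time to midnight; the result is still a Timestamp.
last_day = df["SeriesDate"].iloc[-1].normalize()
cutoff_date = last_day - pd.Timedelta(days=days)

df1 = df[df["SeriesDate"] > cutoff_date]
print(cutoff_date, len(df1))
```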
Edit:
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=15)
df = pd.DataFrame({'SeriesDate': rng, 'Value_1': np.random.random(15)})
print (df)
SeriesDate Value_1
0 2015-02-24 0.498387
1 2015-02-25 0.435767
2 2015-02-26 0.299233
3 2015-02-27 0.489286
4 2015-02-28 0.892167
5 2015-03-01 0.507436
6 2015-03-02 0.360427
7 2015-03-03 0.903886
8 2015-03-04 0.718148
9 2015-03-05 0.645489
10 2015-03-06 0.251285
11 2015-03-07 0.139275
12 2015-03-08 0.756845
13 2015-03-09 0.565863
14 2015-03-10 0.148077
days=10
last_day = df["SeriesDate"].dt.date.iloc[-1]
cutoff_date = last_day - pd.Timedelta(days=days)
rng = pd.date_range(cutoff_date, last_day)
# dayofweek counts Monday=0 ... Sunday=6, so this drops Mondays and Sundays
rng = rng[(rng.dayofweek != 0) & (rng.dayofweek != 6)]
print (rng)
DatetimeIndex(['2015-02-28', '2015-03-03', '2015-03-04', '2015-03-05',
'2015-03-06', '2015-03-07', '2015-03-10'],
dtype='datetime64[ns]', freq=None)
df1 = df[df['SeriesDate'].isin(rng)]
print (df1)
SeriesDate Value_1
4 2015-02-28 0.892167
7 2015-03-03 0.903886
8 2015-03-04 0.718148
9 2015-03-05 0.645489
10 2015-03-06 0.251285
11 2015-03-07 0.139275
14 2015-03-10 0.148077
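Note that pandas numbers `dayofweek` from Monday=0 to Sunday=6, so the filter above removes Mondays and Sundays. If the goal is to drop weekends instead (an assumption, since the post does not say), a variant could keep only `dayofweek < 5`. The sketch below uses made-up values and builds the date window with `between` rather than a separate date range.

```python
import pandas as pd

# Hypothetical daily data covering the same dates as the example above.
df = pd.DataFrame({
    "SeriesDate": pd.date_range("2015-02-24", periods=15),
    "Value_1": range(15),
})

days = 10
last_day = df["SeriesDate"].max().normalize()
cutoff_date = last_day - pd.Timedelta(days=days)

# Rows inside the window [cutoff_date, last_day], both ends included.
in_window = df["SeriesDate"].between(cutoff_date, last_day)

# Monday=0 ... Friday=4 are weekdays; Saturday=5 and Sunday=6 are dropped.
is_weekday = df["SeriesDate"].dt.dayofweek < 5

df1 = df[in_window & is_weekday]
print(df1)
```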