Filtering a pandas DataFrame with a datetime index

ste*_*enb 6 python pandas

With a datetime index on a pandas DataFrame, it is easy to select a range of dates:

df[datetime(2018,1,1):datetime(2018,1,10)]

Filtering is also straightforward:

df[(df['column A'] == 'Done') & (df['column B'] < 3.14)]

But what is the best way to filter by a date range and by some other, non-date criterion at the same time?

piR*_*red 10

3 Boolean conditions

c0 = df.index.to_series().between('2018-01-01', '2018-01-10')
c1 = df['column A'] == 'Done'
c2 = df['column B'] < 3.14

df[c0 & c1 & c2]

           column A  column B
2018-01-04     Done  2.533385
2018-01-06     Done  2.789072
2018-01-08     Done  2.230017

Setup

np.random.seed([3, 1415])
df = pd.DataFrame({
    'column A': ['Done', 'Not Done'] * 10,
    'column B': np.random.randn(20) + np.pi
}, pd.date_range('2017-12-25', periods=20))

df

            column A  column B
2017-12-25      Done  1.011868
2017-12-26  Not Done  1.873127
2017-12-27      Done  1.171093
2017-12-28  Not Done  0.882538
2017-12-29      Done  2.792306
2017-12-30  Not Done  3.114638
2017-12-31      Done  3.457829
2018-01-01  Not Done  3.490375
2018-01-02      Done  3.856957
2018-01-03  Not Done  3.912356
2018-01-04      Done  2.533385
2018-01-05  Not Done  3.493983
2018-01-06      Done  2.789072
2018-01-07  Not Done  2.725724
2018-01-08      Done  2.230017
2018-01-09  Not Done  2.999055
2018-01-10      Done  3.888432
2018-01-11  Not Done  1.637436
2018-01-12      Done  3.752955
2018-01-13  Not Done  3.541812
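
The date condition built with `df.index.to_series().between(...)` can also be written as plain comparisons on the index, which may make the inclusive endpoints easier to see. The following is a minimal sketch, assuming a small made-up frame (evenly spaced values rather than the random data above); the names `start`, `end`, `c0_between`, and `c0_compare` are illustrative only.

```python
import numpy as np
import pandas as pd

# Small deterministic frame; the values are made up for illustration.
idx = pd.date_range('2017-12-30', periods=14)
df = pd.DataFrame({
    'column A': ['Done', 'Not Done'] * 7,
    'column B': np.linspace(2.0, 4.0, 14),
}, index=idx)

start, end = pd.Timestamp('2018-01-01'), pd.Timestamp('2018-01-10')

# Series.between includes both endpoints by default...
c0_between = df.index.to_series().between(start, end)
# ...so it should match explicit >= / <= comparisons on the index.
c0_compare = (df.index >= start) & (df.index <= end)

c1 = df['column A'] == 'Done'
c2 = df['column B'] < 3.14

result_between = df[c0_between & c1 & c2]
result_compare = df[c0_compare & c1 & c2]
print(result_between)
```

Both masks select the same rows here, including the row dated 2018-01-01, because the lower bound is inclusive.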


jez*_*ael 8

If there are multiple Boolean masks, you can use np.logical_and.reduce:

m1 = df.index > '2018-01-01'
m2 = df.index < '2018-01-10'
m3 = df['column A'] == 'Done'
m4 = df['column B'] < 3.14

#piRSquared's data sample
df = df[np.logical_and.reduce([m1, m2, m3, m4])]
print (df)
           column A  column B
2018-01-04     Done  2.533385
2018-01-06     Done  2.789072
2018-01-08     Done  2.230017
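
As a cross-check, `np.logical_and.reduce` over a list of masks should give the same result as chaining the masks with `&`. The sketch below assumes a small made-up frame and keeps the filtered rows in a separate, hypothetical variable `filtered` instead of overwriting `df`. Note that the strict `>` and `<` comparisons exclude the two boundary dates, unlike an inclusive range.

```python
import numpy as np
import pandas as pd

# Small deterministic frame; the values are made up for illustration.
idx = pd.date_range('2017-12-30', periods=14)
df = pd.DataFrame({
    'column A': ['Done', 'Not Done'] * 7,
    'column B': np.linspace(2.0, 4.0, 14),
}, index=idx)

masks = [
    df.index > '2018-01-01',   # strict: 2018-01-01 itself is excluded
    df.index < '2018-01-10',   # strict: 2018-01-10 itself is excluded
    df['column A'] == 'Done',
    df['column B'] < 3.14,
]

# Element-wise AND across all masks, returned as a NumPy boolean array.
combined = np.logical_and.reduce(masks)
# The same condition written with the & operator.
chained = masks[0] & masks[1] & masks[2] & masks[3]

filtered = df[combined]
print(filtered)
```

The `reduce` form is mostly a convenience when the list of masks is built dynamically or grows long.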


小智 7

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2018-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2018-2-1':'2018-2-10'])

Hope this helps!
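
To combine that date slice with a non-date condition, one option is to slice first and then apply a Boolean mask to the smaller frame. This is only a sketch: the column names `a`, `b`, `c`, the fixed seed, and the 0.5 threshold are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, assumed here for repeatability
# Hypothetical column names; the snippet above uses the default integer labels.
df = pd.DataFrame(rng.random((200, 3)), columns=['a', 'b', 'c'])
df['date'] = pd.date_range('2018-01-01', periods=200, freq='D')
df = df.set_index('date')

# Label-based slicing on a DatetimeIndex includes both endpoints.
window = df.loc['2018-02-01':'2018-02-10']

# Apply the non-date condition to the already-sliced rows.
result = window[window['a'] > 0.5]
print(result)
```

Slicing first means the value comparison only runs on the rows inside the date window, which can matter for large frames with a sorted index.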