How to filter a pandas DataFrame on a time range

Question


Bar*_*ysz 2 python datetime python-3.x python-datetime pandas

I have a file containing the following table:

    Name        AvailableDate            totalRemaining
0   X3321       2018-03-14 13:00:00      200
1   X3321       2018-03-14 14:00:00      200
2   X3321       2018-03-14 15:00:00      200
3   X3321       2018-03-14 16:00:00      200
4   X3321       2018-03-14 17:00:00      193


I want to return a DataFrame containing all records that fall within a given time-of-day window, regardless of the actual date.

I followed the example here:

Filter pandas dataframe by time

But when I run the following:

## setup
import pandas as pd
import numpy as np

### Step 2
### Check available slots
file2 = r'C:\Users\user\Desktop\Files\data.xlsx'

slots = pd.read_excel(file2,na_values='')

## filter the preferred ones
slots['nextAvailableDate'] = pd.to_datetime((slots['nextAvailableDate']))


slots['times'] = pd.to_datetime((slots['nextAvailableDate']))
slots = slots[slots['times'].between('21:00:00', '02:00:00')]


this returns an empty DataFrame, and so does the following variant:

slots = slots[slots['times'].dt.strftime('%H:%M:%S').between('21:00:00', '02:00:00')]


Is there a way to do this correctly without creating a separate time column? How should I approach this?
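Both attempts ask between for values that are at least 21:00:00 and at most 02:00:00. Because the lower bound is later than the upper bound, nothing can match (in the first attempt the bounds are also compared against full timestamps rather than times of day). A window that wraps past midnight has to be written as the union of two ranges. The sketch below applies that to the strftime strings from the second attempt; the sample rows are hypothetical, and the column is named AvailableDate as in the table above. The mask is computed on the fly, so no extra column is stored on the DataFrame.

```python
import pandas as pd

# Hypothetical sample data spanning two days (not from the original file).
slots = pd.DataFrame({
    "Name": ["X3321"] * 6,
    "AvailableDate": pd.to_datetime([
        "2018-03-14 13:00:00", "2018-03-14 21:00:00", "2018-03-14 23:00:00",
        "2018-03-15 01:00:00", "2018-03-15 02:00:00", "2018-03-15 21:30:00",
    ]),
    "totalRemaining": [200, 200, 200, 193, 200, 180],
})

# Zero-padded "HH:MM:SS" strings compare correctly as plain strings.
hhmmss = slots["AvailableDate"].dt.strftime("%H:%M:%S")

# between() keeps left <= value <= right; with '21:00:00' > '02:00:00'
# no value satisfies both sides, so this selection is empty.
empty = slots[hhmmss.between("21:00:00", "02:00:00")]

# A window that wraps past midnight is the OR of two ranges
# (both ends inclusive, mirroring between()).
night = slots[(hhmmss >= "21:00:00") | (hhmmss <= "02:00:00")]
print(night)
```

Switch the second comparison to < if the 02:00 slot itself should be excluded.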

My goal, for every day that appears in the dataset:

    Name        AvailableDate            totalRemaining
0   X3321       2018-03-14 21:00:00      200
1   X3321       2018-03-14 22:00:00      200
2   X3321       2018-03-14 23:00:00      200
3   X3321       2018-03-14 00:00:00      200
4   X3321       2018-03-14 01:00:00      193

Answer 1

jez*_*ael 5

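I think you need between_time, which works on a DatetimeIndex created with set_index. Then reset_index turns the index back into a column, and reindex restores the original column order: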

print (slots)
     Name        AvailableDate  totalRemaining
0   X3321  2018-03-14 21:00:00             200
1   X3321  2018-03-14 20:00:00             200
2   X3321  2018-03-14 22:00:00             200
3   X3321  2018-03-14 23:00:00             200
4   X3321  2018-03-14 00:00:00             200
5   X3321  2018-03-14 01:00:00             193
6   X3321  2018-03-14 13:00:00             200
7   X3321  2018-03-14 14:00:00             200
8   X3321  2018-03-14 15:00:00             200
9   X3321  2018-03-14 16:00:00             200
10  X3321  2018-03-14 17:00:00             193

slots['AvailableDate'] = pd.to_datetime(slots['AvailableDate'])

# reset_index moves AvailableDate to the front; reindex restores the original order
df = (slots.set_index('AvailableDate')
          .between_time('21:00:00', '02:00:00')
          .reset_index()
          .reindex(columns=slots.columns))
print (df)
    Name       AvailableDate  totalRemaining
0  X3321 2018-03-14 21:00:00             200
1  X3321 2018-03-14 22:00:00             200
2  X3321 2018-03-14 23:00:00             200
3  X3321 2018-03-14 00:00:00             200
4  X3321 2018-03-14 01:00:00             193

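A self-contained sketch of the same chain on made-up data spanning two days (the rows are hypothetical, not from the question's file). It illustrates two properties of DataFrame.between_time: the calendar date is ignored, so the window is applied to every day in the data, and a start time later than the end time is treated as a range that wraps past midnight.

```python
import pandas as pd

# Hypothetical two-day sample; the answer's own data covers a single day.
slots = pd.DataFrame({
    "Name": ["X3321"] * 6,
    "AvailableDate": pd.to_datetime([
        "2018-03-14 20:00:00", "2018-03-14 21:00:00", "2018-03-14 23:00:00",
        "2018-03-15 01:00:00", "2018-03-15 13:00:00", "2018-03-15 22:00:00",
    ]),
    "totalRemaining": [200, 200, 200, 193, 200, 180],
})

# between_time() needs a DatetimeIndex, hence set_index first.
# Start (21:00) is later than end (02:00), so the window wraps midnight.
night = (slots.set_index("AvailableDate")
              .between_time("21:00:00", "02:00:00")
              .reset_index()                      # index back to a column
              .reindex(columns=slots.columns))    # original column order
print(night)
```

Rows keep their original order, so the 22:00 slot on 2018-03-15 comes after the 01:00 slot of the same morning.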

Answer 2

jpp*_*jpp 5

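You can use pd.Series.between with datetime.time objects, as shown below. Note that between keeps values with start <= t <= end, so this only works for a window that does not cross midnight; with start = 21:00 and end = 02:00, as in the question, it returns nothing.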

from datetime import datetime

start = datetime.strptime('21:00:00', '%H:%M:%S').time()
end = datetime.strptime('02:00:00', '%H:%M:%S').time()

slots = slots[slots['times'].dt.time.between(start, end)]


Example usage:

from datetime import datetime
import pandas as pd

slots = pd.DataFrame({'times': ['2018-03-08 05:00:00', '2018-03-08 07:00:00',
                                '2018-03-08 01:00:00', '2018-03-08 20:00:00',
                                '2018-03-08 22:00:00', '2018-03-08 23:00:00']})


slots['times'] = pd.to_datetime(slots['times'])

start = datetime.strptime('21:00:00', '%H:%M:%S').time()
end = datetime.strptime('23:30:00', '%H:%M:%S').time()

slots = slots[slots['times'].dt.time.between(start, end)]

#                 times
# 4 2018-03-08 22:00:00
# 5 2018-03-08 23:00:00

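To cover the question's 21:00 to 02:00 window with the same datetime.time comparison, the mask has to switch from an AND to an OR once the window wraps past midnight. A minimal sketch, assuming a hypothetical helper named time_window_mask (it is not part of pandas) that picks between the two cases; the data reuses the example above.

```python
from datetime import time

import pandas as pd


def time_window_mask(s: pd.Series, start: time, end: time) -> pd.Series:
    """Boolean mask for datetimes whose time of day lies in [start, end].

    Hypothetical helper: when start > end the window is assumed to wrap
    past midnight (e.g. 21:00 -> 02:00).
    """
    t = s.dt.time  # object Series of datetime.time values
    if start <= end:
        return t.between(start, end)
    # Wrapping window: late evening OR early morning.
    return (t >= start) | (t <= end)


slots = pd.DataFrame({"times": pd.to_datetime([
    "2018-03-08 05:00:00", "2018-03-08 07:00:00", "2018-03-08 01:00:00",
    "2018-03-08 20:00:00", "2018-03-08 22:00:00", "2018-03-08 23:00:00",
])})

night = slots[time_window_mask(slots["times"], time(21, 0), time(2, 0))]
print(night)
```

For a non-wrapping window such as 21:00 to 23:30 the helper falls back to between and gives the same rows as the example above.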

Archived: 7 years, 6 months ago
Views: 2412
Last activity: 5 years, 12 months ago