Pandas DataFrame: time difference between two columns in hours and minutes

sba*_*jis 55 python datetime pandas

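I have two columns, fromdate and todate, in a DataFrame.

When I try to add a new column, diff, to find the difference between the two dates using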

df['diff'] = df['todate'] - df['fromdate']

I get the diff column expressed in days whenever the difference is more than 24 hours.

2014-01-24 13:03:12.050000,2014-01-26 23:41:21.870000,"2 days, 10:38:09.820000"
2014-01-27 11:57:18.240000,2014-01-27 15:38:22.540000,03:41:04.300000
2014-01-23 10:07:47.660000,2014-01-23 18:50:41.420000,08:42:53.760000

How do I convert the result to hours and minutes only, ignoring days and even seconds?

nit*_*tin 87

Subtracting pandas timestamps returns a datetime.timedelta object. This can easily be converted into hours with the astype method, like so:

import pandas

# build the frame directly from the two timestamp columns
df = pandas.DataFrame({
    'to': [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')],
    'fr': [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')],
})

# difference between the two columns, truncated to whole hours
(df.fr - df.to).astype('timedelta64[h]')

which yields:

0    58
1     3
2     8
dtype: float64

  • timedelta objects have days and seconds attributes, so you can do (df.fr-df.to).dt.days * 24 + (df.fr-df.to).dt.seconds / 3600 (2 upvotes)
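The astype approach above depends on the pandas version. As far as I know, pandas 2.0 and later only accept second, millisecond, microsecond and nanosecond resolutions there and raise a TypeError for 'timedelta64[h]'. A minimal sketch of an alternative that does not rely on astype, using division by pd.Timedelta; the variable names are illustrative and not taken from the answer:

```python
import pandas as pd

# the question's sample timestamps as two Series (names are illustrative)
to = pd.Series(pd.to_datetime(['2014-01-24 13:03:12.05',
                               '2014-01-27 11:57:18.24',
                               '2014-01-23 10:07:47.66']))
fr = pd.Series(pd.to_datetime(['2014-01-26 23:41:21.87',
                               '2014-01-27 15:38:22.54',
                               '2014-01-23 18:50:41.42']))

delta = fr - to

# floor division by a one-hour Timedelta gives whole hours as integers
whole_hours = delta // pd.Timedelta(hours=1)

# true division gives fractional hours as floats
hours_float = delta / pd.Timedelta(hours=1)

print(whole_hours.tolist())
print(hours_float.round(3).tolist())
```

For the sample rows the whole-hour values are 58, 3 and 8, matching the output shown in the answer.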

elP*_*tor 36

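This was driving me crazy, because the .astype() solution above didn't work for me. But I found another way. I haven't timed it or anything, but it might work for others: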

import pandas as pd

t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')

# .seconds holds only the sub-day part of the difference
print(pd.Timedelta(t2 - t1).seconds / 3600.0)

...if you want hours. Or:

print(pd.Timedelta(t2 - t1).seconds / 60.0)

...if you want minutes.

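  • I just found that `.total_seconds()` does the job for those who need it (27 upvotes)
  • I had the same problem, but this solution needs care: the part of a time difference that exceeds one day is ignored and has to be added separately (8 upvotes)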
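To make the caveat in the comments concrete, here is a small sketch using the first row of the question's data, where the difference is longer than one day. It only contrasts the .seconds attribute with .total_seconds() on a single pd.Timedelta and is not part of the original answer:

```python
import pandas as pd

# first row of the question's data: a difference longer than one day
td = pd.Timestamp('2014-01-26 23:41:21.87') - pd.Timestamp('2014-01-24 13:03:12.05')

# whole days are stored separately from the seconds component
print(td.days)

# seconds component only (hours + minutes + seconds within the last partial day)
print(td.seconds)

# the full duration in seconds, days included
print(td.total_seconds())

# hours computed both ways: the first silently drops the two full days
print(td.seconds / 3600.0)
print(td.total_seconds() / 3600.0)
```

Here .seconds is 38289 while .total_seconds() is about 211089.82, so dividing .seconds by 3600 reports roughly 10.6 hours instead of 58.6.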

Tre*_*ney 20

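  • How do I convert my results to only hours and minutes?
    • The accepted answer returns only days + hours. Minutes are not included.
  • Providing a column with hours and minutes, as hh:mm or x hours y minutes, requires additional calculation and string formatting.
  • This answer shows how to get either total hours or total minutes as a float, using timedelta math, which is faster than using .astype('timedelta64[h]')
  • Pandas Time Deltas User Guide
  • Pandas Time series / date functionality User Guide
  • python timedelta objects: see the supported operations.
  • The following sample data is already a datetime64[ns] dtype. All relevant columns must first be converted with pandas.to_datetime().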
import pandas as pd

# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}

# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)

# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
df['time_delta'] = (df.from_date - df.to_date)

# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)

# create a column with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)

# display(df)
                  to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000
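The answer mentions that an hh:mm column needs extra calculation and string formatting but does not show it. A minimal sketch of one way to do this, assuming the differences are non-negative; the hh_mm column name is hypothetical, and seconds are simply dropped rather than rounded:

```python
import pandas as pd

# sample data mirroring the question (same column names as the example above)
df = pd.DataFrame({
    'to_date': pd.to_datetime(['2014-01-24 13:03:12.05',
                               '2014-01-27 11:57:18.24',
                               '2014-01-23 10:07:47.66']),
    'from_date': pd.to_datetime(['2014-01-26 23:41:21.87',
                                 '2014-01-27 15:38:22.54',
                                 '2014-01-23 18:50:41.42']),
})

delta = df['from_date'] - df['to_date']

# whole minutes in each difference; days are folded in, seconds are dropped
total_minutes = delta // pd.Timedelta(minutes=1)

# split into hours and leftover minutes
hours = total_minutes // 60
minutes = total_minutes % 60

# zero-pad both parts to build an hh:mm string (assumes non-negative differences)
df['hh_mm'] = hours.astype(str).str.zfill(2) + ':' + minutes.astype(str).str.zfill(2)

print(df['hh_mm'].tolist())
```

For the sample rows this gives 58:38, 03:41 and 08:42. Hours are allowed to exceed 24 because the day count is converted to hours instead of being reported separately.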

Other methods

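  • One note from the podcast listed under Other Resources: .total_seconds() was added and merged while the core developer was on vacation, and would not otherwise have been approved.
    • This is also why there are no other .total_xx methods.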
# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0    211089.82
1     13264.30
2     31373.76
dtype: float64

# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0    2
1    0
2    0
dtype: int64

# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0    38289
1    13264
2    31373
dtype: int64

Other Resources

%%timeit test

import pandas as pd

# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)

%%timeit
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
(df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)