4 python datetime dataframe pandas
I have the following dates and tried the code below:
df['start_date_time'] = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]
df['start_date_time'] = pd.to_datetime(df['start_date_time']).replace(second = 0)
I get the following error:
TypeError: replace() got an unexpected keyword argument 'second'
jez*_*ael 20
Solutions if you need datetimes in the output:
df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
print (df)
start_date_time
0 2016-05-19 08:25:23
1 2016-05-19 16:00:45
Use Series.dt.floor with T or Min to floor to the minute (newer pandas releases deprecate the T alias in favor of min):
df['start_date_time'] = df['start_date_time'].dt.floor('T')
df['start_date_time'] = df['start_date_time'].dt.floor('Min')
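As a quick check of what dt.floor does, here is a minimal sketch. The sample values (with fractional seconds) and the Europe/Berlin timezone are assumptions made for illustration, not part of the question. It uses the 'min' alias, which should work on both older and newer pandas versions.

```python
import pandas as pd

# Illustrative data: both seconds and fractional seconds are non-zero.
s = pd.to_datetime(pd.Series(["2016-05-19 08:25:23.500", "2016-05-19 16:00:45.250"]))

# Attach a timezone (an assumption for this example) to show it survives flooring.
s = s.dt.tz_localize("Europe/Berlin")

# Floor to the minute: seconds and everything below them are dropped.
floored = s.dt.floor("min")
print(floored)
```

The timezone is kept, which is the main difference from the numpy cast approach.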
You can also convert to numpy values first and then truncate the seconds by casting to <M8[m], but this solution removes any timezone:
df['start_date_time'] = df['start_date_time'].values.astype('<M8[m]')
print (df)
start_date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
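To make the timezone caveat concrete, here is a small sketch on a timezone-aware column. The Europe/Berlin timezone is an assumption chosen only for illustration.

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))
aware = s.dt.tz_localize("Europe/Berlin")  # assumed timezone, UTC+2 in May

# .values returns naive numpy datetime64 values expressed in UTC,
# so the timezone is lost and the wall-clock time shifts by the offset.
truncated = pd.Series(aware.values.astype("<M8[m]"))
print(truncated)
print(truncated.dt.tz)

# For comparison, dt.floor keeps the timezone.
print(aware.dt.floor("min"))
```

So on timezone-aware data the cast silently changes what the values mean, not just their precision.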
Another solution is to create a timedelta Series from second and subtract it:
print (pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
0 00:00:23
1 00:00:45
Name: start_date_time, dtype: timedelta64[ns]
df['start_date_time'] = (df['start_date_time'] -
                         pd.to_timedelta(df['start_date_time'].dt.second, unit='s'))
print (df)
start_date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
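One caveat that may matter for real data: subtracting only dt.second leaves any fractional seconds in place. The sketch below uses made-up timestamps with fractional seconds (an assumption, the question's data has none) and also subtracts dt.microsecond. Data with nanosecond components would need dt.nanosecond handled the same way.

```python
import pandas as pd

# Illustrative timestamps with fractional seconds.
s = pd.to_datetime(pd.Series(["2016-05-19 08:25:23.500", "2016-05-19 16:00:45.250"]))

# Subtracting whole seconds only: the .500 / .250 parts remain.
only_seconds = s - pd.to_timedelta(s.dt.second, unit="s")
print(only_seconds)

# Also subtract the microsecond component to reach the start of the minute.
both = only_seconds - pd.to_timedelta(s.dt.microsecond, unit="us")
print(both)
```

With the fractional part removed as well, the result matches s.dt.floor('min') for this data.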
Timings:
df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
#20000 rows
df = pd.concat([df]*10000).reset_index(drop=True)
In [28]: %timeit df['start_date_time'] = df['start_date_time'] - pd.to_timedelta(df['start_date_time'].dt.second, unit='s')
4.05 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [29]: %timeit df['start_date_time1'] = df['start_date_time'].values.astype('<M8[m]')
1.73 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [30]: %timeit df['start_date_time'] = df['start_date_time'].dt.floor('T')
1.07 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [31]: %timeit df['start_date_time2'] = df['start_date_time'].apply(lambda t: t.replace(second=0))
183 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
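The %timeit lines above need IPython. A rough equivalent with the standard timeit module might look like the sketch below. The method labels and the number/repeat settings are my own choices, and the absolute numbers will differ by machine and pandas version. It also checks that all four approaches agree on this data.

```python
import timeit
import pandas as pd

df = pd.DataFrame({"start_date_time": ["2016-05-19 08:25:23", "2016-05-19 16:00:45"]})
df["start_date_time"] = pd.to_datetime(df["start_date_time"])
df = pd.concat([df] * 10000).reset_index(drop=True)  # 20000 rows
col = df["start_date_time"]

# The four approaches from this thread, each returning a new Series.
methods = {
    "subtract timedelta": lambda: col - pd.to_timedelta(col.dt.second, unit="s"),
    # Cast back to ns so the result dtype is the same across pandas versions.
    "numpy cast": lambda: pd.Series(col.values.astype("<M8[m]").astype("<M8[ns]")),
    "dt.floor": lambda: col.dt.floor("min"),
    "apply + replace": lambda: col.apply(lambda t: t.replace(second=0)),
}

# Sanity check: every method gives the same values for this (timezone-naive) data.
results = {name: fn() for name, fn in methods.items()}
expected = results["dt.floor"]
for name, res in results.items():
    assert (res == expected).all(), name

# Best-of-3 timing, 3 calls per measurement.
for name, fn in methods.items():
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{name:20s} {best * 1e3:8.2f} ms per call")
```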
Solutions if you need a string representation of the datetimes in the output:
print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M'))
0 2016-05-19 08:25
1 2016-05-19 16:00
Name: start_date_time, dtype: object
If necessary, set the seconds to :00:
print(df['start_date_time'].dt.strftime('%Y-%m-%d %H:%M:00'))
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
Name: start_date_time, dtype: object
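pd.to_datetime returns datetime objects, which have second as an attribute: there's not much you can do about that. You can set second to 0, but the attribute will still be there and the standard representation will still include a trailing ':00'.
You need to apply replace to each element of the column: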
import pandas as pd
df = pd.DataFrame({'start_date_time': ["2016-05-19 08:25:23","2016-05-19 16:00:45","2016-05-20 07:45:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]})
df['start_date_time'] = pd.to_datetime(df['start_date_time'])
df['start_date_time'] = df['start_date_time'].apply(lambda t: t.replace(second=0))
print(df)
# start_date_time
# 0 2016-05-19 08:25:00
# 1 2016-05-19 16:00:00
# 2 2016-05-20 07:45:00
# 3 2016-05-24 12:50:00
# 4 2016-05-25 23:00:00
# 5 2016-05-26 19:45:00
:23 and :45 from the first two times have been replaced by :00, but the seconds are still printed.
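The reason for the original TypeError can be sketched as follows: the replace that accepts second=0 is the one on a single Timestamp, while calling replace on the whole Series reaches Series.replace, which substitutes values and has no second argument. The variable names here are illustrative only.

```python
import pandas as pd

# On a single Timestamp, replace() accepts datetime fields such as second.
ts = pd.Timestamp("2016-05-19 08:25:23")
print(ts.replace(second=0))

s = pd.to_datetime(pd.Series(["2016-05-19 08:25:23", "2016-05-19 16:00:45"]))

# On a Series, replace() is a value-substitution method, so second=0 is rejected.
try:
    s.replace(second=0)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Applying Timestamp.replace element-wise works, at the cost of speed.
fixed = s.apply(lambda t: t.replace(second=0))
print(fixed)
```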
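Remove ':00' from the strings
If you just want a string representation of those times, and only parse the strings to datetime objects in order to remove ':00' at the end of the string, you can simply remove the last 3 characters: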
>>> "2016-05-19 08:25:00"[:-3]
'2016-05-19 08:25'
You can apply this to every element of your list before initializing df['start_date_time']:
>>> start_date_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]
>>> list(map(lambda s: s[:-3], start_date_time))
['2016-05-19 08:25', '2016-05-19 16:00', '2016-05-20 07:45', '2016-05-24 12:50', '2016-05-25 23:00', '2016-05-26 19:45']
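If the strings are already in a DataFrame column, the same slicing can be done with the .str accessor instead of map. This is a minimal sketch on a shortened copy of the list, and it assumes every string ends in ':SS'.

```python
import pandas as pd

# Shortened copy of the list from the question, kept as plain strings.
start_date_time = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00"]
df = pd.DataFrame({"start_date_time": start_date_time})

# Drop the last 3 characters (':00') from every string in the column.
df["start_date_time"] = df["start_date_time"].str[:-3]
print(df)
```

The column still holds strings here, so datetime operations such as .dt.floor are not available on it.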
If you want to work with datetime objects but don't want to show the seconds:
print(df['start_date_time'].apply(lambda t: t.strftime('%Y-%m-%d %H:%M')))
# 0 2016-05-19 08:25
# 1 2016-05-19 16:00
# 2 2016-05-20 07:45
# 3 2016-05-24 12:50
# 4 2016-05-25 23:00
# 5 2016-05-26 19:45
# Name: start_date_time, dtype: object