I have a large dataset containing more than 500,000 date and time stamps that look like this:
date time
2017-06-25 00:31:53.993
2017-06-25 00:32:31.224
2017-06-25 00:33:11.223
2017-06-25 00:33:53.876
2017-06-25 00:34:31.219
2017-06-25 00:35:12.634
How can I round these timestamps to the nearest second?
My code looks like this:
import datetime
import pandas as pd

filename = 'data.csv'
readcsv = pd.read_csv(filename)
# Convert the columns first so log_date/log_time hold date/time objects, not strings
readcsv['date'] = pd.to_datetime(readcsv['date']).dt.date
readcsv['time'] = pd.to_datetime(readcsv['time']).dt.time
log_date = readcsv.date
log_time = readcsv.time
timestamp = [datetime.datetime.combine(log_date[i], log_time[i]) for i in range(len(log_date))]
So now I have combined the date and time into a list of datetime.datetime objects that looks like this:
datetime.datetime(2017,6,25,00,31,53,993000)
datetime.datetime(2017,6,25,00,32,31,224000)
datetime.datetime(2017,6,25,00,33,11,223000)
datetime.datetime(2017,6,25,00,33,53,876000)
datetime.datetime(2017,6,25,00,34,31,219000)
datetime.datetime(2017,6,25,00,35,12,634000)
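Where do I go from here? The df.timestamp.dt.round('1s') function does not seem to work. Also, when using .split() I ran into problems where the seconds and minutes exceeded 59.
Many thanks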
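For reference, here is a minimal sketch of how `.dt.round('1s')` behaves once the values live in a datetime64 Series. It uses three of the sample values above; the names `timestamps`, `ts` and `rounded` are only illustrative. Note that `round` returns a new Series rather than changing the original in place, so the result has to be assigned. Whether that is what went wrong in the code above is an assumption, since no error message is shown.

```python
import datetime
import pandas as pd

# Sample values copied from the question.
timestamps = [
    datetime.datetime(2017, 6, 25, 0, 31, 53, 993000),
    datetime.datetime(2017, 6, 25, 0, 32, 31, 224000),
    datetime.datetime(2017, 6, 25, 0, 33, 11, 223000),
]

# Build a datetime64 Series; the .dt accessor needs this dtype.
ts = pd.Series(pd.to_datetime(timestamps))

# round() returns a new Series, so keep the result.
rounded = ts.dt.round('1s')
print(rounded.tolist())
```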
I am struggling to round timestamps using pandas.
The timestamps look like this:
datetime.datetime(2017,6,25,00,31,53,993000)
datetime.datetime(2017,6,25,00,32,31,224000)
datetime.datetime(2017,6,25,00,33,11,223000)
datetime.datetime(2017,6,25,00,33,53,876000)
datetime.datetime(2017,6,25,00,34,31,219000)
datetime.datetime(2017,6,25,00,35,12,634000)
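How do I round these to the nearest second?
I previously tried the suggestions in this post, but they did not work: Rounding time to the nearest second - Python
My code so far is as follows: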
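As a point of comparison, rounding can also be done on plain datetime.datetime objects without any string splitting. The sketch below is one possible approach, not the method from the linked post: the helper name `round_to_second` is hypothetical, and the 59.993 s input is an invented value chosen to show the carry into the next minute. Because the arithmetic goes through `timedelta`, seconds and minutes never end up above 59.

```python
import datetime

def round_to_second(dt):
    """Round a datetime to the nearest whole second (exact halves round up)."""
    # Add half a second, then drop the microseconds. timedelta arithmetic
    # carries any overflow into minutes, hours and days.
    shifted = dt + datetime.timedelta(microseconds=500_000)
    return shifted.replace(microsecond=0)

# Invented example value: 59.993 s rolls over into the next minute.
print(round_to_second(datetime.datetime(2017, 6, 25, 0, 31, 59, 993000)))
```

This always rounds exact half-seconds up, which can differ from the tie-breaking used by pandas' `round`.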
import datetime
import pandas as pd
filename = 'data.csv'
readcsv = pd.read_csv(filename)
Import the data according to the file header information
# Convert first so the variables below hold date/time objects, not strings
readcsv['date'] = pd.to_datetime(readcsv['date']).dt.date
readcsv['time'] = pd.to_datetime(readcsv['time']).dt.time
log_date = readcsv.date
log_time = readcsv.time
log_lon = readcsv.lon
log_lat = readcsv.lat
log_heading = readcsv.heading
Combine the date and time into a single variable
timestamp = [datetime.datetime.combine(log_date[i],log_time[i]) for i in range(len(log_date))]
Create the DataFrame
data = {'timestamp':timestamp,'log_lon':log_lon,'log_lat':log_lat,'log_heading':log_heading}
log_data = pd.DataFrame(data,columns=['timestamp','log_lon','log_lat','log_heading'])
log_data.index = log_data['timestamp']
I am still very new to Python, so please forgive my ignorance.
I have 5 netCDF files with moored current-meter data. Each file looks like this:
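The combine-and-index steps above could also be done without a Python-level loop. The following is a sketch under assumptions: the CSV is replaced by a hypothetical in-memory sample with the same column names (date, time, lon, lat, heading), and the lon/lat/heading values are made up for illustration. It joins the two string columns, parses them in one call, and uses the rounded timestamps as the index.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv; lon/lat/heading values are invented.
csv_text = """date,time,lon,lat,heading
2017-06-25,00:31:53.993,27.57,-34.04,120.0
2017-06-25,00:32:31.224,27.59,-33.80,121.5
"""
log_data = pd.read_csv(io.StringIO(csv_text))

# Join the two string columns and parse them in a single vectorized call.
log_data['timestamp'] = pd.to_datetime(log_data['date'] + ' ' + log_data['time'])

# Round to the nearest second and use the result as the index.
log_data.index = log_data['timestamp'].dt.round('1s')
print(log_data.index)
```

With several hundred thousand rows, a vectorized parse like this is usually much faster than calling datetime.datetime.combine once per row.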
<xarray.Dataset>
Dimensions: (BINDEPTH: 50, INSTRDEPTH: 3, LATITUDE: 5, LONGITUDE: 5, TIME: 44106)
Coordinates:
* INSTRDEPTH (INSTRDEPTH) float64 100.0 280.0 600.0
* LATITUDE (LATITUDE) float64 -34.04 -33.8 -33.67 -33.56 -33.51
* LONGITUDE (LONGITUDE) float64 27.57 27.59 27.64 27.72 27.86
* TIME (TIME) datetime64[ns] 2015-04-11T15:00:00 ...
Dimensions without coordinates: BINDEPTH
Data variables:
PRES (TIME, INSTRDEPTH) float32 dask.array<shape=(44106, 3), chunksize=(44106, 3)>
VCUR (TIME, BINDEPTH) float32 dask.array<shape=(44106, 50), chunksize=(44106, 50)>
UCUR (TIME, BINDEPTH) float32 dask.array<shape=(44106, 50), chunksize=(44106, 50)>
WCUR …