pd.merge_asof 在第二次运行时失败并显示“ValueError: left keys must be sorted”

Kna*_*pie 6 python pandas

嗨,我正在尝试在最匹配的 date_times 上合并两个数据集。

我有两个用于打开和关闭事件的时间戳。

merge_asof 在打开日期运行良好,但在第二个 date_time返回“ValueError: left keys must be sorted”

我在这两种情况下都按相关的 date_time 排序。

第一个数据框:

   idtbl_station_manager     date_time_stamp fld_station_number  \
0                   1121 2017-09-19 15:41:24            AM00571   
1                   1122 2017-09-19 15:41:24            AM00572   
2                   1123 2017-09-19 15:41:24            AM00573   

  fld_grid_number fld_status  fld_station_number_int  \
0     VOY-024-001     CLOSED                     571   
1     VOY-024-002     CLOSED                     572   
2     VOY-024-003     CLOSED                     573   

                  fld_activities date_time_stamp_open fld_lat_open  \
0  Drift Net,CTD-Overside,Dredge  2017-04-13 07:23:35                
1  Drift Net,CTD-Overside,Dredge  2017-04-13 10:15:07   4649.028 S   
2  Drift Net,CTD-Overside,Dredge  2017-04-13 13:15:42   4648.497 S   

  fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close  
0  03759.143 E   2017-04-13 09:51:18    4647.361 S   03759.142 E  
1  03759.143 E   2017-04-13 12:11:00    4647.344 S   03759.143 E  
2                2017-04-13 15:09:26    4647.344 S   03759.143 E  
Run Code Online (Sandbox Code Playgroud)

第二个数据框:

         idtbl_gpgga     date_time_stamp    fld_utc   fld_lat fld_lat_dir  \
1179828      1179829 2017-04-04 02:00:04  000005.00  3354.138           S   
0                  1 2017-04-04 02:00:05  000006.00  3354.138           S   
1                  2 2017-04-04 02:00:07  000008.00  3354.138           S   

          fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt  \
1179828  1825.557           E               1      10      0.9    21.6   
0        1825.557           E               1      10      0.9    21.6   
1        1825.557           E               1      10      0.9    21.6   

        fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id  
1179828            M          31.9                  M                        0  
0                  M          31.9                  M                        0  
1                  M          31.9                  M                        0  
Run Code Online (Sandbox Code Playgroud)

这按预期工作:

# First we grab the open time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")

pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' +  pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' +  pd_open["fld_lon_dir"]     
Run Code Online (Sandbox Code Playgroud)

这失败了:

'ValueError: 左键必须排序'

# Now we grab the close time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")

pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' +  pd_close["fld_lat_dir"]
pd_close["fld_lat_close"] = pd_close["fld_lon"] + ' ' +  pd_close["fld_lon_dir"]  
Run Code Online (Sandbox Code Playgroud)

任何建议将不胜感激。

Kna*_*pie 5

正如@JohnE 所指出的,df_stationManager 数据框中存在 NaT 值。

合并前通过清洗解决:

df_stationManager = df_stationManager.dropna() 
Run Code Online (Sandbox Code Playgroud)