00_*_*_00 6 python merge pandas
I found this nice function, pandas.merge_asof. From the documentation:
pandas.merge_asof(left, right, on=None, left_on=None, right_on=None)
Parameters:
left : DataFrame
right : DataFrame
on : label
Field name to join on. Must be found in both DataFrames.
The data MUST be ordered.
Furthermore this must be a numeric column, such as datetimelike, integer, or float.
On or left_on/right_on must be given.
And it works as expected.
However, my merged dataframe keeps only the `on` column from `left`; the matching `on` column from `right` is dropped. I need to keep both of them, so that
mydf=pandas.merge_asof(left, right, on='Time')
gives a mydf that contains the Time column from both left and right.
Sample data:
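import pandas as pd

# Left frame: 100 timestamps spaced 6 h 3 min apart
a = pd.DataFrame(data=pd.date_range('20100201', periods=100, freq='6h3min'), columns=['Time'])
# Right frame: 24 hourly timestamps with a running counter
b = pd.DataFrame(data=pd.date_range('20100201', periods=24, freq='1h'), columns=['Time'])
b['val'] = range(b.shape[0])
# Forward as-of join: match each left row to the next right row within 30 minutes
out = pd.merge_asof(a, b, on='Time', direction='forward', tolerance=pd.Timedelta('30min'))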
I think one possible solution is to rename the columns, so that each frame has its own key column and both survive the merge:
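To make the problem concrete, here is a minimal sketch built on the same sample data. It only inspects the columns of the merged result: with `on='Time'`, a single `Time` column survives and it carries the timestamps of the left frame, so the matched timestamp from the right frame is not visible.

```python
import pandas as pd

# Same sample frames as in the question
a = pd.DataFrame({'Time': pd.date_range('20100201', periods=100, freq='6h3min')})
b = pd.DataFrame({'Time': pd.date_range('20100201', periods=24, freq='1h')})
b['val'] = range(len(b))

out = pd.merge_asof(a, b, on='Time', direction='forward',
                    tolerance=pd.Timedelta('30min'))

# Only one key column is kept, and its values come from the left frame
print(list(out.columns))
print(out['Time'].equals(a['Time']))
```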
out = pd.merge_asof(a.rename(columns={'Time': 'Time1'}),
                    b.rename(columns={'Time': 'Time2'}),
                    left_on='Time1',
                    right_on='Time2',
                    direction='forward',
                    tolerance=pd.Timedelta('30min'))
print(out.head())
                Time1      Time2  val
0 2010-02-01 00:00:00 2010-02-01  0.0
1 2010-02-01 06:03:00        NaT  NaN
2 2010-02-01 12:06:00        NaT  NaN
3 2010-02-01 18:09:00        NaT  NaN
4 2010-02-02 00:12:00        NaT  NaN
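A variation on the same idea, offered only as a sketch: instead of renaming both key columns, copy the right frame's key into an extra column before merging and keep `on='Time'`. The name `Time_right` is a hypothetical label chosen for this illustration, not something pandas produces on its own. The copied column rides along like any other value column, so it shows which right-hand timestamp was matched (or `NaT` when nothing fell within the tolerance).

```python
import pandas as pd

# Same sample frames as in the question
a = pd.DataFrame({'Time': pd.date_range('20100201', periods=100, freq='6h3min')})
b = pd.DataFrame({'Time': pd.date_range('20100201', periods=24, freq='1h')})
b['val'] = range(len(b))

# Duplicate the right-hand key under a hypothetical name so it is kept after the merge
b_keyed = b.assign(Time_right=b['Time'])

out2 = pd.merge_asof(a, b_keyed, on='Time', direction='forward',
                     tolerance=pd.Timedelta('30min'))

# 'Time' holds the left timestamps, 'Time_right' the matched right timestamps
print(out2.head())
```

Compared with renaming, this leaves the left frame's column name untouched, which may be convenient if later code already expects a column called `Time`.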