Fastest way to merge pandas dataframes on a range

Joh*_*ine 6 python numpy dataframe pandas

I have a dataframe A

    ip_address
0   13
1   5
2   20
3   11
.. ........

and another dataframe B

    lowerbound_ip_address   upperbound_ip_address           country
0    0                       10                             Australia
1    11                      20                             China

Based on these, I need to add a column to A such that

ip_address  country
13          China
5           Australia

My idea is to define a function and call map on each row of A, but how would that function search every row of B? Is there a better way?

Zer*_*ero 12

Use pd.IntervalIndex

In [2503]: s = pd.IntervalIndex.from_arrays(dfb.lowerbound_ip_address,
                                            dfb.upperbound_ip_address, 'both')

In [2504]: dfa.assign(country=dfb.set_index(s).loc[dfa.ip_address].country.values)
Out[2504]:
   ip_address    country
0          13      China
1           5  Australia
2          20      China
3          11      China

Details

In [2505]: s
Out[2505]:
IntervalIndex([[0, 10], [11, 20]]
              closed='both',
              dtype='interval[int64]')

In [2507]: dfb.set_index(s)
Out[2507]:
          lowerbound_ip_address  upperbound_ip_address    country
[0, 10]                       0                     10  Australia
[11, 20]                     11                     20      China

In [2506]: dfb.set_index(s).loc[dfa.ip_address]
Out[2506]:
          lowerbound_ip_address  upperbound_ip_address    country
[11, 20]                     11                     20      China
[0, 10]                       0                     10  Australia
[11, 20]                     11                     20      China
[11, 20]                     11                     20      China

Setup

In [2508]: dfa
Out[2508]:
   ip_address
0          13
1           5
2          20
3          11

In [2509]: dfb
Out[2509]:
   lowerbound_ip_address  upperbound_ip_address    country
0                      0                     10  Australia
1                     11                     20      China
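
As a self-contained variant of the lookup above, here is a minimal sketch that builds the same two frames and uses `IntervalIndex.get_indexer` instead of `.loc`. The extra address 25 is an assumption added for illustration: it falls outside every range, so it shows how an unmatched value could be handled (here it becomes None instead of raising). The sketch assumes the ranges in B do not overlap.

```python
import numpy as np
import pandas as pd

# Same frames as in the setup, plus one out-of-range address (25) for illustration.
dfa = pd.DataFrame({"ip_address": [13, 5, 20, 11, 25]})
dfb = pd.DataFrame({
    "lowerbound_ip_address": [0, 11],
    "upperbound_ip_address": [10, 20],
    "country": ["Australia", "China"],
})

# Closed on both sides, so 10, 11 and 20 all belong to a range.
intervals = pd.IntervalIndex.from_arrays(
    dfb["lowerbound_ip_address"], dfb["upperbound_ip_address"], closed="both"
)

# Position of the containing interval for each address, -1 where none contains it.
pos = intervals.get_indexer(dfa["ip_address"].to_numpy())

# Look up the country by position and blank out the unmatched rows.
countries = dfb["country"].to_numpy(dtype=object)
result = dfa.assign(country=np.where(pos >= 0, countries[pos], None))
print(result)
```

Working with positions keeps the lookup vectorized, and the -1 sentinel makes it explicit which addresses were not covered by any range.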


WeN*_*Ben 5

Try pd.merge_asof

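# df is dataframe A, df1 is dataframe B; merge_asof needs both sorted on the key
df['lowerbound_ip_address'] = df['ip_address']
pd.merge_asof(df1, df.sort_values('lowerbound_ip_address'), on='lowerbound_ip_address',
              direction='forward', allow_exact_matches=False)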
Out[811]: 
   lowerbound_ip_address  upperbound_ip_address    country  ip_address
0                      0                     10  Australia           5
1                     11                     20      China          13
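
The call above uses B as the left frame, so it returns one row per range rather than one row per address. To label every row of A, one possible variant (a sketch, not the answerer's code) puts A on the left, matches each address to the nearest lower bound at or below it, and then checks the upper bound. The frame names `dfa` and `dfb` and the out-of-range address 25 are assumptions chosen for illustration.

```python
import pandas as pd

dfa = pd.DataFrame({"ip_address": [13, 5, 20, 11, 25]})
dfb = pd.DataFrame({
    "lowerbound_ip_address": [0, 11],
    "upperbound_ip_address": [10, 20],
    "country": ["Australia", "China"],
})

# merge_asof requires both frames to be sorted on their keys;
# reset_index keeps the original row order in an "index" column.
left = dfa.reset_index().sort_values("ip_address")
right = dfb.sort_values("lowerbound_ip_address")

# Default direction="backward": take the last range whose lower bound <= address.
merged = pd.merge_asof(
    left, right, left_on="ip_address", right_on="lowerbound_ip_address"
)

# A backward match only guarantees the lower bound, so verify the upper bound too.
in_range = merged["ip_address"] <= merged["upperbound_ip_address"]
merged["country"] = merged["country"].where(in_range)

# Restore the original order of A and keep only the columns of interest.
result = merged.set_index("index").sort_index()[["ip_address", "country"]]
print(result)
```

Addresses that fall outside every range end up with a missing country instead of being silently assigned to the nearest range.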