I have a simple DataFrame like this:
p b
0 a buy
1 b buy
2 a sell
3 b sell
and a lookup table like this:
p b v
0 a buy 123
1 a sell 456
2 a * 888
3 b * 789
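How can I merge (join) the two DataFrames while respecting the "*" wildcard in column b? That is, a row should get the value for its exact (p, b) pair if one exists, and otherwise the value of the "*" row for the same p. The expected result is: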
p b v
0 a buy 123
1 b buy 789
2 a sell 456
3 b sell 789
The best I could come up with is this, but it's very ugly and verbose:
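import pandas as pd

data = pd.DataFrame([
    ['a', 'buy'],
    ['b', 'buy'],
    ['a', 'sell'],
    ['b', 'sell'],
], columns=['p', 'b'])
lookup = pd.DataFrame([
    ['a', 'buy', 123],
    ['a', 'sell', 456],
    ['a', '*', 888],
    ['b', '*', 789],
], columns=['p', 'b', 'v'])
# keep the original row labels in an 'index' column so both passes can be aligned
x = data.reset_index()
# first pass: exact match on (p, b)
y1 = pd.merge(x, lookup, on=['p', 'b'], how='left').set_index('index')
# second pass: rows with no exact match, joined on p alone
y2 = pd.merge(x[y1['v'].isnull()], lookup, on=['p'], how='left').set_index('index')
# take the exact value where present, otherwise the second-pass value
data['v'] = y1['v'].fillna(y2['v'])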
Is there a smarter way?
I think it's a bit cleaner to pull out the wildcards first:
In [11]: wildcards = lookup[lookup["b"] == "*"]
In [12]: wildcards.pop("b") # ditch the * column, it'll confuse the later merge
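Since pop modifies the filtered frame in place (and echoes the removed column in an interactive session), a non-mutating way to build the same wildcards frame is to chain drop onto the filter. A small sketch, rebuilding lookup from the question so it runs on its own:

```python
import pandas as pd

lookup = pd.DataFrame(
    [["a", "buy", 123], ["a", "sell", 456], ["a", "*", 888], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

# Keep only the wildcard rows, then drop column "b" so a later merge joins on "p" alone.
# drop returns a new frame, so nothing is modified in place.
wildcards = lookup[lookup["b"] == "*"].drop(columns="b")
print(wildcards)
```

The result has only the columns p and v, with one row per wildcard (a: 888, b: 789).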
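Now you can do both merges (no set_index needed) and combine them with update, where df is the data frame from the question. With overwrite=False, update only fills the values that are still NaN after the exact merge: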
In [13]: res = df.merge(lookup, how="left")
In [14]: res
Out[14]:
p b v
0 a buy 123.0
1 b buy NaN
2 a sell 456.0
3 b sell NaN
In [15]: res.update(df.merge(wildcards, how="left"), overwrite=False)
In [16]: res
Out[16]:
p b v
0 a buy 123.0
1 b buy 789.0
2 a sell 456.0
3 b sell 789.0
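The same two-merge idea can be wrapped in a reusable helper. This is a sketch, not code from the thread: the function name merge_with_wildcard and its parameters are hypothetical, and it uses fillna instead of update. It relies on a left merge keeping the left rows in order with a fresh 0..n-1 index, which holds as long as lookup has at most one row per (p, b) pair and at most one wildcard row per p; with duplicates, the merges would multiply rows.

```python
import pandas as pd


def merge_with_wildcard(data, lookup, key="p", wild_col="b", wildcard="*"):
    """Hypothetical helper: left-join lookup onto data on (key, wild_col),
    falling back to lookup's wildcard row for the same key.

    Assumes each (key, wild_col) pair and each wildcard key appear at most
    once in lookup; otherwise the left merges would duplicate rows.
    """
    value_cols = [c for c in lookup.columns if c not in (key, wild_col)]
    exact = lookup[lookup[wild_col] != wildcard]
    wild = lookup[lookup[wild_col] == wildcard].drop(columns=wild_col)

    # Both left merges keep data's row order with a fresh 0..n-1 index,
    # so the two results line up row by row.
    res = data.merge(exact, on=[key, wild_col], how="left")
    fallback = data.merge(wild, on=key, how="left")

    # Fill only the gaps left by the exact match.
    res[value_cols] = res[value_cols].fillna(fallback[value_cols])
    return res


data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]], columns=["p", "b"]
)
lookup = pd.DataFrame(
    [["a", "buy", 123], ["a", "sell", 456], ["a", "*", 888], ["b", "*", 789]],
    columns=["p", "b", "v"],
)
result = merge_with_wildcard(data, lookup)
print(result)
```

On the question's data, column v comes out as 123.0, 789.0, 456.0, 789.0, matching the expected result (as floats, because the intermediate NaNs force a float column).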
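A merge-free variant of the same two-stage lookup is also possible. This is a swapped-in technique, not one proposed in the thread: turn lookup into two Series, one keyed by (p, b) for exact matches and one keyed by p for the wildcard rows, look up the exact value first and fall back to the wildcard value with fillna. It assumes the (p, b) pairs in lookup are unique, because reindex raises on a duplicated index.

```python
import pandas as pd

data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]], columns=["p", "b"]
)
lookup = pd.DataFrame(
    [["a", "buy", 123], ["a", "sell", 456], ["a", "*", 888], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

# Series keyed by (p, b) for exact matches, and by p alone for the wildcard rows.
exact = lookup.set_index(["p", "b"])["v"]
wild = lookup.loc[lookup["b"] == "*"].set_index("p")["v"]

# Look up every (p, b) pair of data; pairs missing from lookup come back as NaN.
keys = pd.MultiIndex.from_frame(data[["p", "b"]])
exact_v = pd.Series(exact.reindex(keys).to_numpy(), index=data.index)

# Fill the gaps with the per-p wildcard value.
data["v"] = exact_v.fillna(data["p"].map(wild))
print(data)
```

Compared with the merge-based versions, this never builds intermediate joined frames, but it only works for a single value column as written.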