I have a DataFrame of numbers:
import pandas as pd

numbers = pd.DataFrame(columns=['number'], data=[
50,
65,
75,
85,
90
])
and a DataFrame of ranges (a lookup table):
ranges = pd.DataFrame(
columns=['range','range_min','range_max'],
data=[
['A',90,100],
['B',85,95],
['C',70,80]
]
)
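I want to determine which range (in the second table) each value (in the first table) falls into. Note that the ranges overlap and that the limits are inclusive. Also note that the simple DataFrame above has 3 ranges, but in practice it is generated dynamically and can have anywhere from 2 to 7 ranges.
Desired result: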
numbers = pd.DataFrame(columns=['number','detected_range'], data=[
[50,'out_of_range'],
[65, 'out_of_range'],
[75,'C'],
[85,'B'],
[90,'overlap']  # could be A or B
])
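I solved this with for loops, but that does not scale well to the large dataset I am working with. The code is also overly verbose and inelegant. See below: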
numbers['detected_range'] = None  # placeholder until a range is found
for i, row1 in numbers.iterrows():
    for j, row2 in ranges.iterrows():
        # limits are inclusive
        if row2['range_min'] <= row1['number'] <= row2['range_max']:
            numbers.loc[i, 'detected_range'] = ranges.loc[j, 'range']
        # elif (other cases...):
        #     ...and so on...
How can I do this?
You can use a few vectorized NumPy operations to build boolean masks, then pass them to np.select to pick the labels:
import numpy as np

a = numbers['number'].values                  # numpy array of numbers
r = ranges.set_index('range')                 # min/max DataFrame with labels as index
m1 = (a >= r['range_min'].values[:, None]).T  # is each number >= each min?
m2 = (a <= r['range_max'].values[:, None]).T  # is each number <= each max? (limits are inclusive)
m3 = m1 & m2                                  # combine both conditions above
# NB. the two operations could be done without the intermediate variables m1/m2
m4 = m3.sum(1)                                # how many matches?
# 0 -> out_of_range
# 2 or more -> overlap
# 1 -> get the range label

# now we select the label according to the conditions
numbers['detected_range'] = np.select(
    [m4 == 0, m4 >= 2],                       # out_of_range and overlap
    ['out_of_range', 'overlap'],
    # otherwise get the label of the single matching range
    default=np.take(r.index, m3.argmax(1)),
)
Output:
number detected_range
0 50 out_of_range
1 65 out_of_range
2 75 C
3 85 B
4 90 overlap
It works with any number of intervals in ranges.
Example output with an extra ['D',50,51] range:
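To make the masking step easier to follow, here is a minimal sketch that rebuilds the boolean matrix for the sample data and labels its rows and columns. The names `lo`, `hi`, `inside`, `mask` and `matches` are mine, not part of the answer, and the sketch assumes both limits are inclusive, as the question states.

```python
import numpy as np
import pandas as pd

numbers = pd.DataFrame({'number': [50, 65, 75, 85, 90]})
ranges = pd.DataFrame(
    columns=['range', 'range_min', 'range_max'],
    data=[['A', 90, 100], ['B', 85, 95], ['C', 70, 80]],
)

a = numbers['number'].to_numpy()      # shape (5,)
lo = ranges['range_min'].to_numpy()   # shape (3,)
hi = ranges['range_max'].to_numpy()   # shape (3,)

# a[:, None] has shape (5, 1); comparing it with a (3,) array broadcasts
# to a (5, 3) boolean matrix: one row per number, one column per range.
inside = (a[:, None] >= lo) & (a[:, None] <= hi)

# Label the matrix so it can be read directly (illustration only).
mask = pd.DataFrame(inside, index=a, columns=ranges['range'])
matches = inside.sum(axis=1)          # number of ranges matched per number

print(mask)
print(matches)                        # [0 0 1 1 2]
```

A row with no True becomes out_of_range, a row with exactly one True takes that column's label, and a row with several True values is an overlap.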
number detected_range
0 50 D
1 65 out_of_range
2 75 C
3 85 B
4 90 overlap
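Since the lookup table is generated dynamically, it may help to wrap the same idea in a function. The following is only a sketch: `detect_range` is a hypothetical helper name, it assumes inclusive limits, and it treats two or more matching ranges as an overlap. The extra ranges `D` and `E` are made-up test data chosen so that 90 falls into three ranges at once.

```python
import numpy as np
import pandas as pd

def detect_range(numbers, ranges):
    """Hypothetical helper: return a copy of `numbers` with a detected_range column."""
    a = numbers['number'].to_numpy()
    lo = ranges['range_min'].to_numpy()
    hi = ranges['range_max'].to_numpy()
    # object dtype so that longer strings can be assigned without truncation
    labels = ranges['range'].to_numpy(dtype=object)

    # (n_numbers, n_ranges) boolean matrix, limits inclusive
    inside = (a[:, None] >= lo) & (a[:, None] <= hi)
    n_matches = inside.sum(axis=1)

    # argmax gives the first matching column; it is only meaningful
    # for rows with exactly one match, the others are overwritten below
    detected = labels[inside.argmax(axis=1)]
    detected[n_matches == 0] = 'out_of_range'
    detected[n_matches >= 2] = 'overlap'
    return numbers.assign(detected_range=detected)

numbers = pd.DataFrame({'number': [50, 65, 75, 85, 90]})
ranges = pd.DataFrame(
    columns=['range', 'range_min', 'range_max'],
    data=[['A', 90, 100], ['B', 85, 95], ['C', 70, 80],
          ['D', 50, 51], ['E', 88, 92]],   # D and E are illustrative extras
)

result = detect_range(numbers, ranges)
print(result)
```

Memory use grows with the number of rows times the number of ranges, which should stay small with 2 to 7 ranges.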