Applying a lookup table to bins or ranges in a DataFrame

Asked by Kyl*_*yle (score 3). Tags: python, python-3.x, pandas

I have a DataFrame like the one below. Assume these are the sales figures for a list of salespeople.

[Image: sales DataFrame]

I also have a lookup table of commission rates by sales amount, shown below. So $0-$50,000 = 5%, $50,001-$250,000 = 4%, and so on.
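To pin down the tier rule before bringing pandas in, here is a minimal plain-Python sketch. `commission_rate`, `UPPER_BOUNDS` and `RATES` are hypothetical names; the bounds above $250,000 are taken from the `b` table printed in the sessions below. Each rate applies up to and including its upper bound, so the lookup is "first bound that is >= the amount".

```python
from bisect import bisect_left

# Hypothetical encoding of the tiers: each rate applies up to and including
# its upper bound ($0-$50,000 -> 5%, $50,001-$250,000 -> 4%, ...).
UPPER_BOUNDS = [50_000, 250_000, 750_000, 9_999_999_999]
RATES = [0.05, 0.04, 0.03, 0.02]


def commission_rate(amount):
    """Return the rate of the first tier whose upper bound is >= amount."""
    # bisect_left gives the index of the first bound >= amount, which makes
    # each tier inclusive on its upper end.
    i = bisect_left(UPPER_BOUNDS, amount)
    if i == len(UPPER_BOUNDS):
        raise ValueError(f"amount {amount} exceeds the top tier")
    return RATES[i]


print(commission_rate(50_000), commission_rate(50_001))
```

With this rule, exactly $50,000 stays at 5% and $50,001 moves to 4%, matching the ranges described in the question.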

[Image: commission lookup table]

What I want to do is apply the lookup table to the sales table to produce the DataFrame below.

[Image: desired output DataFrame]

Attempt 1:

In [66]: a
Out[66]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [67]: b
Out[67]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):
  File "<ipython-input-68-d229bce29f01>", line 1, in <module>
    c = b['Commission'][a <= b.index.values]
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
    res = self._combine_const(other, func, raise_on_error=False)
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
    raise_on_error=raise_on_error)
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
    return self.apply('eval', **kwargs)
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
    result = get_result(other)
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
    result = func(values, other)
ValueError: operands could not be broadcast together with shapes (3,3) (4,) 
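Attempt 1 fails because of NumPy broadcasting: `a` has shape (3, 3) and `b.index.values` has shape (4,), and trailing dimensions must either match or be 1. One way to keep the comparison idea is to give the sales array an extra trailing axis so every cell is compared with every bound. This is a sketch in plain NumPy rather than `pd.cut`; `bounds` and `rates` are names introduced here, with values copied from the `b` printed above.

```python
import numpy as np
import pandas as pd

# Sales figures as printed in the session above.
a = pd.DataFrame({'Sales_1': [200000, 100000, 400000],
                  'Sales_2': [300000, 500000, 1000000],
                  'Sales_3': [100000, 500000, 200000]})
# Tier upper bounds and rates, copied from b (bounds are b's index).
bounds = np.array([50000, 250000, 750000, 9999999999])
rates = np.array([0.05, 0.04, 0.03, 0.02])

# (3, 3) vs (4,) cannot broadcast; (3, 3, 1) vs (4,) broadcasts to (3, 3, 4).
within = a.to_numpy()[..., None] <= bounds

# A cell with no True at all is above every bound; argmax would silently
# return 0 for it, so reject that case explicitly.
if not within.any(axis=-1).all():
    raise ValueError("a sale exceeds the top tier")

# The first True along the last axis is the first bound >= the sale,
# i.e. the tier the sale falls in (inclusive on the upper end).
tier = within.argmax(axis=-1)
result = pd.DataFrame(rates[tier], index=a.index, columns=a.columns)
print(result)
```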

Attempt 2:

In [59]: a
Out[59]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [60]: b
Out[60]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):
  File "<ipython-input-61-99e8134e826c>", line 1, in <module>
    c = b.lookup(a['Sales_1'],['Commission'])
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
    raise ValueError('Row labels must have same size as column labels')
ValueError: Row labels must have same size as column labels
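Even with matching lengths, `DataFrame.lookup` only matches exact row labels, and amounts such as 200000 are not labels of `b`, so it cannot express "which range is this in". (It was also deprecated in pandas 1.2 and removed in 2.0.) A range-aware replacement for the single-column case in Attempt 2 is to search the sorted bounds in `b`'s index. This is a sketch using `Index.searchsorted`, not something from the question or answer; `sales_1_rates` is a name introduced here.

```python
import pandas as pd

a = pd.DataFrame({'Sales_1': [200000, 100000, 400000],
                  'Sales_2': [300000, 500000, 1000000],
                  'Sales_3': [100000, 500000, 200000]})
# Commission table with the tier upper bounds as a sorted index, as in the question.
b = pd.DataFrame({'Commission': [0.05, 0.04, 0.03, 0.02]},
                 index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'))

# side='left' returns the position of the first bound >= each sale, so a
# sale of exactly 250000 stays in the 4% tier.
pos = b.index.searchsorted(a['Sales_1'], side='left')

# A position equal to len(b) would mean the sale is above every bound.
if (pos == len(b)).any():
    raise ValueError("a sale exceeds the top tier")

sales_1_rates = pd.Series(b['Commission'].to_numpy()[pos],
                          index=a.index, name='Sales_1')
print(sales_1_rates)
```

The same call works column by column, for example via `a.apply(...)`, if you need every column.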

Can anyone help me apply a lookup table to a DataFrame? It doesn't have to be done exactly this way, but this illustrates what I generally need.

Answer by Bou*_*oud (score 8):

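To work with ranges, pd.cut is your friend. Starting from your current b DataFrame, you only need to adjust the list of bins you pass in so that it also defines the lowest range. Here I used 0, since negative sales don't exist, but you could use any negative number if needed, or even -np.inf and np.inf as your lower and upper bounds instead of a huge placeholder value like 9999999999: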

pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack()
Out[39]: 
  Sales_1 Sales_2 Sales_3
0    0.04    0.03    0.04
1    0.04    0.03    0.03
2    0.03    0.02    0.04
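A self-contained version of the call above, as a sketch. It assumes `Sales` is a column of `b`: with the question's `b`, where `Sales` is the index, `b.Sales` raises `AttributeError`, so the index is moved into a column with `reset_index()`. It also passes `include_lowest=True`, a real `pd.cut` parameter that the answer does not use, so that a sale of exactly 0 lands in the 5% bin instead of coming back as NaN (the first bin is otherwise `(0, 50000]`).

```python
import pandas as pd

a = pd.DataFrame({'Sales_1': [200000, 100000, 400000],
                  'Sales_2': [300000, 500000, 1000000],
                  'Sales_3': [100000, 500000, 200000]})

# The question's b keeps the tier bounds in the index; move them into a
# 'Sales' column so that b.Sales works as in the answer.
b = pd.DataFrame({'Commission': [0.05, 0.04, 0.03, 0.02]},
                 index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'))
b = b.reset_index()

# Bins are (0, 50000], (50000, 250000], ... because right=True by default,
# which matches "$0-$50,000 = 5%, $50,001-$250,000 = 4%".
# stack() turns the 3x3 frame into one Series, so one cut covers every cell;
# unstack() restores the original row/column layout.
rates = pd.cut(a.stack(), [0] + b.Sales.tolist(),
               labels=b.Commission, include_lowest=True).unstack()
print(rates)
```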

I find the following version of b clearer for cutting:

          Sales  Commission
0          -inf         NaN
1         50000        0.05
2        250000        0.04
3        750000        0.03
4           inf        0.02

The call then becomes:

pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()
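The same idea with the -inf/inf version of b, as a sketch. Five edges define four bins, which is why the labels skip the first (NaN) row. `pd.cut` returns categorical values, so this casts them to float before doing arithmetic; multiplying by `a` to get commission amounts is only an illustration of a possible next step, not part of the answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'Sales_1': [200000, 100000, 400000],
                  'Sales_2': [300000, 500000, 1000000],
                  'Sales_3': [100000, 500000, 200000]})

# The "clearer" b: a -inf edge with no rate, then each tier's upper bound,
# with inf in place of the large placeholder bound.
b = pd.DataFrame({'Sales': [-np.inf, 50000, 250000, 750000, np.inf],
                  'Commission': [np.nan, 0.05, 0.04, 0.03, 0.02]})

# Five edges -> four bins, so drop the first (NaN) commission from the labels.
rates = pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()

# Categorical -> float, so the rates can be used in arithmetic.
rates = rates.astype(float)
commission = a * rates
print(commission)
```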