Pandas: setting a column based on whether another column falls between two numbers

ana*_*ine 3 python dataframe python-3.x pandas

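I'm currently using Pandas and NumPy. I have a DataFrame named "df". Given the data below, how can I assign values to a third column based on a "between" clause? If possible, I'd like a vectorised approach, so that I keep the speed I already have.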

I have tried lambda functions, but frankly I don't understand what I'm doing, and I get errors such as: object has no attribute 'between'.

General approach, written as a non-vectorised method:

NOTE: I am looking for a way to make this vectorised.

If df['Col2'] is between 0 and 10
   df['Col3'] = 1
Else if df['Col2'] is between 10.01 and 20
   df['Col3'] = 2
Else if df['Col2'] is between 20.01 and 30
   df['Col3'] = 3

Sample data

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| a    |    5 |    1 |
| b    |   10 |    1 |
| c    |   15 |    2 |
| d    |   20 |    2 |
| e    |   25 |    3 |
| f    |   30 |    3 |
| g    |    1 |    1 |
| h    |   11 |    2 |
| i    |   21 |    3 |
| j    |    7 |    1 |
+------+------+------+



Many thanks.

ore*_*pot 5

A solution that reuses your current code:

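def cust_func(row):
    # Map Col2 to a bucket number; None if it falls outside every range.
    r = row['Col2']
    if 0 <= r <= 10:
        val = 1
    elif 10.01 <= r <= 20:
        val = 2
    elif 20.01 <= r <= 30:
        val = 3
    else:
        val = None
    return val

# Row-wise apply: correct, but not vectorised.
df['Col3'] = df.apply(cust_func, axis=1)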
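
The same three conditions can also be written in vectorised form. The sketch below is one possible way, not part of the original answer: it assumes the sample data from the question, uses `Series.between` (inclusive on both ends by default) to build one boolean mask per range, and picks the label with `numpy.select`. The `default=0` value for rows that match no range is an arbitrary choice made for illustration. The "no attribute 'between'" error in the question most likely comes from calling `between` on a single scalar inside a row-wise lambda; it is a method of a whole Series.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
})

# One boolean mask per range, computed on the whole column at once.
conditions = [
    df["Col2"].between(0, 10),
    df["Col2"].between(10.01, 20),
    df["Col2"].between(20.01, 30),
]
choices = [1, 2, 3]

# np.select uses the choice of the first matching condition;
# 0 (an illustrative default) marks rows that match no range.
df["Col3"] = np.select(conditions, choices, default=0)
print(df)
```

Note that values in the gaps between ranges (for example 10.005) match no condition and receive the default.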

Best solution:

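# Bins are right-inclusive: (0, 10], (10, 20], (20, 30].
# include_lowest=True also admits the value 0 into the first bin.
cut_labels = [1, 2, 3]
cut_bins = [0, 10, 20, 30]
df['Col3'] = pd.cut(df['Col2'], bins=cut_bins, labels=cut_labels, include_lowest=True)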
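
For reference, here is a self-contained sketch of the `pd.cut` approach run against the sample data from the question. The final `astype(int)` step is an addition for illustration: `pd.cut` returns a categorical Series, and the cast only works when every value falls inside a bin (values outside all bins become NaN).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
})

cut_bins = [0, 10, 20, 30]
cut_labels = [1, 2, 3]

# Default right=True gives (0, 10], (10, 20], (20, 30];
# include_lowest=True extends the first bin to include 0.
binned = pd.cut(df["Col2"], bins=cut_bins, labels=cut_labels, include_lowest=True)

# Convert the categorical result to plain integers
# (safe here because no value falls outside the bins).
df["Col3"] = binned.astype(int)
print(df)
```

Unlike the explicit 10.01 and 20.01 lower bounds, the bins leave no gaps: a value such as 10.005 lands in the second bin.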