Pandas: How to assign values based on multiple conditions of existing columns?

Eri*_*ric 11 python pandas

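I want to create a new column with numeric values based on the following conditions:

a. If gender is male & pet1 = pet2, then points = 5

b. If gender is female & (pet1 is 'cat' or pet1 = 'dog'), then points = 5

c. All other combinations, points = 0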

    gender    pet1      pet2
0   male      dog       dog
1   male      cat       cat
2   male      dog       cat
3   female    cat       squirrel
4   female    dog       dog
5   female    squirrel  cat
6   squirrel  dog       cat

I would like the end result to look like this:

    gender    pet1      pet2      points
0   male      dog       dog       5
1   male      cat       cat       5
2   male      dog       cat       0
3   female    cat       squirrel  5
4   female    dog       dog       5
5   female    squirrel  cat       0
6   squirrel  dog       cat       0

How do I accomplish this?

Erf*_*fan 31

numpy.select

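2020 answer

This is a perfect case for np.select, which lets us create a column based on multiple conditions and stays readable as the number of conditions grows: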

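import numpy as np

# One boolean mask per rule; np.select uses the first mask that is True for each row
conditions = [
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]

# Value assigned for each condition, in the same order
choices = [5, 5]

# Rows that match no condition get the default
df['points'] = np.select(conditions, choices, default=0)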

print(df)
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0

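  • Note that if you are modifying an existing column and want to keep its current values as the default, you can do: `df['points'] = np.select(conditions, choices, default=df['points'])`. More information here: /sf/answers/4564965191/ (3 upvotes)
  • This is both elegant and fast! It easily beats the .apply approach on large DataFrames. Thanks, Erfan. (2 upvotes)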
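
The comment about keeping existing values can be sketched as follows. This is a minimal illustration, not part of the original answer: the starting `points` values of 1 are invented purely so that the effect of `default=df['points']` is visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
    # Hypothetical pre-existing scores, invented for this illustration.
    "points": [1, 1, 1, 1, 1, 1, 1],
})

conditions = [
    df["gender"].eq("male") & df["pet1"].eq(df["pet2"]),
    df["gender"].eq("female") & df["pet1"].isin(["cat", "dog"]),
]
choices = [5, 5]

# Rows matching no condition keep their current value instead of becoming 0.
df["points"] = np.select(conditions, choices, default=df["points"])
print(df)
```

Rows 0, 1, 3 and 4 are set to 5, while the remaining rows keep the value they already had.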

EdC*_*ica 17

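You can use np.where. The conditions use the bitwise `&` and `|` in place of `and` and `or`, with parentheses around each condition because of operator precedence. Where the combined condition is true, 5 is returned, and 0 otherwise: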

In [29]:
df['points'] = np.where(
    ((df['gender'] == 'male') & (df['pet1'] == df['pet2']))
    | ((df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])),
    5, 0)
df

Out[29]:
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
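
The same np.where call may be easier to read if each rule is given a name first. The sketch below assumes the sample data from the question; the variable names `male_match` and `female_cat_dog` are made up for illustration. The parentheses around each comparison are needed because `&` binds more tightly than `==`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

# Each rule becomes a boolean Series; the names are illustrative only.
male_match = (df["gender"] == "male") & (df["pet1"] == df["pet2"])
female_cat_dog = (df["gender"] == "female") & df["pet1"].isin(["cat", "dog"])

# 5 where either rule holds, 0 everywhere else.
df["points"] = np.where(male_match | female_cat_dog, 5, 0)
print(df)
```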


Rug*_*rra 10

Use apply.

def f(x):
  if x['gender'] == 'male' and x['pet1'] == x['pet2']: return 5
  elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'): return 5
  else: return 0

data['points'] = data.apply(f, axis=1)


leo*_*ard 5

The apply method described by @RuggeroTurra takes much longer on 500k rows. I ended up using something like

df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + \
               ((df.a != 0) & (df.b != 1)).astype(int) * 3 + \
               ((df.a == 0) & (df.b == 1)).astype(int) * 4 + \
               ((df.a != 0) & (df.b == 1)).astype(int) * 5 

The apply method took 25 seconds, whereas the approach above took about 18 ms.
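
Applied to the question's data, the same boolean-arithmetic idea might look like the sketch below. It assumes the sample frame from the question; `rule_a` and `rule_b` are illustrative names. Summing the terms only works because the rules are mutually exclusive (a row cannot be both male and female), so no row is counted twice.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

rule_a = (df["gender"] == "male") & (df["pet1"] == df["pet2"])
rule_b = (df["gender"] == "female") & df["pet1"].isin(["cat", "dog"])

# True/False become 1/0, so each term contributes its score only where its rule holds.
# Rows matching neither rule end up with 0.
df["points"] = rule_a.astype(int) * 5 + rule_b.astype(int) * 5
print(df)
```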


Geo*_*pis 5

You can also use the apply function. For example:

def myfunc(gender, pet1, pet2):
    if gender=='male' and pet1==pet2:
        myvalue=5
    elif gender=='female' and (pet1=='cat' or pet1=='dog'):
        myvalue=5
    else:
        myvalue=0
    return myvalue

Then use the apply function with axis=1:

df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)

We get:

     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
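
As a quick sanity check, the row-wise apply result can be compared against a vectorized np.select result on the sample data. This is only a sketch under the assumption that the frame matches the one in the question; the variable names `row_wise`, `vectorized` and `same` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

def myfunc(gender, pet1, pet2):
    # Same rules as above, written with early returns.
    if gender == "male" and pet1 == pet2:
        return 5
    if gender == "female" and pet1 in ("cat", "dog"):
        return 5
    return 0

# Row-wise: the function is called once per row.
row_wise = df.apply(lambda x: myfunc(x["gender"], x["pet1"], x["pet2"]), axis=1)

# Vectorized: the rules are evaluated on whole columns at once.
vectorized = np.select(
    [
        df["gender"].eq("male") & df["pet1"].eq(df["pet2"]),
        df["gender"].eq("female") & df["pet1"].isin(["cat", "dog"]),
    ],
    [5, 5],
    default=0,
)

same = bool((row_wise.to_numpy() == vectorized).all())
print(same)
```

Both approaches give the same column here; the difference is that apply runs Python code once per row, which is why other answers in this thread report it being much slower on large frames.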