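I want to create a new column with numeric values based on the following conditions:
a. If gender is male & pet1 = pet2, points = 5
b. If gender is female & (pet1 is 'cat' or pet1 is 'dog'), points = 5
c. All other combinations, points = 0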
gender pet1 pet2
0 male dog dog
1 male cat cat
2 male dog cat
3 female cat squirrel
4 female dog dog
5 female squirrel cat
6 squirrel dog cat
I would like the end result to be as follows:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
How can I do this?
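To try the answers on the question's data without retyping it, the printed table can be parsed back into a DataFrame. This is a minimal sketch, assuming the table is copied verbatim and that the frame has the default integer index; the name `TABLE` is hypothetical. It relies on a documented `read_csv` behaviour: when the header row has one field fewer than the data rows, the first column becomes the index.

```python
import io

import pandas as pd

# The sample frame exactly as printed in the question. The header row has one
# field fewer than the data rows, so read_csv uses the first column as the index.
TABLE = """\
gender pet1 pet2
0 male dog dog
1 male cat cat
2 male dog cat
3 female cat squirrel
4 female dog dog
5 female squirrel cat
6 squirrel dog cat
"""

# sep=r"\s+" treats any run of whitespace as a single separator.
df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")
print(df)
```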
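Answer by Erf*_*fan (score 31)
numpy.select (2020 answer)
This is a perfect use case for np.select: it builds a column from multiple conditions and stays readable as more conditions are added: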
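import numpy as np

# One boolean Series per rule; in each row np.select takes the choice paired
# with the first True condition, and `default` where none is True.
conditions = [
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]
choices = [5, 5]
df['points'] = np.select(conditions, choices, default=0)
print(df)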
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
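Two properties of np.select matter once the rules grow. First, choices are paired with conditions by position. Second, when several conditions are True for the same row, the first one in the list wins (this is the documented behaviour of `numpy.select`). The sketch below rebuilds the question's frame and shows both; the `[5, 3]` scores and the `overlap` rules are hypothetical, chosen only to make the effects visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

conditions = [
    df["gender"].eq("male") & df["pet1"].eq(df["pet2"]),
    df["gender"].eq("female") & df["pet1"].isin(["cat", "dog"]),
]

# Hypothetical variant: a different score per rule shows that choices are
# matched to conditions by position.
df["points"] = np.select(conditions, [5, 3], default=0)
print(df["points"].tolist())  # [5, 5, 0, 3, 3, 0, 0]

# Hypothetical overlapping rules: row 0 is both a dog owner and male, and it
# gets "dog owner" because that condition comes first in the list.
overlap = [df["pet1"].eq("dog"), df["gender"].eq("male")]
labels = np.select(overlap, ["dog owner", "male"], default="other")
print(labels.tolist())
```

In practice this means more specific rules should be listed before more general ones.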
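Answer by EdC*_*ica (score 17)
You can use np.where. The conditions use the bitwise operators & and | for "and" and "or", and because of operator precedence each condition must be wrapped in parentheses. Where the combined condition is true, 5 is returned, and 0 otherwise: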
In [29]:
df['points'] = np.where(
    ((df['gender'] == 'male') & (df['pet1'] == df['pet2']))
    | ((df['gender'] == 'female') & (df['pet1'].isin(['cat', 'dog']))),
    5, 0)
df
Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
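The one-line np.where expression above becomes easier to read and debug if each rule is first stored in a named boolean mask. The following is a minimal self-contained sketch of the same logic; the mask names `is_male_match` and `is_female_cat_or_dog` are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

# Each comparison is parenthesised because & and | bind more tightly than ==.
is_male_match = (df["gender"] == "male") & (df["pet1"] == df["pet2"])
is_female_cat_or_dog = (df["gender"] == "female") & df["pet1"].isin(["cat", "dog"])

# np.where(cond, x, y) yields x where cond is True and y elsewhere.
df["points"] = np.where(is_male_match | is_female_cat_or_dog, 5, 0)
print(df)
```

The masks can also be printed on their own to see which rows each rule matches.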
Answer by Rug*_*rra (score 10)
Use apply:
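def f(x):
    # x is one row of the DataFrame (a Series indexed by column name)
    if x['gender'] == 'male' and x['pet1'] == x['pet2']:
        return 5
    elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'):
        return 5
    else:
        return 0

# axis=1 calls f once per row instead of once per column
df['points'] = df.apply(f, axis=1)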
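A row-wise function like `f` is a convenient reference when switching to a vectorized version: both can be computed on the same data and compared before the slower one is dropped. This sketch assumes the question's sample data and reuses the np.where expression from the previous answer as the vectorized side.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})


def f(row):
    # With axis=1, apply passes each row as a Series indexed by column name.
    if row["gender"] == "male" and row["pet1"] == row["pet2"]:
        return 5
    if row["gender"] == "female" and row["pet1"] in ("cat", "dog"):
        return 5
    return 0


via_apply = df.apply(f, axis=1)

vectorized = np.where(
    ((df["gender"] == "male") & (df["pet1"] == df["pet2"]))
    | ((df["gender"] == "female") & df["pet1"].isin(["cat", "dog"])),
    5, 0,
)

# Sanity check: both formulations must give the same points for every row.
assert (via_apply.to_numpy() == vectorized).all()
print(via_apply.tolist())  # [5, 5, 0, 5, 5, 0, 0]
```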
The apply method described by @RuggeroTurra takes much longer on 500k rows. I ended up using something like this:
df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + \
((df.a != 0) & (df.b != 1)).astype(int) * 3 + \
((df.a == 0) & (df.b == 1)).astype(int) * 4 + \
((df.a != 0) & (df.b == 1)).astype(int) * 5
On that data the apply method took 25 seconds, while the approach above took about 18 ms.
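The summed-mask trick works because the four masks are mutually exclusive and together cover every row, so exactly one term is non-zero per row. The same rules can also be written with np.select. Below is a rough timing sketch, assuming synthetic integer columns `a` and `b` (values 0 to 2) and a smaller row count than the 500k quoted above so the row-wise apply finishes quickly. It checks that all three formulations agree before timing them. The timings will depend on the machine and pandas version, so no figures are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed synthetic data: integer columns a and b with values 0-2.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({"a": rng.integers(0, 3, n), "b": rng.integers(0, 3, n)})


def summed_masks(df):
    # Mutually exclusive masks: exactly one term is non-zero in each row.
    return (((df.a == 0) & (df.b != 1)).astype(int) * 2
            + ((df.a != 0) & (df.b != 1)).astype(int) * 3
            + ((df.a == 0) & (df.b == 1)).astype(int) * 4
            + ((df.a != 0) & (df.b == 1)).astype(int) * 5)


def selected(df):
    # Same rules with np.select; the fourth case becomes the default.
    conditions = [
        (df.a == 0) & (df.b != 1),
        (df.a != 0) & (df.b != 1),
        (df.a == 0) & (df.b == 1),
    ]
    return np.select(conditions, [2, 3, 4], default=5)


def row_wise(row):
    # Per-row Python logic of the kind df.apply(..., axis=1) calls.
    if row["a"] == 0:
        return 4 if row["b"] == 1 else 2
    return 5 if row["b"] == 1 else 3


# All three formulations must agree before their timings are worth comparing.
expected = selected(df)
assert (summed_masks(df).to_numpy() == expected).all()
assert (df.apply(row_wise, axis=1).to_numpy() == expected).all()

for name, func in [
    ("apply(axis=1)", lambda: df.apply(row_wise, axis=1)),
    ("summed masks", lambda: summed_masks(df)),
    ("np.select", lambda: selected(df)),
]:
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{name:>14}: {seconds * 1000:.1f} ms per run")
```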
You can also use the apply function. For example:
def myfunc(gender, pet1, pet2):
    if gender == 'male' and pet1 == pet2:
        myvalue = 5
    elif gender == 'female' and (pet1 == 'cat' or pet1 == 'dog'):
        myvalue = 5
    else:
        myvalue = 0
    return myvalue
Then call apply with axis=1 so the function runs once per row:
df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)
We get:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
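Because `myfunc` takes plain values rather than a whole row, it can also be driven by zipping the three columns in a list comprehension instead of going through `apply(axis=1)`. This is a swapped-in alternative, not part of the original answer. It is still a Python-level loop, but it avoids handing each row to the function as a Series, which is usually cheaper. The sketch assumes the question's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})


def myfunc(gender, pet1, pet2):
    # Same rules as above, written with early returns.
    if gender == "male" and pet1 == pet2:
        return 5
    if gender == "female" and pet1 in ("cat", "dog"):
        return 5
    return 0


# zip yields one (gender, pet1, pet2) tuple per row, in index order.
df["points"] = [
    myfunc(gender, pet1, pet2)
    for gender, pet1, pet2 in zip(df["gender"], df["pet1"], df["pet2"])
]
print(df)
```

For large frames, the vectorized np.select or np.where answers above should still be preferred.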