How to use multiple values in a single np.where condition?

SSM*_*SMK 3 python numpy series dataframe pandas

I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ["Hi how","I am fine","Ila say Hi","hello"],
                   'tokens':["test","correct","Tim",np.nan],
                   'labels':['A','B','C','D']})

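Instead of writing multiple np.where conditions, I want to use the OR operator (|) to check several values inside a single np.where condition, like this: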

df['labels'] = np.where(df['tokens'] == ('test'|'correct'|is.na()),'new_label',df['labels'])

However, this raises an error:

TypeError: unsupported operand type(s) for |: 'str' and 'str'
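The error comes from Python evaluating 'test'|'correct' on its own, before any comparison with the column happens, and str does not implement |. Between boolean Series, however, | is an element-wise OR, so each value has to be compared separately and the resulting masks combined. A minimal sketch illustrating both points (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

tokens = pd.Series(["test", "correct", "Tim", np.nan])

# `|` between two plain strings is evaluated first and fails with TypeError.
try:
    "test" | "correct"
except TypeError as exc:
    error_message = str(exc)

# `|` between boolean Series is element-wise: compare each value, then OR the masks.
mask = (tokens == "test") | (tokens == "correct") | tokens.isna()

print(error_message)
print(mask.tolist())
```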

I want my output to look like the one below. How can I do this efficiently on large data with millions of records?

[Image: expected output]

jez*_*ael 5

The first idea is to replace missing values with one of the values from the list, e.g. test, and then compare with Series.isin:

# NaN -> 'test', so a single isin call also matches the missing tokens
df['labels'] = np.where(df['tokens'].fillna('test').isin(['test','correct']),
                        'new_label',
                        df['labels'])
print (df)
         text   tokens     labels
0      Hi how     test  new_label
1   I am fine  correct  new_label
2  Ila say Hi      Tim          C
3       hello      NaN  new_label
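A self-contained sketch of the first approach, split into steps so the intermediate Series can be inspected. It rebuilds the sample frame from the question; the names filled and mask are illustrative. Note that the trick only works because the fill value ('test') is itself in the list: with a fill value outside the list, the former NaN rows would not match.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi", "hello"],
                   'tokens': ["test", "correct", "Tim", np.nan],
                   'labels': ['A', 'B', 'C', 'D']})

# Step 1: NaN becomes 'test', a value already in the list of wanted values.
filled = df['tokens'].fillna('test')

# Step 2: one isin call now matches both the listed values and the former NaN.
mask = filled.isin(['test', 'correct'])

# Step 3: take 'new_label' where the mask is True, otherwise keep the old label.
df['labels'] = np.where(mask, 'new_label', df['labels'])

print(filled.tolist())
print(mask.tolist())
print(df['labels'].tolist())
```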

Or chain a second mask that tests for NaN, using | (bitwise OR):

# Element-wise OR of two boolean masks: listed values, or missing
df['labels'] = np.where(df['tokens'].isin(['test','correct']) | df['tokens'].isna(),
                        'new_label',
                        df['labels'])
print (df)
         text   tokens     labels
0      Hi how     test  new_label
1   I am fine  correct  new_label
2  Ila say Hi      Tim          C
3       hello      NaN  new_label
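If the same relabelling is needed for other columns or value lists, the second approach can be wrapped in a small function. relabel_matches and its parameters are hypothetical names, not part of pandas. Both approaches build the mask with vectorized Series operations over the whole column rather than a Python loop over rows, which is what matters for millions of records; no timings were measured here.

```python
import numpy as np
import pandas as pd


def relabel_matches(frame, column, values, new_label, target='labels'):
    """Hypothetical helper: set `target` to `new_label` wherever `column`
    is one of `values` or is missing. Returns a modified copy."""
    out = frame.copy()
    # Same mask as the second approach: listed values OR missing.
    mask = out[column].isin(values) | out[column].isna()
    out[target] = np.where(mask, new_label, out[target])
    return out


df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi", "hello"],
                   'tokens': ["test", "correct", "Tim", np.nan],
                   'labels': ['A', 'B', 'C', 'D']})

result = relabel_matches(df, 'tokens', ['test', 'correct'], 'new_label')
print(result['labels'].tolist())
```

Returning a copy keeps the original frame untouched; assigning the result back to df gives the same in-place effect as the snippets above.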