Is there a faster way in pandas than apply with a lambda to apply logic to every row?

Wil*_*iam 3 python numpy dataframe pandas

I have a df:

from io import StringIO

import pandas as pd

dfs = """
    contract Valindex0  RB  Valindex1
2   A00118  51  0   50
3   A00118  42  1   47
4   A00118  44  1   47

"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')

df:

  contract  Valindex0  RB  Valindex1
2   A00118         51   0         50
3   A00118         42   1         47
4   A00118         44   1         47

I want to add a new column, df['Valindex'], whose value is chosen row by row.

The value of this column is either

 df['Valindex0']

or

 df['Valindex1']

depending on df['RB']:

# pseudocode: the intended logic for each row
if df['RB'] == 0:
    df['Valindex'] = df['Valindex0']
elif df['RB'] == 1:
    df['Valindex'] = df['Valindex1']

Right now I am using apply with a lambda, but it is slow:

df['Valindex'] = df.apply(
    lambda row: row["Valindex" + str(row["RB"])], axis=1)

The output should look like this:

    contract    Valindex0   RB  Valindex1   Valindex
2   A00118            51    0   50          51
3   A00118            42    1   47          47
4   A00118            44    1   47          47

Is there a faster way?

Anu*_*bas 7

Use np.where():

df["Valindex"] = np.where(df["RB"].eq(0), df["Valindex0"], df["Valindex1"])
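To sanity-check that the np.where() version gives the same result as the row-wise apply from the question, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample data in the question, and the name `slow` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the data shown in the question.
df = pd.DataFrame(
    {
        "contract": ["A00118", "A00118", "A00118"],
        "Valindex0": [51, 42, 44],
        "RB": [0, 1, 1],
        "Valindex1": [50, 47, 47],
    },
    index=[2, 3, 4],
)

# Row-wise baseline from the question: one Python call per row.
slow = df.apply(lambda row: row["Valindex" + str(row["RB"])], axis=1)

# Vectorized version: operates on whole columns at once.
df["Valindex"] = np.where(df["RB"].eq(0), df["Valindex0"], df["Valindex1"])

# Both approaches should agree on every row.
assert (df["Valindex"] == slow).all()
```

Note that np.where() treats every row where RB is not 0 as the second case, so it only fits when RB takes exactly two values.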

Or use np.select() when there are multiple conditions and choices:

conditions = [df["RB"].eq(0), df["RB"].eq(1)]
labels = [df["Valindex0"], df["Valindex1"]]
df["Valindex"] = np.select(conditions, labels)
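One thing worth knowing about np.select(): rows that match none of the conditions get the `default` value, which is 0 unless you pass something else. The sketch below uses a hypothetical frame with an extra RB value of 2 (not present in the question's data) to show the difference; the sentinel -1 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the last row has RB == 2, which matches no condition.
df = pd.DataFrame(
    {
        "Valindex0": [51, 42, 44],
        "RB": [0, 1, 2],
        "Valindex1": [50, 47, 47],
    }
)

conditions = [df["RB"].eq(0), df["RB"].eq(1)]
choices = [df["Valindex0"], df["Valindex1"]]

# Without `default`, unmatched rows are filled with 0.
implicit = np.select(conditions, choices)

# With an explicit sentinel, unmatched rows are easier to spot.
df["Valindex"] = np.select(conditions, choices, default=-1)
```

If 0 is a legitimate value in your data, passing an explicit `default` avoids confusing it with "no condition matched".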

Output of df:

    contract    Valindex0   RB  Valindex1   Valindex
2   A00118      51          0   50          51
3   A00118      42          1   47          47
4   A00118      44          1   47          47
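If there were more than two Valindex columns, another option (not part of the answer above, just a sketch) is NumPy integer indexing: treat RB as a column position and pick one value per row. This assumes RB always holds a valid position into the `value_cols` list, which is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the data shown in the question.
df = pd.DataFrame(
    {
        "contract": ["A00118", "A00118", "A00118"],
        "Valindex0": [51, 42, 44],
        "RB": [0, 1, 1],
        "Valindex1": [50, 47, 47],
    },
    index=[2, 3, 4],
)

# Column order must match the codes in RB (0 -> Valindex0, 1 -> Valindex1).
value_cols = ["Valindex0", "Valindex1"]
values = df[value_cols].to_numpy()

# For each row i, take the column whose position equals RB[i].
rows = np.arange(len(df))
df["Valindex"] = values[rows, df["RB"].to_numpy()]
```

An RB value outside the range of `value_cols` would raise an IndexError here, whereas np.select() would fall back to its default.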