Wil*_*iam 3 python numpy dataframe pandas
I have a df:
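from io import StringIO

import pandas as pd

dfs = """
contract Valindex0 RB Valindex1
2 A00118 51 0 50
3 A00118 42 1 47
4 A00118 44 1 47
"""
# The header has one fewer field than the rows, so the first column becomes the index
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')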
df:
   contract  Valindex0  RB  Valindex1
2    A00118         51   0         50
3    A00118         42   1         47
4    A00118         44   1         47
I want to add a new column, df['Valindex'], whose value in each row is either
df['Valindex0']
or
df['Valindex1']
depending on df['RB']:
# Pseudocode, evaluated per row
if df['RB']==0:
    df['Valindex'] = df['Valindex0']
elif df['RB']==1:
    df['Valindex'] = df['Valindex1']
Right now I'm using apply with a lambda, but it is slow:
df['Valindex'] = df.apply(
    lambda row: row["Valindex" + str(row["RB"])], axis=1)
The output should look like this:
   contract  Valindex0  RB  Valindex1  Valindex
2    A00118         51   0         50        51
3    A00118         42   1         47        47
4    A00118         44   1         47        47
Is there a faster way?
Use np.where():
df["Valindex"] = np.where(df["RB"].eq(0), df["Valindex0"], df["Valindex1"])
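For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and checks the np.where() result against the row-wise apply version. The variable name `expected` is only illustrative and does not appear in the question.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Sample frame from the question; the first column becomes the index.
dfs = """
contract Valindex0 RB Valindex1
2 A00118 51 0 50
3 A00118 42 1 47
4 A00118 44 1 47
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r"\s+")

# Row-wise version from the question, kept only as a reference result.
expected = df.apply(lambda row: row["Valindex" + str(row["RB"])], axis=1)

# Vectorized version: operates on whole columns, no Python call per row.
df["Valindex"] = np.where(df["RB"].eq(0), df["Valindex0"], df["Valindex1"])

print(df["Valindex"].tolist())
```

Note that np.where() treats every row where RB is not 0 as the "else" branch, so it matches the apply version only as long as RB contains nothing but 0 and 1.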
Or use np.select() to handle multiple conditions and choices:
conditions = [df["RB"].eq(0), df["RB"].eq(1)]
labels = [df["Valindex0"], df["Valindex1"]]
df["Valindex"] = np.select(conditions, labels)
Output of df:
   contract  Valindex0  RB  Valindex1  Valindex
2    A00118         51   0         50        51
3    A00118         42   1         47        47
4    A00118         44   1         47        47
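One thing worth knowing about np.select(): rows that match none of the conditions get the `default` value, which is 0 unless you say otherwise. The sketch below uses a hypothetical frame (not the one from the question) where the last row has RB == 2, and a hypothetical extra column `Valindex_nan` to show an explicit default. The list is named `choices` here only to mirror NumPy's `choicelist` parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has RB == 2, which matches no condition.
df = pd.DataFrame(
    {
        "Valindex0": [51, 42, 44],
        "RB": [0, 1, 2],
        "Valindex1": [50, 47, 47],
    }
)

conditions = [df["RB"].eq(0), df["RB"].eq(1)]
choices = [df["Valindex0"], df["Valindex1"]]

# With no default given, unmatched rows are filled with 0.
df["Valindex"] = np.select(conditions, choices)

# An explicit NaN default makes unmatched rows easy to spot,
# at the cost of turning the column into floats.
df["Valindex_nan"] = np.select(conditions, choices, default=np.nan)

print(df)
```

If RB can only ever be 0 or 1, the default never comes into play and the result is the same as with np.where().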