Using two variables in a Python lambda

Question

I want to create a new column based on two variables. If (column 1 >= 0.5 or column 2 < 0.5) and (column 1 < 0.5 or column 2 >= 0.5), the new column should be "Good", otherwise "Bad".

I tried using a lambda with an if:

df["new column"] = df[["column 1", "column 2"]].apply(
    lambda x, y: "Good" if (x >= 0.5 or y < 0.5) and (x < 0.5 or y >= 0.5) else "Bad"
)

and got:

TypeError: ("() missing 1 required positional argument: 'y'", 'occurred at index column 1')

Answer 1

Sco*_*ton 5

Use np.where. pandas does intrinsic data alignment, which means you don't need apply or row-by-row iteration; pandas will align the data on the index:

df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df
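
As for the original error: without axis=1, DataFrame.apply calls the function once per column and passes that column as a single Series, so a two-argument lambda never receives y. The sketch below is one possible way to keep an apply-based version working, next to the np.where form. The helper name label and the sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data (assumed for illustration), using the question's column names
df = pd.DataFrame({"column 1": [1, 2, 0.1, 0.1],
                   "column 2": [1, 2, 0.7, 0.2]})

def label(row):
    # With axis=1, apply passes one row at a time as a Series,
    # so the function takes a single argument and unpacks it.
    x, y = row["column 1"], row["column 2"]
    return "Good" if (x >= 0.5 or y < 0.5) and (x < 0.5 or y >= 0.5) else "Bad"

via_apply = df.apply(label, axis=1)

# Same condition, evaluated on whole columns at once
x, y = df["column 1"], df["column 2"]
via_where = np.where(((x >= 0.5) | (y < 0.5)) & ((x < 0.5) | (y >= 0.5)), "Good", "Bad")

print(list(via_apply))
print(list(via_where))
```

Note that the column-wise version needs | and & with parentheses around each comparison, because Python's or and and do not work element-wise on a Series.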

Using @YunaA.'s setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 0.1, 0.1],
                   'y': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})

# Vectorized: the condition is evaluated on whole columns at once
df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df

Output:

     x    y  column 3 new column
0  1.0  1.0         1       Good
1  2.0  2.0         2       Good
2  0.1  0.7         3        Bad
3  0.1  0.2         4       Good
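
The output suggests a simpler reading of the condition in the question: a row is "Good" exactly when x and y fall on the same side of 0.5. The sketch below checks that equivalence on a small grid of values. The grid and the names original and same_side are assumptions made for illustration, and the equivalence is only claimed for data without NaN.

```python
import itertools

import numpy as np
import pandas as pd

# Small grid of test values, including the 0.5 boundary (assumed for illustration)
values = [0.0, 0.2, 0.5, 0.7, 1.0, 2.0]
grid = pd.DataFrame(list(itertools.product(values, values)), columns=["x", "y"])

# Condition as written in the question
original = ((grid["x"] >= 0.5) | (grid["y"] < 0.5)) & ((grid["x"] < 0.5) | (grid["y"] >= 0.5))

# Shorter form: both values on the same side of 0.5
same_side = (grid["x"] >= 0.5) == (grid["y"] >= 0.5)

print((original == same_side).all())
grid["new column"] = np.where(same_side, "Good", "Bad")
```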

Timings:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'x': np.random.random(100) * 2,
                   'y': np.random.random(100) * 1})

# Row-wise version of the same condition, for comparison with np.where
def update_column(row):
    if (row['x'] >= .5 or row['y'] < .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

Results:

%timeit df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')

1.45 ms ± 72.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['new column'] = df.apply(update_column, axis=1)

5.83 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
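
The %timeit magic only exists in IPython and Jupyter. Outside of them, a rough equivalent can be sketched with the standard timeit module, as below. The wrapper names with_where and with_apply and the repetition count are assumptions made for illustration, and the absolute numbers will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({"x": np.random.random(100) * 2,
                   "y": np.random.random(100)})

def update_column(row):
    # Row-wise version of the condition
    if (row["x"] >= 0.5 or row["y"] < 0.5) and (row["x"] < 0.5 or row["y"] >= 0.5):
        return "Good"
    return "Bad"

def with_where():
    # Vectorized version: one pass over whole columns
    return np.where(((df["x"] >= 0.5) | (df["y"] < 0.5))
                    & ((df["x"] < 0.5) | (df["y"] >= 0.5)), "Good", "Bad")

def with_apply():
    # Calls update_column once per row
    return df.apply(update_column, axis=1)

n = 100  # repetitions per measurement (arbitrary choice)
t_where = timeit.timeit(with_where, number=n)
t_apply = timeit.timeit(with_apply, number=n)
print(f"np.where: {t_where / n * 1e3:.3f} ms per call")
print(f"apply:    {t_apply / n * 1e3:.3f} ms per call")
```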
