Seu*_*ung 1 python lambda pandas
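I want to create a new column based on two variables. If (column 1 >= 0.5 or column 2 < 0.5) and (column 1 < 0.5 or column 2 >= 0.5), the new column should be "Good"; otherwise it should be "Bad".
I tried using a lambda with an if expression: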
df["new column"] = df[["column 1", "column 2"]].apply(
    lambda x, y: "Good" if (x >= 0.5 or y < 0.5) and (x < 0.5 or y >= 0.5) else "Bad"
)
and got:
TypeError: ("() missing 1 required positional argument: 'y'", 'occurred at index column 1')
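The TypeError appears because `DataFrame.apply` defaults to `axis=0` and passes each column to the function as a single Series, so a two-argument lambda never receives `y`. A minimal sketch of a row-wise version, assuming the columns are named "column 1" and "column 2" as in the question (the helper name `label` and the sample values are only for illustration):

```python
import pandas as pd

# Illustrative data; the column names follow the question.
df = pd.DataFrame({"column 1": [1, 2, 0.1, 0.1],
                   "column 2": [1, 2, 0.7, 0.2]})

def label(row):
    # With axis=1 each call receives one row as a Series,
    # so both values are read from that single argument.
    x, y = row["column 1"], row["column 2"]
    return "Good" if (x >= 0.5 or y < 0.5) and (x < 0.5 or y >= 0.5) else "Bad"

df["new column"] = df.apply(label, axis=1)
print(df)
```

This works, but it calls a Python function once per row, so it is slower than a vectorized expression.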
Use np.where. pandas does intrinsic data alignment, which means you don't need apply or row-by-row iteration; pandas aligns the data on the index:
df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df
Using @YunaA.'s setup:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 0.1, 0.1],
                   'y': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})
df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df
Output:
     x    y  column 3  new column
0  1.0  1.0         1        Good
1  2.0  2.0         2        Good
2  0.1  0.7         3         Bad
3  0.1  0.2         4        Good
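As a side note, when neither column contains NaN, the condition in the question reduces to "both values fall on the same side of 0.5". The sketch below checks that on the sample data; the names `original` and `same_side` are introduced here for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 0.1, 0.1],
                   "y": [1, 2, 0.7, 0.2]})

# Condition exactly as stated in the question.
original = ((df["x"] >= 0.5) | (df["y"] < 0.5)) & ((df["x"] < 0.5) | (df["y"] >= 0.5))

# Shorter form: True when x and y are both >= 0.5 or both < 0.5.
# Assumes no NaN values, since every comparison with NaN is False.
same_side = (df["x"] >= 0.5) == (df["y"] >= 0.5)

df["new column"] = np.where(same_side, "Good", "Bad")
print(df)
```

If the data can contain NaN, the two forms may disagree, so the longer condition is the safer choice there.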
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'x': np.random.random(100)*2,
                   'y': np.random.random(100)*1})
def update_column(row):
    # Row-wise version of the same condition, used with df.apply(..., axis=1)
    if (row['x'] >= .5 or row['y'] < .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"
Results
%timeit df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
1.45 ms ± 72.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df['new_column'] = df.apply(update_column, axis=1)
5.83 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
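The `%timeit` magic only works inside IPython or Jupyter. A rough way to reproduce the comparison in a plain script is the standard-library `timeit` module, sketched below. The wrapper names `with_where` and `with_apply` and the repeat count of 100 are choices made for this example, and the measured times will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({"x": np.random.random(100) * 2,
                   "y": np.random.random(100)})

def update_column(row):
    # Row-wise condition, evaluated once per row by apply.
    if (row["x"] >= 0.5 or row["y"] < 0.5) and (row["x"] < 0.5 or row["y"] >= 0.5):
        return "Good"
    return "Bad"

def with_where():
    # Vectorized: the whole column is evaluated in one pass.
    cond = ((df["x"] >= 0.5) | (df["y"] < 0.5)) & ((df["x"] < 0.5) | (df["y"] >= 0.5))
    return np.where(cond, "Good", "Bad")

def with_apply():
    return df.apply(update_column, axis=1)

# Total seconds for 100 calls of each approach.
t_where = timeit.timeit(with_where, number=100)
t_apply = timeit.timeit(with_apply, number=100)
print(f"np.where: {t_where / 100 * 1e3:.3f} ms per call")
print(f"apply:    {t_apply / 100 * 1e3:.3f} ms per call")
```

Both functions should produce the same labels; only the time taken to compute them differs.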