use*_*289 260 python numpy dataframe pandas
I have a DataFrame along the lines of the one below:
Type Set
1 A Z
2 B Z
3 B X
4 C Y
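I would like to add another column to the DataFrame (or generate a Series) of the same length as the DataFrame (an equal number of records/rows) that sets the color to 'green' if Set == 'Z' and to 'red' otherwise.
What is the best way to do this?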
unu*_*tbu 591
If you only have two choices to select from:
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print(df)
yields
Set Type color
0 Z A green
1 Z B green
2 X B red
3 Y C red
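The question also mentions generating a Series instead of a column. A minimal sketch of that variant follows, assuming the frame is indexed 1 to 4 as in the question. np.where returns a plain NumPy array with no index, so the sketch passes df.index explicitly to keep the result aligned with the frame; the name `colors` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the question, with its 1-based index (an assumption for illustration).
df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')}, index=[1, 2, 3, 4])

# np.where gives a bare ndarray; wrapping it in a Series restores the index.
colors = pd.Series(np.where(df['Set'] == 'Z', 'green', 'red'),
                   index=df.index, name='color')

print(colors.tolist())        # ['green', 'green', 'red', 'red']
print(colors.index.tolist())  # [1, 2, 3, 4]
```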
If you have more than two conditions, then use np.select. For example, if you want color to be
yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
otherwise purple when (df['Type'] == 'B')
otherwise black,
then use
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
(df['Set'] == 'Z') & (df['Type'] == 'A'),
(df['Set'] == 'Z') & (df['Type'] == 'B'),
(df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)
yields
Set Type color
0 Z A yellow
1 Z B blue
2 X B purple
3 Y C black
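The order of the conditions matters here, because np.select uses the choice belonging to the first condition that is True for each row. The following sketch illustrates this on the same data; the variable names `narrow` and `broad` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Row 1 (Set == 'Z', Type == 'B') satisfies both conditions below.
broad = df['Type'] == 'B'
narrow = (df['Set'] == 'Z') & (df['Type'] == 'B')

# For each row, np.select picks the choice of the FIRST condition that is True.
narrow_first = np.select([narrow, broad], ['blue', 'purple'], default='black')
broad_first = np.select([broad, narrow], ['purple', 'blue'], default='black')

print(narrow_first.tolist())  # ['black', 'blue', 'purple', 'black']
print(broad_first.tolist())   # ['black', 'purple', 'purple', 'black']
```

Listing the more specific condition first is what lets row 1 come out as 'blue' rather than 'purple'.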
che*_*ard 100
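A list comprehension is another way to create a column conditionally. If you are working with object dtypes in columns, as in your example, list comprehensions typically outperform most other methods.
Example list comprehension: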
df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit tests:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
%timeit df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit df['color'] = np.where(df['Set']=='Z', 'green', 'red')
%timeit df['color'] = df.Set.map( lambda x: 'red' if x == 'Z' else 'green')
1000 loops, best of 3: 239 µs per loop
1000 loops, best of 3: 523 µs per loop
1000 loops, best of 3: 263 µs per loop
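The %timeit lines above need IPython, and they time a 4-row frame, where fixed overhead is likely to dominate. A rough sketch for repeating the comparison in plain Python with the standard timeit module is shown below. The frame size of 100,000 rows and the function names are assumptions chosen for illustration, and the sketch uses the question's mapping ('green' for 'Z'). Timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame, so the per-row work outweighs fixed overhead.
df = pd.DataFrame({'Type': list('ABBC') * 25_000, 'Set': list('ZZXY') * 25_000})


def with_list_comprehension():
    return ['green' if x == 'Z' else 'red' for x in df['Set']]


def with_np_where():
    return np.where(df['Set'] == 'Z', 'green', 'red')


def with_map():
    return df['Set'].map(lambda x: 'green' if x == 'Z' else 'red')


expected = with_list_comprehension()

for func in (with_list_comprehension, with_np_where, with_map):
    # Check that every approach agrees before comparing speed.
    assert list(func()) == expected
    # Best of 3 repeats, 5 calls each, reported per call.
    seconds = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f'{func.__name__}: {seconds * 1e3:.2f} ms per call')
```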
bla*_*ite 18
Here's yet another way to skin this cat, using a dictionary to map new values onto the keys in the list:
def map_values(row, values_dict):
    return values_dict[row]
values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
df = pd.DataFrame({'INDICATOR': ['A', 'B', 'C', 'D'], 'VALUE': [10, 9, 8, 7]})
df['NEW_VALUE'] = df['INDICATOR'].apply(map_values, args = (values_dict,))
What it looks like:
df
Out[2]:
INDICATOR VALUE NEW_VALUE
0 A 10 1
1 B 9 2
2 C 8 3
3 D 7 4
This approach can be very powerful when you have many ifelse-type statements to make (i.e. many unique values to replace).
And of course you could always do this:
df['NEW_VALUE'] = df['INDICATOR'].map(values_dict)
But that approach is more than three times as slow as the apply approach above, on my machine.
You could also do this, using dict.get:
df['NEW_VALUE'] = [values_dict.get(v, None) for v in df['INDICATOR']]
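The three variants differ in how they treat a value that has no entry in the dictionary. The sketch below shows this, assuming a hypothetical extra value 'E' and a fallback of -1, neither of which appears in the answer above.

```python
import pandas as pd

values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
# 'E' is a hypothetical value with no entry in values_dict.
s = pd.Series(['A', 'B', 'E'], name='INDICATOR')

# Series.map with a dict turns unmapped keys into NaN (so the dtype becomes float).
mapped = s.map(values_dict)
# An explicit fallback can be filled in afterwards.
filled = s.map(values_dict).fillna(-1).astype(int)
# dict.get lets the fallback be chosen at lookup time.
via_get = [values_dict.get(v, -1) for v in s]

# Plain indexing, as in map_values, raises KeyError on the unmapped key.
try:
    s.apply(lambda key: values_dict[key])
    raised = False
except KeyError:
    raised = True

print(mapped.tolist())  # [1.0, 2.0, nan]
print(filled.tolist())  # [1, 2, -1]
print(via_get)          # [1, 2, -1]
print(raised)           # True
```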
ach*_*uva 16
Another way this could be achieved is
df['color'] = df.Set.map( lambda x: 'red' if x == 'Z' else 'green')
bli*_*bli 15
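The following is slower than the approaches timed here, but it lets us compute the extra column from the contents of more than one column, and the extra column can take more than two values.
Simple example using just the "Set" column: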
def set_color(row):
    if row["Set"] == "Z":
        return "red"
    else:
        return "green"
df = df.assign(color=df.apply(set_color, axis=1))
print(df)
Set Type color
0 Z A red
1 Z B red
2 X B green
3 Y C green
Example with more colors and more columns taken into account:
def set_color(row):
    if row["Set"] == "Z":
        return "red"
    elif row["Type"] == "C":
        return "blue"
    else:
        return "green"
df = df.assign(color=df.apply(set_color, axis=1))
print(df)
Set Type color
0 Z A red
1 Z B red
2 X B green
3 Y C blue
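For comparison, the same if/elif/else logic can probably be written without a row-wise apply by using np.select, with the conditions listed in the same order as the branches. This is only a sketch of an equivalent, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})


def set_color(row):
    if row["Set"] == "Z":
        return "red"
    elif row["Type"] == "C":
        return "blue"
    else:
        return "green"


row_wise = df.apply(set_color, axis=1)

# Conditions appear in the same order as the if/elif branches;
# default plays the role of the final else.
vectorized = np.select(
    [df["Set"] == "Z", df["Type"] == "C"],
    ["red", "blue"],
    default="green",
)

print(row_wise.tolist())    # ['red', 'red', 'green', 'blue']
print(vectorized.tolist())  # ['red', 'red', 'green', 'blue']
```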
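Maybe this has become possible with newer versions of Pandas, but I think the following is the shortest and perhaps the best answer to the question so far. You can use one condition or several, depending on your needs.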
df=pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))
df['Color'] = "red"
df.loc[(df['Set']=="Z"), 'Color'] = "green"
print(df)
# result:
Type Set Color
0 A Z green
1 B Z green
2 B X red
3 C Y red
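As a sketch of the "several conditions" case, the same pattern can be repeated with a combined boolean mask. The extra 'blue' rule below is a made-up example, not part of the question. Later assignments overwrite earlier ones, so the most specific rule goes last.

```python
import pandas as pd

df = pd.DataFrame(dict(Type='A B B C'.split(), Set='Z Z X Y'.split()))

df['Color'] = 'red'                          # default for every row
df.loc[df['Set'] == 'Z', 'Color'] = 'green'  # single condition
# Combined condition (hypothetical rule): each comparison needs its own
# parentheses when joined with & or |.
df.loc[(df['Set'] == 'Z') & (df['Type'] == 'B'), 'Color'] = 'blue'

print(df['Color'].tolist())  # ['green', 'blue', 'red', 'red']
```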
df['color'] = 'green'
df['color'] = df['color'].where(df['Set']=='Z', other='red')
# Replace values where the condition is False
Or
df['color'] = 'red'
df['color'] = df['color'].mask(df['Set']=='Z', other='green')
# Replace values where the condition is True
Alternatively, you can use the transform method with a lambda function:
df['color'] = df['Set'].transform(lambda x: 'green' if x == 'Z' else 'red')
Output:
Type Set color
1 A Z green
2 B Z green
3 B X red
4 C Y red
Performance comparison with @chai:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC')*1000000, 'Set':list('ZZXY')*1000000})

%timeit df['color1'] = 'red'; df['color1'].where(df['Set']=='Z','green')
%timeit df['color2'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit df['color3'] = np.where(df['Set']=='Z', 'red', 'green')
%timeit df['color4'] = df.Set.map(lambda x: 'red' if x == 'Z' else 'green')

397 ms ± 101 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
976 ms ± 241 ms per loop
673 ms ± 139 ms per loop
796 ms ± 182 ms per loop
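One thing to keep in mind when reading the first timed statement: Series.where and Series.mask return a new Series and leave the original untouched, so the result has to be assigned back to have any effect. A small sketch on the 4-row example follows; the names `kept` and `swapped` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
df['color'] = 'green'

# where keeps values where the condition is True and replaces the rest.
kept = df['color'].where(df['Set'] == 'Z', other='red')
# mask is the mirror image: it replaces values where the condition is True.
swapped = df['color'].mask(df['Set'] == 'Z', other='red')

print(kept.tolist())         # ['green', 'green', 'red', 'red']
print(swapped.tolist())      # ['red', 'red', 'green', 'green']
print(df['color'].tolist())  # ['green', 'green', 'green', 'green'] (unchanged)
```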