Replace values in a pandas column, with a default value for missing keys

Naz*_*uri 4 python lambda replace dataframe pandas

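I have several simple functions that I need to apply to every row of certain columns of my DataFrame. The DataFrame is quite large, with 10 million+ rows. It looks like this: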

Date      location   city        number  value
12/3/2018   NY       New York      2      500
12/1/2018   MN       Minneapolis   3      600
12/2/2018   NY       Rochester     1      800
12/3/2018   WA       Seattle       2      400

I have a function like this:

def normalized_location(row):
    if row['city'] == "Minneapolis":
        return "FCM"
    elif row['city'] == "Seattle":
        return "FCS"
    else:
        return "Other"

Then I use:

df['Normalized Location'] = df.apply(lambda row: normalized_location(row), axis=1)

This is very slow. How can I make it more efficient?

cs9*_*s95 7

We can make this extremely fast by using map with a defaultdict.

from collections import defaultdict

d = defaultdict(lambda: 'Other')
d.update({"Minneapolis": "FCM", "Seattle": "FCS"})

df['normalized_location'] = df['city'].map(d)

print(df)
        Date location         city  number  value normalized_location
0  12/3/2018       NY     New York       2    500               Other
1  12/1/2018       MN  Minneapolis       3    600                 FCM
2  12/2/2018       NY    Rochester       1    800               Other
3  12/3/2018       WA      Seattle       2    400                 FCS

...which avoids a separate fillna call, for performance reasons. This approach generalizes easily to multiple replacements.
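To make the fillna point concrete, here is a minimal sketch comparing the two routes on a toy frame. The names mapping, lookup, via_fillna and via_defaultdict are only illustrative, and the four-row frame stands in for the real data. With a plain dict, cities that are not keys come back as NaN and need a second pass; with a defaultdict, the default is supplied during the lookup itself.

```python
from collections import defaultdict

import pandas as pd

# Toy stand-in for the real DataFrame (assumption: only 'city' matters here).
df = pd.DataFrame({
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
})

# Illustrative replacement table; add more entries for more replacements.
mapping = {"Minneapolis": "FCM", "Seattle": "FCS"}

# Plain dict: unmatched cities become NaN, so a second pass (fillna) is needed.
via_fillna = df["city"].map(mapping).fillna("Other")

# defaultdict: the default factory returns "Other" for any city not in mapping.
lookup = defaultdict(lambda: "Other", mapping)
via_defaultdict = df["city"].map(lookup)

print(via_fillna.tolist())
print(via_defaultdict.tolist())
```

Both routes should give the same labels; the defaultdict version simply skips the extra fillna step. Extending to more replacements only means adding entries to mapping.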


sac*_*cuL 5

You may want to use np.select:

import numpy as np

conds = [df.city == 'Minneapolis', df.city == 'Seattle']
choices = ['FCM', 'FCS']

df['normalized_location'] = np.select(conds, choices, default='other')

>>> df
        Date location         city  number  value normalized_location
0  12/3/2018       NY     New York       2    500               other
1  12/1/2018       MN  Minneapolis       3    600                 FCM
2  12/2/2018       NY    Rochester       1    800               other
3  12/3/2018       WA      Seattle       2    400                 FCS

  • Just a small note: for each replacement, you will need to compute a separate mask. (2 upvotes)
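A rough sketch of what that note means in practice, assuming the replacements are kept in a dict (the name replacements is hypothetical, and the four-row frame is a toy stand-in). Each key produces its own boolean mask, so the column is compared once per replacement before np.select combines the results.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real DataFrame (assumption: only 'city' matters here).
df = pd.DataFrame({
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
})

# Illustrative replacement table.
replacements = {"Minneapolis": "FCM", "Seattle": "FCS"}

# One boolean mask per replacement; each comparison scans the whole column.
conds = [df["city"] == city for city in replacements]
choices = list(replacements.values())

# Rows matching no mask receive the default label.
df["normalized_location"] = np.select(conds, choices, default="Other")

print(len(conds))
print(df["normalized_location"].tolist())
```

With only a couple of replacements this cost is small, but the number of masks grows with the number of keys, which is where a single map over the column may be the better fit.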