Naz*_*uri 4 python lambda replace dataframe pandas
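I have several simple functions that I need to apply to every row of certain columns in my dataframe. The dataframe is very large, with 10 million+ rows. It looks like this: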
Date location city number value
12/3/2018 NY New York 2 500
12/1/2018 MN Minneapolis 3 600
12/2/2018 NY Rochester 1 800
12/3/2018 WA Seattle 2 400
I have a function like this:
def normalized_location(row):
    if row['city'] == "Minneapolis":
        return "FCM"
    elif row['city'] == "Seattle":
        return "FCS"
    else:
        return "Other"
Then I use:
df['Normalized Location'] = df.apply(lambda row: normalized_location(row), axis=1)
This is very slow. How can I make it more efficient?
We can make this very fast by using map with a defaultdict.
from collections import defaultdict
d = defaultdict(lambda: 'Other')
d.update({"Minneapolis": "FCM", "Seattle": "FCS"})
df['normalized_location'] = df['city'].map(d)
print(df)
Date location city number value normalized_location
0 12/3/2018 NY New York 2 500 Other
1 12/1/2018 MN Minneapolis 3 600 FCM
2 12/2/2018 NY Rochester 1 800 Other
3 12/3/2018 WA Seattle 2 400 FCS
...which avoids a separate fillna call, for performance reasons. This approach generalizes easily to multiple replacements.
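To make the fillna remark concrete, here is a minimal sketch of the two variants side by side. The four-row frame and the names `mapping`, `via_fillna` and `via_defaultdict` are only for illustration. It relies on the documented behaviour that `Series.map` uses the default of a dict subclass that defines `__missing__` (such as `defaultdict`) instead of producing NaN.

```python
from collections import defaultdict

import pandas as pd

# Small illustrative frame; only the 'city' column matters here.
df = pd.DataFrame({"city": ["New York", "Minneapolis", "Rochester", "Seattle"]})

# Hypothetical name for the replacement table; add more pairs as needed.
mapping = {"Minneapolis": "FCM", "Seattle": "FCS"}

# Plain dict: cities missing from the mapping come back as NaN,
# so a second pass with fillna is needed to label them.
via_fillna = df["city"].map(mapping).fillna("Other")

# defaultdict: the default factory supplies "Other" during the lookup,
# so no fillna pass is required.
d = defaultdict(lambda: "Other", mapping)
via_defaultdict = df["city"].map(d)

print(via_fillna.tolist())
print(via_defaultdict.tolist())
```

Both variants should give the same labels. Extending to more replacements only means adding entries to the dictionary.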
You may want to use np.select:
import numpy as np

# One boolean mask per rule; choices[i] is used where conds[i] is True
conds = [df.city == 'Minneapolis', df.city == 'Seattle']
choices = ['FCM', 'FCS']
df['normalized_location'] = np.select(conds, choices, default='other')
>>> df
Date location city number value normalized_location
0 12/3/2018 NY New York 2 500 other
1 12/1/2018 MN Minneapolis 3 600 FCM
2 12/2/2018 NY Rochester 1 800 other
3 12/3/2018 WA Seattle 2 400 FCS
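As a sanity check, a small self-contained sketch like the one below can confirm that the vectorized np.select version gives the same labels as the row-wise apply from the question. The sample frame is rebuilt by hand, the default is set to "Other" to match the original function, and the names `slow` and `fast` are just illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data (only the columns used here).
df = pd.DataFrame({
    "location": ["NY", "MN", "NY", "WA"],
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
})

def normalized_location(row):
    if row["city"] == "Minneapolis":
        return "FCM"
    elif row["city"] == "Seattle":
        return "FCS"
    return "Other"

# Row-wise baseline from the question: one Python call per row.
slow = df.apply(normalized_location, axis=1)

# Vectorized version: each condition is computed once for the whole column,
# and np.select takes the choice of the first condition that is True.
conds = [df["city"] == "Minneapolis", df["city"] == "Seattle"]
choices = ["FCM", "FCS"]
fast = pd.Series(np.select(conds, choices, default="Other"), index=df.index)

print(slow.tolist() == fast.tolist())
```

On a frame with millions of rows, timing both versions (for example with `%timeit` in IPython) is the most reliable way to see how much the vectorized form helps on your data.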