How to correctly apply a lambda function to a pandas DataFrame column

Question

I have a pandas DataFrame, sample, and I am applying a lambda function to one of its columns, named PR, as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?

Answer 1

jez*_*ael 25

You need mask, which replaces the values where the condition is True:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)

Another solution is loc with boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR': [10, 100, 40]})
print(sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print(sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print(sample)
      PR
0    NaN
1  100.0
2    NaN
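
The approaches in this answer should all produce the same column. A minimal sketch that checks this on the same three-row frame follows. `Series.where` is included here as the inverse of `mask` and is not part of the original answer. The explicit `astype(float)` before the `loc` assignment is an assumption added to avoid relying on implicit int-to-float upcasting, which some newer pandas versions warn about.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})
below = sample['PR'] < 90  # boolean Series: True where the value should be blanked

# 1) mask: replace values where the condition is True
via_mask = sample['PR'].mask(below, np.nan)

# 2) where: keep values where the condition is True, so the condition is inverted
via_where = sample['PR'].where(~below, np.nan)

# 3) loc with boolean indexing, done on a copy so `sample` stays unchanged
patched = sample.copy()
patched['PR'] = patched['PR'].astype(float)  # assumption: cast first to make room for NaN
patched.loc[below, 'PR'] = np.nan
via_loc = patched['PR']

# 4) apply with a complete conditional expression
via_apply = sample['PR'].apply(lambda x: np.nan if x < 90 else x)

# Series.equals treats NaN values in the same positions as equal
print(via_mask.equals(via_where), via_mask.equals(via_loc), via_mask.equals(via_apply))
```

`mask` and `where` return a new Series, while the `loc` assignment modifies the frame in place, which is why the sketch works on a copy for that variant.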

Edit:

Solution with apply:

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)

Timings for len(df)=300k:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
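
The `%timeit` lines above require IPython. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The helper names `with_apply` and `with_mask` are made up for this illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same three-row frame as above, repeated to 300k rows
sample = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)


def with_apply():
    # Calls the Python lambda once per element
    return sample['PR'].apply(lambda x: np.nan if x < 90 else x)


def with_mask():
    # Vectorized comparison and replacement
    return sample['PR'].mask(sample['PR'] < 90, np.nan)


# Best of 3 repeats, 5 calls each, reported per call
apply_s = min(timeit.repeat(with_apply, number=5, repeat=3)) / 5
mask_s = min(timeit.repeat(with_mask, number=5, repeat=3)) / 5
print(f"apply: {apply_s * 1e3:.2f} ms per call")
print(f"mask:  {mask_s * 1e3:.2f} ms per call")
```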

Answer 2

小智 7

You need to add an else to the lambda function: you are telling it what to do when the condition is met (here x < 90), but not what to do when the condition is not met.

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
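
As a small illustration of the point about else, the sketch below first confirms that the incomplete expression is rejected when Python parses it, then runs the complete form. It also compares `np.nan` with the string `'NaN'` as the replacement value, since only the former is treated as a missing value by pandas. The variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

# A conditional expression has the form `A if condition else B`;
# leaving out `else B` fails at parse time, before any data is touched.
try:
    compile("lambda x: NaN if x < 90", "<example>", "eval")
    syntax_error_raised = False
except SyntaxError:
    syntax_error_raised = True
print("incomplete expression rejected:", syntax_error_raised)

sample = pd.DataFrame({'PR': [10, 100, 40]})

# Complete conditional expressions: one with a real NaN, one with a string
with_nan = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
with_text = sample['PR'].apply(lambda x: 'NaN' if x < 90 else x)

print(with_nan.dtype, with_nan.isna().sum())    # numeric column with real missing values
print(with_text.dtype, with_text.isna().sum())  # the string 'NaN' is not counted as missing
```

With the string version the column also stops being numeric, so later calls such as `dropna` or `mean` would not behave as they do with `np.nan`.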
