I have a pandas DataFrame called sample, and I applied a lambda function to one of its columns, PR, as follows:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
I then get the following syntax error message:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
^
SyntaxError: invalid syntax
What am I doing wrong?
jez*_*ael 25
You need mask:
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
Another solution is loc with boolean indexing:
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
Sample:
import pandas as pd
import numpy as np
sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
PR
0 10
1 100
2 40
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
PR
0 NaN
1 100.0
2 NaN
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
PR
0 NaN
1 100.0
2 NaN
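The outputs above print 100.0 rather than 100. A minimal sketch of why, assuming a standard NumPy-backed integer column: NaN is a floating-point value, so pandas upcasts the column to float64 once a NaN is introduced. The variable names dtype_before, masked, and dtype_after are only for illustration.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})

# The column starts out with an integer dtype.
dtype_before = sample['PR'].dtype

# Replacing values with NaN forces an upcast, because NaN is a float.
masked = sample['PR'].mask(sample['PR'] < 90, np.nan)
dtype_after = masked.dtype

print(dtype_before, '->', dtype_after)
```

The same upcast applies to the loc assignment, which is why both approaches show 100.0 in the printed result.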
Edit:
Solution with apply:
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
Timings for len(df)=300k:
sample = pd.concat([sample]*100000).reset_index(drop=True)
In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop
In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
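The %timeit lines above are IPython magics. As a rough sketch for reproducing the comparison in a plain Python script, the standard-library timeit module can be used instead. The repeat count of 3 is an arbitrary choice for this example, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same 300k-row frame as in the timings above.
sample = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)

# Both approaches should produce the same Series.
via_apply = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
via_mask = sample['PR'].mask(sample['PR'] < 90, np.nan)
same = via_apply.equals(via_mask)

# Total seconds for 3 runs of each approach.
t_apply = timeit.timeit(
    lambda: sample['PR'].apply(lambda x: np.nan if x < 90 else x), number=3)
t_mask = timeit.timeit(
    lambda: sample['PR'].mask(sample['PR'] < 90, np.nan), number=3)

print(f"identical results: {same}")
print(f"apply: {t_apply / 3:.4f} s per run, mask: {t_mask / 3:.4f} s per run")
```

apply calls the Python lambda once per element, while mask operates on the whole column at once, which is consistent with the gap in the timings above.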
小智 7
You need to add else to the lambda function: you are saying what to do when the condition (here x < 90) is met, but not what to do when it is not met.
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)  # np.nan rather than the string 'NaN', so the column stays numeric
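To illustrate the point about the missing else, here is a minimal sketch in plain Python, without pandas. The helper name compiles is hypothetical and introduced only for this example. It checks that a conditional expression without an else branch is rejected by the parser, while the complete form is accepted and behaves as intended.

```python
import math


def compiles(source: str) -> bool:
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# A conditional expression needs both branches: `A if cond else B`.
broken_ok = compiles("lambda x: float('nan') if x < 90")
fixed_ok = compiles("lambda x: float('nan') if x < 90 else x")

# The complete form: NaN below 90, otherwise the value unchanged.
to_nan_below_90 = lambda x: float('nan') if x < 90 else x
results = [to_nan_below_90(v) for v in (10, 100, 40)]

print(broken_ok, fixed_ok)
print(results)
```

Unlike an if statement, the `A if cond else B` form is an expression that must always produce a value, so the else branch cannot be left out.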