tkt*_*711 8 dataframe pandas python-3.5
I have a DataFrame, df, as follows:
row id name age url
1 e1 tom NaN http1
2 e2 john 25 NaN
3 e3 lucy NaN http3
4 e4 tick 29 NaN
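For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the column dtypes are an assumption, since the post only shows the printed table, and `missing_counts` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; np.nan marks the missing cells.
# The dtypes (float for age, object for url) are assumed, not given in the post.
df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "id": ["e1", "e2", "e3", "e4"],
    "name": ["tom", "john", "lucy", "tick"],
    "age": [np.nan, 25, np.nan, 29],
    "url": ["http1", np.nan, "http3", np.nan],
})

# Count the missing values in the two columns of interest.
missing_counts = df[["age", "url"]].isnull().sum()
print(missing_counts)
```

Each of the two columns should report two missing values, matching the table.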
I want to change NaN to 0, and everything else to 1, in the columns age and url. My code is below, but it is wrong.
import pandas as pd
df[['age', 'url']].applymap(lambda x: 0 if x=='NaN' else x)
I want to get the following result:
row id name age url
1 e1 tom 0 1
2 e2 john 1 0
3 e3 lucy 0 1
4 e4 tick 1 0
Thanks for your help!
You can use where together with fillna, with the condition built by isnull:
df[['age', 'url']] = (df[['age', 'url']]
                      .where(df[['age', 'url']].isnull(), 1)
                      .fillna(0)
                      .astype(int))
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
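The where/fillna chain can be easier to follow one step at a time. Here is a minimal sketch on a single Series; the names `s`, `step1`, `step2` and `flags` are only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 25.0, np.nan, 29.0], name="age")

# where(cond, other) keeps the original value where cond is True
# and substitutes `other` where cond is False.
step1 = s.where(s.isnull(), 1)   # NaN stays NaN, valid values become 1
step2 = step1.fillna(0)          # the remaining NaN become 0
flags = step2.astype(int)        # cast the floats to integers

print(flags.tolist())  # [0, 1, 0, 1]
```

Because where keeps values where the condition holds, the condition here is isnull, not notnull: the NaN cells are the ones left alone for fillna to handle afterwards.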
Alternatively, use numpy.where:
import numpy as np
df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
Or use notnull with astype(int), which is the fastest approach in the timings below:
df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
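As a rough illustration of why this works: notnull returns a boolean frame, and casting booleans to int maps True to 1 and False to 0. Multiplying by 1, as in one of the timed variants, should give the same integers. The small frame below is an assumed stand-in for the question's data.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the two columns from the question.
df = pd.DataFrame({
    "age": [np.nan, 25, np.nan, 29],
    "url": ["http1", np.nan, "http3", np.nan],
})

mask = df[["age", "url"]].notnull()   # boolean frame: True where a value exists
flags_cast = mask.astype(int)         # explicit cast: True -> 1, False -> 0
flags_mul = mask * 1                  # arithmetic on booleans yields the same integers

# Compare the two results element by element.
same = bool((flags_cast.values == flags_mul.values).all())
print(same)
```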
Edit:
I tried modifying your solution:
df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
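It may help to see why the comparison in the question never matches. A small sketch, with variable names chosen here for illustration:

```python
import numpy as np
import pandas as pd

value = np.nan

equals_string = value == "NaN"    # NaN is a float, not the string 'NaN'
equals_itself = value == np.nan   # NaN does not compare equal even to itself
detected = pd.isnull(value)       # isnull is the reliable test for missing data

print(equals_string, equals_itself, detected)  # False False True
```

Two further points about the original attempt: its lambda returned x instead of 1 for valid values, and the result of applymap was never assigned back to df, so the frame stayed unchanged.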
Timings:
With len(df) = 4k:
In [127]: %timeit df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
100 loops, best of 3: 11.2 ms per loop
In [128]: %timeit df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
100 loops, best of 3: 2.69 ms per loop
In [129]: %timeit df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
100 loops, best of 3: 2.78 ms per loop
In [131]: %timeit df.loc[:, ['age', 'url']] = df[['age', 'url']].notnull() * 1
1000 loops, best of 3: 1.45 ms per loop
In [136]: %timeit df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
1000 loops, best of 3: 1.01 ms per loop
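To rerun a comparison like this outside IPython, something along these lines should work. It is a sketch under stated assumptions: the 4k-row frame is built by repeating the four example rows, the helper names `with_np_where` and `with_notnull_astype` are hypothetical, and the helpers only compute the flags without assigning them back, so the absolute numbers will differ from the ones above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: the 4-row example repeated 1000 times (4k rows).
base = pd.DataFrame({
    "age": [np.nan, 25, np.nan, 29],
    "url": ["http1", np.nan, "http3", np.nan],
})
big = pd.concat([base] * 1000, ignore_index=True)
cols = ["age", "url"]


def with_np_where(frame):
    # Returns a NumPy array of 0/1 flags.
    return np.where(frame[cols].isnull(), 0, 1)


def with_notnull_astype(frame):
    # Returns a DataFrame of 0/1 flags.
    return frame[cols].notnull().astype(int)


# Both approaches should produce identical flags.
assert (with_np_where(big) == with_notnull_astype(big).values).all()

for func in (with_np_where, with_notnull_astype):
    seconds = timeit.timeit(lambda: func(big), number=100)
    print("{}: {:.2f} ms per call".format(func.__name__, seconds / 100 * 1e3))
```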
Use np.where with pd.notnull to replace missing and valid elements with 0 and 1, respectively:
In [90]:
df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
df
Out[90]:
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0