Roc*_*etq 1 python nan python-2.7 pandas
I have a DataFrame named df whose original shape is (4361, 15). Some values in the agefm column are NaN. Just to take a look:
> df[df.agefm.isnull() == True].agefm.shape
(2282,)
Then I create a new column and set all of its values to 0:
df['nevermarr'] = 0
Next, I want to set nevermarr to 1 in every row where agefm is NaN:
df[df.agefm.isnull() == True].nevermarr = 1
Nothing changed:
> df['nevermarr'].sum()
0
What am I doing wrong?
The best approach is to use numpy.where:
df['nevermarr'] = np.where(df.agefm.isnull(), 1, 0)
print (df)
   agefm  nevermarr
0    NaN          1
1    5.0          0
2    6.0          0
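For anyone who wants to run the numpy.where version on its own, here is a minimal sketch. The three-row frame is made-up illustration data, not the asker's real 4361-row DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({'agefm': [np.nan, 5, 6]})

# np.where(condition, value_if_true, value_if_false) builds the whole
# column in one pass: 1 where agefm is missing, 0 everywhere else.
df['nevermarr'] = np.where(df['agefm'].isnull(), 1, 0)

print(df['nevermarr'].tolist())
```

Because np.where fills in both branches, the separate df['nevermarr'] = 0 initialization step is not needed with this approach.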
Or use loc; the == True can be omitted:
df.loc[df.agefm.isnull(), 'nevermarr'] = 1
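As for why the attempt in the question has no effect: as far as I understand it, df[boolean_mask] returns a new object, so assigning to a column on it changes a temporary that is thrown away right after. The sketch below contrasts the two forms on a made-up three-row frame. The variable names is_missing, chained_total and loc_total are just for illustration, and the warnings filter is only there because pandas may warn about this pattern depending on the version.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({'agefm': [np.nan, 5, 6]})
df['nevermarr'] = 0
is_missing = df['agefm'].isnull()

# Boolean indexing with df[...] returns a new object, so the assignment
# lands on a temporary that is discarded immediately afterwards.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about this pattern
    df[is_missing].nevermarr = 1
chained_total = int(df['nevermarr'].sum())

# .loc selects the rows and the column in a single call on df itself,
# so the assignment modifies the original frame.
df.loc[is_missing, 'nevermarr'] = 1
loc_total = int(df['nevermarr'].sum())

print(chained_total, loc_total)
```

The first total stays at 0, matching what the question observed, and only the loc assignment changes df.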
Or mask:
df['nevermarr'] = df.nevermarr.mask(df.agefm.isnull(), 1)
print (df)
   agefm  nevermarr
0    NaN          1
1    5.0          2
2    6.0          3
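A self-contained sketch of the mask variant, using the same made-up values as the sample data (7, 2, 3). It also shows Series.where, which is the complement of mask, in case that reads more naturally. The names missing, via_mask and via_where are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a pre-existing nevermarr column.
df = pd.DataFrame({'agefm': [np.nan, 5, 6], 'nevermarr': [7, 2, 3]})
missing = df['agefm'].isnull()

# Series.mask(cond, other) replaces values where cond is True
# and keeps the existing values everywhere else.
via_mask = df['nevermarr'].mask(missing, 1)

# Series.where(cond, other) is the complement: it keeps values where
# cond is True, so the condition has to be inverted.
via_where = df['nevermarr'].where(~missing, 1)

print(via_mask.tolist(), via_where.tolist())
```

Unlike numpy.where with two constants, mask leaves the existing values in the other rows untouched, which is why the output above shows 2 and 3 rather than 0.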
Sample:
import pandas as pd
import numpy as np
df = pd.DataFrame({'nevermarr':[7,2,3],
                   'agefm':[np.nan,5,6]})
print (df)
   agefm  nevermarr
0    NaN          7
1    5.0          2
2    6.0          3
df.loc[df.agefm.isnull(), 'nevermarr'] = 1
print (df)
   agefm  nevermarr
0    NaN          1
1    5.0          2
2    6.0          3