Pandas fillna on multiple columns with the mode of each column

Nic*_*ick 6 python numpy pandas data-science

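Working with census data, I want to replace the NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily: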

mode = df.filter(["workclass", "native-country"]).mode()

which returns a DataFrame:

  workclass native-country
0   Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

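does not replace the NaNs in either column with anything, let alone with the mode of the corresponding column. Is there a smooth way to do this?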

jez*_*ael 8

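If you want to impute missing values with the mode in some columns of the DataFrame df, you can pass fillna a Series, created by selecting the first row of the mode result by position with iloc: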

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])

Or:

df[cols]=df[cols].fillna(mode.iloc[0])

Or, adapting your solution with filter:

df[cols]=df.filter(cols).fillna(mode.iloc[0])
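
A note on why selecting the row with iloc[0] matters: when fillna is given a DataFrame, pandas matches the fill values on both row label and column label. The mode result has only the row label 0, so at most the NaNs in row 0 get filled. A Series indexed by column name is matched on column label only, so it applies to every row. The sketch below illustrates the difference on a small made-up frame (the data and the names by_frame and by_series are assumptions for illustration, not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so that row 0 also contains a NaN.
df = pd.DataFrame({
    "workclass": [np.nan, "Private", "Private", np.nan],
    "native-country": ["United-States", np.nan, "United-States", np.nan],
})

mode = df.mode()  # one-row DataFrame whose only row label is 0

# DataFrame argument: fill values are aligned on row AND column labels,
# so only NaNs sitting in row 0 can be replaced.
by_frame = df.fillna(mode)

# Series argument: fill values are aligned on column labels only,
# so every NaN in a matching column is replaced.
by_series = df.fillna(mode.iloc[0])

print(by_frame)
print(by_series)
```

Here by_frame should still contain NaNs in rows 1 and 3, while by_series should contain none.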

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private
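
As a possible variation on the same idea (not part of the answer above), fillna also accepts a dict mapping column names to fill values, which lets you compute the mode only for the columns you care about. This is a minimal sketch that reuses the sample data; the names fill_values and filled are made up for illustration. If several values tie for the mode, mode() returns all of them, and iloc[0] simply takes the first one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "Private", np.nan, "another", np.nan],
    "native-country": ["United-States", np.nan, "Canada", np.nan, "United-States"],
    "col": [2, 3, 7, 8, 9],
})
cols = ["workclass", "native-country"]

# Map each selected column to its mode; iloc[0] picks the first mode on ties.
fill_values = {c: df[c].mode().iloc[0] for c in cols}

# Columns missing from the dict (here "col") are left untouched.
filled = df.fillna(value=fill_values)
print(filled)
```

The result is a new DataFrame; assign it back to df if you want to keep the filled values.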