In Pandas, how do I use fillna to fill an entire column with a string if the column is originally empty?

wai*_*kuo 10 python pandas

My table:

In [15]: csv=u"""a,a,,a
   ....: b,b,,b
   ....: c,c,,c
   ....: """

In [18]: df = pd.read_csv(io.StringIO(csv), header=None)

I try to fill the empty column with "UNKNOWN":

In [19]: df
Out[19]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [20]: df.fillna({2:'UNKNOWN'})

and get this error:

ValueError: could not convert string to float: UNKNOWN

DSM*_*DSM 9

Your column 2 probably has a float dtype:

>>> df
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
>>> df.dtypes
0     object
1     object
2    float64
3     object
dtype: object
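To see where the float dtype comes from, here is a minimal sketch that rebuilds the frame from the question and inspects it. It assumes a reasonably recent pandas; the names raw and empty_cols are illustrative and not part of the original post.

```python
import io

import pandas as pd

# Same data as in the question: the third field of every row is empty.
raw = "a,a,,a\nb,b,,b\nc,c,,c\n"
df = pd.read_csv(io.StringIO(raw), header=None)

# A column with no values at all is parsed as NaN everywhere,
# and NaN is a float, so pandas infers float64 for that column.
print(df.dtypes)

# Boolean Series that is True for columns containing nothing but NaN.
empty_cols = df.isna().all()
print(empty_cols)
```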

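That is the cause of the problem: the string cannot be stored in a float64 column. If you don't mind converting the whole frame to object, you could do: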

>>> df.astype(object).fillna("UNKNOWN")
   0  1        2  3
0  a  a  UNKNOWN  a
1  b  b  UNKNOWN  b
2  c  c  UNKNOWN  c

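Depending on whether there is non-string data, you might want to be more selective about converting column dtypes, and/or specify the dtypes when reading, but the above should work in any case.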
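As one possible way to specify the dtype at read time, the sketch below asks read_csv to treat only column 2 as object. It assumes the column position is known in advance; the variable names raw and filled are illustrative.

```python
import io

import pandas as pd

raw = "a,a,,a\nb,b,,b\nc,c,,c\n"

# Declare column 2 as object instead of letting pandas infer float64.
# Empty fields are still read as NaN, so fillna has something to replace.
df = pd.read_csv(io.StringIO(raw), header=None, dtype={2: object})

# The column can now hold a string, so the per-column fill goes through.
filled = df.fillna({2: "UNKNOWN"})
print(filled)
```

The other columns keep whatever dtype pandas infers for them, so this is narrower than casting the whole frame.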


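Update: if you have dtype information you want to preserve, rather than switching it back afterwards I would go the other way and fill only the columns you want to, either with a loop using fillna: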

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.dtypes
0      int64
1     object
2     object
3    float64
4     object
5    float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
...         df[col] = df[col].astype(object).fillna("UNKNOWN")
...     
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
>>> df.dtypes
0     int64
1    object
2    object
3    object
4    object
5    object
dtype: object
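The loop can be wrapped in a small helper. This is only a sketch: the function name fill_empty_columns and the sample column names are made up for illustration, and it returns a copy rather than modifying the frame in place.

```python
import numpy as np
import pandas as pd


def fill_empty_columns(df, value="UNKNOWN"):
    """Return a copy of df in which every all-NaN column is filled with value."""
    out = df.copy()
    for col in out.columns[out.isna().all()]:
        # Cast first so the column can hold a string, then fill it.
        out[col] = out[col].astype(object).fillna(value)
    return out


df = pd.DataFrame({
    "id": [0, 1, 2],
    "name": ["a", "b", "c"],
    "empty": [np.nan, np.nan, np.nan],
    "partial": [1.5, np.nan, 2.5],
})

result = fill_empty_columns(df)
print(result)
print(result.dtypes)
```

Only the column that is entirely NaN changes; the partially filled float column and the integer column keep their dtypes and their values.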

or (if you are using all anyway) maybe not even use fillna at all:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> # .ix was removed in pandas 1.0; assigning to the selected columns has the same effect
>>> df[df.columns[pd.isnull(df).all()]] = "UNKNOWN"
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN