Pandas: IndexingError: Unalignable boolean Series provided as indexer

elP*_*tor 15 python pandas

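I'm trying to run what I think is simple code to eliminate every column that is all NaN, but I can't get it to work (axis = 1 works just fine when eliminating rows):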

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

df = df[df.notnull().any(axis = 0)]

print(df)

Full error:

    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Expected output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

jez*_*ael 16

You need loc, because you are filtering by columns:

print (df.notnull().any(axis = 0))
a     True
b     True
c     True
d    False
dtype: bool

df = df.loc[:, df.notnull().any(axis = 0)]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or filter the column names first, then select with []:

print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')

df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or use dropna with the parameter how='all' to remove every column that contains only NaNs:

print (df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

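  • Ahh, because the `df[]` method is looking for a row-based index, not a column-based one. Got it. Thanks. (2 upvotes)
  • @pshep123 - Glad I could help! (2 upvotes)
  • This is counter-intuitive, because the simplest form of indexing a DataFrame is associative, i.e. selecting a column by one of the column headers: `df['headername']` (2 upvotes)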
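To make the point in these comments concrete, here is a minimal sketch of how `df[...]` treats different kinds of keys. It reuses the DataFrame from the question; the variable names (`one_column`, `some_columns`, `row_mask`, `non_empty_rows`) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# A single label selects one column and returns a Series.
one_column = df['a']

# A list of labels selects several columns and returns a DataFrame.
some_columns = df[['a', 'b']]

# A boolean Series is treated as a row filter: its index is matched
# against df.index, not against df.columns.
row_mask = df.notnull().any(axis=1)   # one flag per row
non_empty_rows = df[row_mask]         # drops row 3, which is entirely NaN

print(non_empty_rows)
```

So the same brackets are column-based for labels and row-based for boolean masks, which is why a mask built over the columns needs `loc[:, mask]` instead.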

EdC*_*ica 6

You can use dropna with axis=1 and thresh=1:

In[19]:
df.dropna(axis=1, thresh=1)

Out[19]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

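This drops any column that doesn't have at least 1 non-NaN value, which means any column that is entirely NaN gets dropped.

The reason your attempt failed is that the boolean mask: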
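As a quick check of what `thresh` means, the sketch below compares `thresh=1` with `how='all'` on the question's data, and then raises the threshold. The names `by_thresh`, `by_how`, `same_result` and `stricter` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# thresh=N keeps a column only if it has at least N non-NaN values.
by_thresh = df.dropna(axis=1, thresh=1)

# how='all' drops a column only if every value in it is NaN.
by_how = df.dropna(axis=1, how='all')

# On this data the two calls give the same frame.
same_result = by_thresh.equals(by_how)

# Columns a, b and c each hold exactly 2 non-NaN values,
# so asking for at least 3 drops every column.
stricter = df.dropna(axis=1, thresh=3)

print(same_result)
print(stricter.shape)
```

`thresh` is the more general knob here: `thresh=1` is the "not entirely NaN" case, and larger values let you require a minimum amount of real data per column.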

In[20]:
df.notnull().any(axis = 0)

Out[20]: 
a     True
b     True
c     True
d    False
dtype: bool

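cannot be aligned on the index, which is what `df[...]` uses by default, because this expression produces a boolean mask over the columns.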
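The mismatch can be seen directly by comparing the mask's labels with the DataFrame's two axes. This is a small sketch on the question's data. The exception is caught generically and identified by its class name, because the import location of `IndexingError` has moved between pandas versions. The variable names are illustrative.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

col_mask = df.notnull().any(axis=0)   # labelled 'a', 'b', 'c', 'd'

# The mask's labels are the column names, not the row labels 0..3.
labels_match_columns = col_mask.index.equals(df.columns)
labels_match_rows = col_mask.index.equals(df.index)

# df[mask] tries to align the mask with df.index and fails.
error_name = None
with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # silence any reindexing warning pandas may emit
    try:
        df[col_mask]
    except Exception as exc:
        error_name = type(exc).__name__

# loc with an explicit column position aligns the mask with df.columns.
kept = df.loc[:, col_mask]

print(labels_match_columns, labels_match_rows, error_name)
print(kept)
```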

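  • Thanks Ed - I didn't know about the `thresh` parameter. I just learned that you can use both axes at once to trim all empty rows and columns: `df = df.dropna(axis = [0,1], how = 'all')` (2 upvotes)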
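A note on the last comment: as far as I know, passing a list to `axis` in `dropna` was deprecated in pandas 0.23 and removed in 1.0, so it will not work on current versions. Here is a hedged sketch of the same idea using two chained single-axis calls, again on the question's data (`trimmed` is just an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# Drop rows that are entirely NaN, then columns that are entirely NaN.
trimmed = df.dropna(axis=0, how='all').dropna(axis=1, how='all')

print(trimmed)
```

On this data that removes row 3 and column d, leaving a 3 x 3 frame.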