Python pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

alw*_*ons 2 python pandas

I read in a data table with 29 columns and added an index column, for 30 columns in total.

Data = pd.read_excel(os.path.join(BaseDir, 'test.xlsx'))
Data.reset_index(inplace=True)

Next, I want to subset the data to only the columns whose names contain "ref" or "Ref". I got the following code from another Stack Overflow post:

col_keep = Data.ix[:, pd.Series(Data.columns.values).str.contains('ref', case=False)]

However, I keep getting this error:

    print(len(Data.columns.values))
    30
    print(pd.Series(Data.columns.values).str.contains('ref', case=False))
    0     False
    1     False
    2     False
    3     False
    4     False
    5     False
    6     False
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    13    False
    14    False
    15    False
    16    False
    17    False
    18    False
    19    False
    20    False
    21    False
    22    False
    23    False
    24     True
    25     True
    26     True
    27     True
    28    False
    29    False
    dtype: bool

Traceback (most recent call last):
  File "C:/Users/lala.py", line 26, in <module>
    col_keep = FedexData.ix[:, pd.Series(FedexData.columns.values).str.contains('ref', case=False)]
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 84, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 816, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1014, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1041, in _getitem_iterable
    key = check_bool_indexer(labels, key)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1817, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

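So the boolean values are correct, but why doesn't it work? Why does the error keep appearing?

Any help or hints are appreciated. Thank you very much.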

unu*_*tbu 6

I can reproduce a similar error message this way:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(4, size=(10,4)), columns=list('ABCD'))
df.ix[:, pd.Series([True,False,True,False])]

which raises (using Pandas version 0.21.0.dev+25.g50e95e0)

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

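The problem occurs because Pandas tries to align the index of the Series with the column index of the DataFrame before masking with the boolean Series. Since df has the column labels 'A', 'B', 'C', 'D' and the Series has the index labels 0, 1, 2, 3, Pandas complains that the labels are unalignable.

You probably don't want any index alignment here. So pass a NumPy boolean array instead of a Pandas Series: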
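To make the alignment rule concrete, here is a minimal sketch on a small made-up DataFrame (the data and variable names are only for illustration). It assumes a reasonably recent Pandas and uses `.loc`, since `.ix` is no longer available there. A boolean Series with the default 0..3 index fails, while the same booleans indexed by the column labels are accepted:

```python
import pandas as pd

# Small illustrative frame; the values are arbitrary.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=list("ABCD"))

# Boolean Series indexed 0..3: its labels do not match 'A'..'D'.
misaligned = pd.Series([True, False, True, False])
try:
    df.loc[:, misaligned]
    raised = False
except Exception:  # IndexingError; its import path differs across versions
    raised = True

# Same booleans, but indexed by the column labels: alignment succeeds.
aligned = pd.Series([True, False, True, False], index=df.columns)
selected = df.loc[:, aligned]
print(raised, list(selected.columns))
```

So the mask values were never the problem; only the labels attached to them were.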

mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
col_keep = Data.loc[:, mask]

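The Series.values attribute returns a NumPy array. Data.loc is used here instead of Data.ix because DataFrame.ix will be removed in a future version of Pandas, and .loc is the indexer that supports boolean indexing.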
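As a rough check, the fix can be tried on a small stand-in frame. The column names below are hypothetical, not the asker's real ones. The last two selections are alternatives that are not part of the answer above: `Index.str.contains` already returns a plain boolean array, and `DataFrame.filter` accepts a regex, so neither one involves a Series index.

```python
import pandas as pd

# Hypothetical columns standing in for the asker's spreadsheet.
data = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["index", "OrderRef", "ref_code", "Amount"],
)

# The approach from the answer: drop the Series index with .values.
mask = pd.Series(data.columns.values).str.contains("ref", case=False).values
col_keep = data.loc[:, mask]

# Alternative: call .str.contains on the column Index itself,
# which yields a boolean array with no labels to align.
col_keep_alt = data.loc[:, data.columns.str.contains("ref", case=False)]

# Alternative: DataFrame.filter with a case-insensitive regex.
col_keep_filter = data.filter(regex="(?i)ref")

print(list(col_keep.columns))
```

All three select the same columns on this toy data; which one reads best is a matter of taste.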