So I read in a data sheet with 29 columns and added an index column (30 columns in total).
Data = pd.read_excel(os.path.join(BaseDir, 'test.xlsx'))
Data.reset_index(inplace=True)
Then I want to subset the data to only those columns whose names contain "ref" or "Ref"; I got the following code from another Stack post:
col_keep = Data.ix[:, pd.Series(Data.columns.values).str.contains('ref', case=False)]
However, I keep getting this error:
print(len(Data.columns.values))
30
print(pd.Series(Data.columns.values).str.contains('ref', case=False))
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 True
25 True
26 True
27 True
28 False
29 False
dtype: bool
Traceback (most recent call last):
File "C:/Users/lala.py", line 26, in <module>
col_keep = FedexData.ix[:, pd.Series(FedexData.columns.values).str.contains('ref', case=False)]
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 84, in __getitem__
return self._getitem_tuple(key)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 816, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1014, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1041, in _getitem_iterable
key = check_bool_indexer(labels, key)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1817, in check_bool_indexer
raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
So the boolean values are correct, but why doesn't it work? Why does the error keep coming up?
Any help or hints are appreciated! Thank you very much.
I can reproduce a similar error message this way:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(4, size=(10,4)), columns=list('ABCD'))
df.ix[:, pd.Series([True,False,True,False])]
which raises (using Pandas version 0.21.0.dev+25.g50e95e0)
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
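The problem occurs because Pandas tries to align the index of the Series with the column index of the DataFrame before masking with the boolean Series. Since df has column labels 'A', 'B', 'C', 'D' and the Series has index labels 0, 1, 2, 3, Pandas complains that the labels are unalignable.
You probably don't want any index alignment. So pass a NumPy boolean array instead of a Pandas Series: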
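To make the alignment point concrete, here is a minimal sketch (the variable names are my own, and it uses .loc rather than .ix). A boolean Series whose index matches the column labels is accepted, while one carrying the default 0..3 index is rejected:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))

# Boolean Series indexed by the column labels: alignment succeeds.
aligned_mask = pd.Series([True, False, True, False], index=df.columns)
selected = df.loc[:, aligned_mask]

# Boolean Series with the default integer index 0..3: the labels
# cannot be matched to 'A'..'D', so the lookup is rejected.
unaligned_mask = pd.Series([True, False, True, False])
try:
    df.loc[:, unaligned_mask]
    alignment_failed = False
except Exception:  # IndexingError; caught broadly since its import path varies by version
    alignment_failed = True
```

Here selected holds only columns 'A' and 'C', and alignment_failed ends up True.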
mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
col_keep = Data.loc[:, mask]
The Series.values attribute returns a NumPy array. And since DataFrame.ix will be removed in a future version of Pandas, use Data.loc instead of Data.ix here, since we want boolean indexing.
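As a possible alternative, the intermediate Series can be skipped altogether. The sketch below assumes a small stand-in DataFrame with made-up column names (not the real spreadsheet). It builds the mask directly from Data.columns, and also shows DataFrame.filter with a case-insensitive regex, which should select the same columns:

```python
import pandas as pd

# Hypothetical column names standing in for the 30 columns in the question.
Data = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=['index', 'Ref1', 'shipper_ref', 'weight'],
)

# Calling .str.contains on the column Index gives a boolean array with
# no label index of its own, so there is nothing for Pandas to align.
mask = Data.columns.str.contains('ref', case=False)
col_keep = Data.loc[:, mask]

# Same selection via a case-insensitive regex on the column labels.
col_keep_filter = Data.filter(regex='(?i)ref')
```

Both col_keep and col_keep_filter keep only the 'Ref1' and 'shipper_ref' columns.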