I have a MultiIndexed DataFrame like this:
In [2]: ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
In [3]: data = np.arange(len(ix))
In [4]: df = pd.DataFrame(data, index=ix, columns=['hi'])
In [43]: df = df[~df.hi.isin([2, 3])]
In [44]: df
Out[44]:
           hi
a b   c
1 foo baz   0
      can   1
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11
I'd like to know which pairs of values from levels a and b occur in the DataFrame:
[(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')]
I can do this with pd.unique and df.index.get_level_values, but it seems a bit clunky:
In [66]: pd.unique(zip(df.index.get_level_values(0), df.index.get_level_values(1)))
Out[66]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
Is there a "nice" way to do this?
You can call droplevel on your MultiIndex to remove level c, and then call unique to get the list you want:
In [126]:
df.index.droplevel('c').unique()
Out[126]:
array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
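For anyone running this on a recent pandas, here is a minimal self-contained sketch of the droplevel approach. It assumes a pandas version where unique on a MultiIndex returns a MultiIndex rather than the object array shown above; the names pairs_index and pairs are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]

# Drop level 'c', then de-duplicate the remaining (a, b) entries.
# Assumption: on recent pandas this is a two-level MultiIndex.
pairs_index = df.index.droplevel('c').unique()

# Convert to a plain list of tuples, in order of first appearance.
pairs = pairs_index.tolist()
print(pairs)
```

If you only need to iterate over the pairs, the MultiIndex itself can be looped over directly and the tolist call is optional.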
In [22]: df.reset_index().set_index(['a','b']).index.unique()
Out[22]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
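The reset_index snippet comes without explanation. As one reading of it: reset_index moves every index level into an ordinary column, set_index(['a', 'b']) rebuilds the index from just those two columns, and unique removes the repeats. The sketch below, assuming a recent pandas, checks that this gives the same pairs as dropping level c; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]

# Move all index levels into columns, then index by 'a' and 'b' only.
by_ab = df.reset_index().set_index(['a', 'b'])
pairs_via_reset = by_ab.index.unique().tolist()

# The droplevel route should yield the same pairs in the same order.
pairs_via_droplevel = df.index.droplevel('c').unique().tolist()
assert pairs_via_reset == pairs_via_droplevel
```

Both routes give the same result here; droplevel works on the index alone, whereas the reset_index route builds an intermediate DataFrame first.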