MultiIndex slicing requires the index to be fully lexsorted

Question

I have a DataFrame indexed by (year, foo). I want to select the X largest observations of foo where year == someYear.

My approach was:

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[pd.IndexSlice[2002, :10], :]

But I get:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

I tried different sorting variants (e.g. ascending = [0, 0]), but they all resulted in some kind of error.
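
A small check of the sort variants, using only the 2015 rows from the sample data below and assuming a reasonably recent pandas. It uses is_monotonic_increasing on the full index tuples as a rough public proxy for the "fully lexsorted" condition named in the KeyError. Only the all-ascending sort passes it here.

```python
import pandas as pd

# Only the 2015 rows shown in the question are used here.
df = pd.DataFrame(
    {
        "year": [2015] * 5,
        "foo": [1.381845, 1.234795, 1.148488, 0.866704, 0.738022],
        "rank_int": [2, 2, 199, 2, 2],
        "rank": [320, 259, 2, 363, 319],
    }
).set_index(["year", "foo"])

variants = {
    "year asc, foo desc": df.sort_index(level=[0, 1], ascending=[True, False]),
    "both desc": df.sort_index(level=[0, 1], ascending=[False, False]),
    "both asc": df.sort_index(),
}

for name, sorted_df in variants.items():
    # True only when the (year, foo) tuples increase from top to bottom.
    print(name, sorted_df.index.is_monotonic_increasing)
```

Any sort that puts foo in descending order leaves the index non-increasing, which is why the descending variants still trip the slicing check.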

If I only wanted the xth row, I could call df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, that isn't efficient.
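
For reference, the single-row nth approach mentioned above can run directly on the MultiIndex once foo is sorted descending within each year. This sketch uses the 2015 sample rows, and x = 0 is an arbitrary choice that picks the largest foo. The way nth labels its result differs between pandas versions (group key versus original index), so only column values are relied on.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [2015] * 5,
        "foo": [1.381845, 1.234795, 1.148488, 0.866704, 0.738022],
        "rank_int": [2, 2, 199, 2, 2],
        "rank": [320, 259, 2, 363, 319],
    }
).set_index(["year", "foo"])

# Largest foo first inside each year.
df = df.sort_index(level=[0, 1], ascending=[True, False])

x = 0  # position within each year group (0 = largest foo)
row = df.groupby(level=0).nth(x)
print(row)
```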

What is the best way to select these rows? Some data:

                   rank_int  rank
year foo                         
2015 1.381845             2   320
     1.234795             2   259
     1.148488           199     2
     0.866704             2   363
     0.738022             2   319

Answer 1

小智 11

First, you should sort like this:

df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)

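That should fix the KeyError. However, df.loc[pd.IndexSlice[2002, :10], :] will not give you the result you expect. loc is label-based, not positional like iloc, so :10 is interpreted as a range of foo labels (foo values up to 10), not as "the first ten rows". Positional iloc-style slicing isn't supported on an inner level of a MultiIndex, so I suggest using groupby instead. If you already have this MultiIndex, you should do this: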

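df = df.reset_index()  # turn the year and foo index levels back into ordinary columns
df = df.sort_values(by=['year', 'foo'], ascending=[True, False])  # largest foo first within each year
df.groupby('year').head(10)  # first 10 rows (the 10 largest foo) of each year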
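
A self-contained run of the groupby approach above on the question's sample rows. Because the sample has only five rows for 2015, this sketch keeps n = 3 per year instead of 10; n is a hypothetical parameter.

```python
import pandas as pd

# Sample data from the question (only year 2015 is shown there).
df = pd.DataFrame(
    {
        "year": [2015] * 5,
        "foo": [1.381845, 1.234795, 1.148488, 0.866704, 0.738022],
        "rank_int": [2, 2, 199, 2, 2],
        "rank": [320, 259, 2, 363, 319],
    }
).set_index(["year", "foo"])

n = 3  # hypothetical number of rows to keep per year

flat = df.reset_index()  # year and foo become ordinary columns
flat = flat.sort_values(by=["year", "foo"], ascending=[True, False])  # largest foo first per year
top_n = flat.groupby("year").head(n)  # first n rows of each year group
print(top_n)
```

groupby(...).head(n) works as a filter, so the result keeps the original rows and columns and needs no extra merge step.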

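If you need the n entries with the smallest foo, you can use tail(n). If you need, for example, the first, third and fifth entries, you can use nth([0,2,4]), as mentioned in the question. I think this is the most efficient way.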
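
A sketch of the two variants just described, applied to the flattened sample rows sorted with the largest foo first. Only column values are checked, because nth labels its output differently across pandas versions.

```python
import pandas as pd

# Flattened sample rows from the question (year and foo as columns).
flat = pd.DataFrame(
    {
        "year": [2015] * 5,
        "foo": [1.381845, 1.234795, 1.148488, 0.866704, 0.738022],
        "rank_int": [2, 2, 199, 2, 2],
        "rank": [320, 259, 2, 363, 319],
    }
)
flat = flat.sort_values(by=["year", "foo"], ascending=[True, False])

grouped = flat.groupby("year")
smallest_two = grouped.tail(2)   # last 2 rows per year = the 2 smallest foo
picked = grouped.nth([0, 2, 4])  # 1st, 3rd and 5th rows per year
print(smallest_two)
print(picked)
```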

Answer 2

ASG*_*SGM 6

Pass a single boolean for ascending instead of a list, so that every level is sorted ascending. Try sorting like this:

df.sort_index(ascending=True, inplace=True)
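
A sketch of what this fully ascending sort enables, using the 2015 sample rows. The IndexSlice selection becomes legal, and a label bound such as :1.0 selects foo values up to 1.0 rather than a row count. Because foo is then ascending within each year, the X largest values are the last X rows of that year's cross-section. x = 3 is an arbitrary, hypothetical choice.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [2015] * 5,
        "foo": [1.381845, 1.234795, 1.148488, 0.866704, 0.738022],
        "rank_int": [2, 2, 199, 2, 2],
        "rank": [320, 259, 2, 363, 319],
    }
).set_index(["year", "foo"])

df.sort_index(ascending=True, inplace=True)  # every level ascending

year_rows = df.loc[pd.IndexSlice[2015, :], :]     # slicing works on the sorted index
up_to_one = df.loc[pd.IndexSlice[2015, :1.0], :]  # label bound: foo <= 1.0, not "first row"

x = 3  # hypothetical number of rows wanted
# foo is ascending, so tail(x) holds the largest values; reverse to list the biggest first.
largest = df.loc[2015].tail(x).iloc[::-1]
print(largest)
```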

Archived:	9 years, 4 months ago
Views:	7,523
Last activity:	8 years, 7 months ago