Unexpected pandas set_index behavior

pip*_*ppo 2 python pandas

import pandas

data = [['g1','a',1],['g1','b',2],['g2','b',3],['g2','a',4]]
df = pandas.DataFrame(data=data, columns=['group','name','count'])
print(df.set_index(['group','name']))
print(df.set_index(['name','group']))

            count
group name       
g1    a         1
      b         2
g2    b         3
      a         4
            count
name group       
a    g1         1
b    g1         2
     g2         3
a    g2         4

This behavior surprised me quite a bit, because I expected the second output to look like this:

            count
name group       
a    g1         1
     g2         4
b    g1         2
     g2         3
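A minimal check using the question's own data: `set_index` keeps the rows in their existing order and only moves the two columns into the index. The resulting MultiIndex is therefore not sorted, which is why the two `a` rows are not displayed together. The display only blanks out a label when it repeats the one directly above it.

```python
import pandas as pd

data = [['g1', 'a', 1], ['g1', 'b', 2], ['g2', 'b', 3], ['g2', 'a', 4]]
df = pd.DataFrame(data=data, columns=['group', 'name', 'count'])

indexed = df.set_index(['name', 'group'])

# The index tuples follow the original row order; nothing is regrouped.
print(indexed.index.tolist())
# [('a', 'g1'), ('b', 'g1'), ('b', 'g2'), ('a', 'g2')]

# Consequently the MultiIndex is not sorted.
print(indexed.index.is_monotonic_increasing)
# False
```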

chr*_*ock 7

You need to sort the DataFrame first to get the desired output:

In [12]: df.sort_values('name').set_index(['name','group'])
Out[12]: 
            count
name group       
a    g1         1
     g2         4
b    g1         2
     g2         3
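Two equivalent ways to get the grouped layout, sketched on the same data. Sorting on both key columns makes the `group` order within each `name` explicit. Sorting on `name` alone leaves that order to the sort algorithm, and the default `quicksort` is not documented as stable. Alternatively, build the index first and then sort it with `sort_index()`:

```python
import pandas as pd

data = [['g1', 'a', 1], ['g1', 'b', 2], ['g2', 'b', 3], ['g2', 'a', 4]]
df = pd.DataFrame(data=data, columns=['group', 'name', 'count'])

# Option 1: sort on both key columns, then build the index.
by_columns = df.sort_values(['name', 'group']).set_index(['name', 'group'])

# Option 2: build the index, then sort the rows by it.
by_index = df.set_index(['name', 'group']).sort_index()

print(by_index)
print(by_columns.equals(by_index))  # True: same index, same values
```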

  • Indeed, the [docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html#the-need-for-sortedness-with-multiindex) discuss why this matters (2 upvotes)
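Sortedness also matters beyond display. Here is a minimal sketch, assuming a reasonably recent pandas (the exact exception and message may differ between versions). Label slicing on the outer level of the unsorted index raises `pandas.errors.UnsortedIndexError`, while the same slice works once the index has been sorted with `sort_index()`:

```python
import pandas as pd

data = [['g1', 'a', 1], ['g1', 'b', 2], ['g2', 'b', 3], ['g2', 'a', 4]]
df = pd.DataFrame(data=data, columns=['group', 'name', 'count'])

unsorted = df.set_index(['name', 'group'])
sorted_idx = unsorted.sort_index()

# Slicing a MultiIndex by label range requires it to be lexsorted.
try:
    unsorted.loc['a':'b']
    unsorted_slice_ok = True
except pd.errors.UnsortedIndexError as exc:
    unsorted_slice_ok = False
    print('unsorted index:', exc)

# After sort_index() the same slice succeeds and returns all four rows.
print(sorted_idx.loc['a':'b'])
```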