Pandas: compute the sum of each MultiIndex sublevel

Question

I want to compute the sum of each MultiIndex sublevel and then store the result in the DataFrame.

My current DataFrame looks like this:

                    values
    first second
    bar   one     0.106521
          two     1.964873
    baz   one     1.289683
          two    -0.696361
    foo   one    -0.309505
          two     2.890406
    qux   one    -0.758369
          two     1.302628

And the desired result is:

                    values
    first second
    bar   one     0.106521
          two     1.964873
          total   2.071394
    baz   one     1.289683
          two    -0.696361
          total   0.593322
    foo   one    -0.309505
          two     2.890406
          total   2.580901
    qux   one    -0.758369
          two     1.302628
          total   0.544259
    total one     0.328331
          two     5.461546
          total   5.789877

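So far I have found that the implementation below works, but I would like to know whether there is a better option. I need the fastest possible solution, because in some cases, when my DataFrame becomes huge, the computation seems to take a very long time.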

In [1]: from numpy.random import randn
   ...: from pandas import DataFrame, MultiIndex, Series

In [2]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ...:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

In [3]: tuples = list(zip(*arrays))

In [4]: index = MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [5]: s = Series(randn(8), index=index)

In [6]: d = {'values': s}

In [7]: df = DataFrame(d)

In [8]: # For each index level: pivot it into columns, add a row-wise
   ...: # 'total' column, then stack the columns back into the index.
   ...: for col in df.index.names:
   ...:     df = df.unstack(col)
   ...:     df[('values', 'total')] = df.sum(axis=1)
   ...:     df = df.stack()

Answer 1

CT *_*Zhu 0

Rather ugly code:

In [162]:

print(df)
                values
first second          
bar   one     0.370291
      two     0.750565
baz   one     0.148405
      two     0.919973
foo   one     0.121964
      two     0.394017
qux   one     0.883136
      two     0.871792
In [163]:

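# Updated for current pandas: print() is a function, sort() is now sort_values(),
# and the sum is limited to 'values' so the string column 'second' is not summed.
totals = df.reset_index().groupby('first', as_index=False)['values'].sum()
print(pd.concat((df.reset_index(), totals))
        .sort_values(['first', 'second'])  # NaN in 'second' sorts last in each group
        .fillna('total')
        .set_index(['first', 'second']))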
                values
first second          
bar   one     0.370291
      two     0.750565
      total   1.120856
baz   one     0.148405
      two     0.919973
      total   1.068378
foo   one     0.121964
      two     0.394017
      total   0.515981
qux   one     0.883136
      two     0.871792
      total   1.754927

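Basically, since the additional "total" rows have to be computed and inserted into the original DataFrame, the relationship between the original data and the result is neither one-to-one nor many-to-one. So I think you have to generate the "total" DataFrame separately and `concat` it with the original DataFrame.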
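
The idea above (build the totals separately, then `concat`) can be sketched for current pandas as follows. This is only a minimal sketch, not the answerer's code: `add_totals` and `label` are hypothetical names, and it assumes a two-level index and numeric columns only. Unlike the output shown above, it also adds the top-level "total" group that the question asks for.

```python
import numpy as np
import pandas as pd


def add_totals(df, label="total"):
    """Hypothetical helper: append a `label` row to every first-level group
    and a `label` group with per-second-level sums and the grand total.
    Assumes a two-level MultiIndex and numeric columns only."""
    first, second = df.index.names

    # Sums per second-level key across all first-level groups,
    # re-keyed as (label, key).
    by_second = df.groupby(level=second, sort=False).sum()
    by_second.index = pd.MultiIndex.from_product(
        [[label], by_second.index], names=[first, second]
    )
    with_group = pd.concat([df, by_second])

    # Sums per first-level key; the freshly added `label` group
    # yields the grand total. Re-keyed as (key, label).
    by_first = with_group.groupby(level=first, sort=False).sum()
    by_first.index = pd.MultiIndex.from_product(
        [by_first.index, [label]], names=[first, second]
    )
    out = pd.concat([with_group, by_first])

    # Stable reorder by first-level key in order of appearance, so each
    # total row lands directly after the rows of its own group.
    codes, _ = pd.factorize(out.index.get_level_values(first))
    return out.iloc[np.argsort(codes, kind="stable")]


# Values copied from the question's example frame.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    {"values": [0.106521, 1.964873, 1.289683, -0.696361,
                -0.309505, 2.890406, -0.758369, 1.302628]},
    index=index,
)
result = add_totals(df)
print(result)
```

This version uses two `groupby` calls and two `concat` calls instead of repeated `unstack`/`stack` round trips. Whether it is actually faster on a large frame would need to be measured on the real data.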
