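I want to compute the sum for each sub-level of a MultiIndex and then store the totals in the same dataframe.
My current dataframe looks like this: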
                values
first second
bar   one     0.106521
      two     1.964873
baz   one     1.289683
      two    -0.696361
foo   one    -0.309505
      two     2.890406
qux   one    -0.758369
      two     1.302628
And the desired result is:
                values
first second
bar   one     0.106521
      two     1.964873
      total   2.071394
baz   one     1.289683
      two    -0.696361
      total   0.593322
foo   one    -0.309505
      two     2.890406
      total   2.580901
qux   one    -0.758369
      two     1.302628
      total   0.544259
total one     0.328331
      two     5.461546
      total   5.789877
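So far I have found that the implementation below works, but I would like to know whether there is a better option. I need the fastest solution possible, because the computation seems to take a long time in some cases when my dataframe gets huge.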
In [1]: from numpy.random import randn
   ...: from pandas import DataFrame, MultiIndex, Series
   ...:
In [2]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ...:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   ...:
In [3]: tuples = list(zip(*arrays))
In [4]: index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [5]: s = Series(randn(8), index=index)
In [6]: d = {'values': s}
In [7]: df = DataFrame(d)
In [8]: for col in df.index.names:
   ...:     # move one index level into the columns, add a 'total' column, move it back
   ...:     df = df.unstack(col)
   ...:     df[('values', 'total')] = df.sum(axis=1)
   ...:     df = df.stack()
   ...:
Here is some fairly ugly code:
In [162]:
print(df)
                values
first second
bar   one     0.370291
      two     0.750565
baz   one     0.148405
      two     0.919973
foo   one     0.121964
      two     0.394017
qux   one     0.883136
      two     0.871792
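In [163]:
# assumes `import pandas as pd`; the per-group sums have no 'second' value,
# so they sort last within each 'first' group and are then labelled 'total'
print(pd.concat((df.reset_index(),
                 df.groupby(level='first').sum().reset_index()))
      .sort_values(['first', 'second'])
      .fillna('total')
      .set_index(['first', 'second']))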
                values
first second
bar   one     0.370291
      two     0.750565
      total   1.120856
baz   one     0.148405
      two     0.919973
      total   1.068378
foo   one     0.121964
      two     0.394017
      total   0.515981
qux   one     0.883136
      two     0.871792
      total   1.754927
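Basically, because the additional 'total' rows have to be computed and inserted into the original dataframe, the relationship between the original data and the result is neither one-to-one nor many-to-one. So I think you have to build the 'total' dataframe separately and concat it with the original dataframe.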
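The same idea (build the totals separately, then concat) can be stretched to also cover the final 'total' group from the desired output in the question. What follows is only a rough sketch, not a benchmarked solution: the helper name `add_subtotals` and its `label` parameter are made up for illustration, and it assumes a two-level index, numeric columns only, and no existing index value equal to the label.

```python
import pandas as pd


def add_subtotals(df, label="total"):
    """Hypothetical helper: append `label` rows for each index level plus a grand total."""
    first, second = df.index.names  # assumes exactly two index levels
    names = [first, second]

    # Sum within each first-level group, e.g. ('bar', 'total').
    per_first = df.groupby(level=first).sum()
    per_first.index = pd.MultiIndex.from_product(
        [per_first.index, [label]], names=names)

    # Sum across groups for each second-level key, e.g. ('total', 'one').
    per_second = df.groupby(level=second).sum()
    per_second.index = pd.MultiIndex.from_product(
        [[label], per_second.index], names=names)

    # Grand total row ('total', 'total').
    grand = df.sum().to_frame().T
    grand.index = pd.MultiIndex.from_tuples([(label, label)], names=names)

    out = pd.concat([df, per_first, per_second, grand])

    # Put `label` last on both levels instead of sorting alphabetically,
    # which would place 'total' between 'one' and 'two'.
    firsts = list(df.index.get_level_values(first).unique()) + [label]
    seconds = list(df.index.get_level_values(second).unique()) + [label]
    order = pd.MultiIndex.from_product([firsts, seconds], names=names)
    # Keep only the combinations that actually exist in the result.
    return out.reindex(order[order.isin(out.index)])


# Example with the values from the question.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"])
df = pd.DataFrame(
    {"values": [0.106521, 1.964873, 1.289683, -0.696361,
                -0.309505, 2.890406, -0.758369, 1.302628]},
    index=index)
print(add_subtotals(df))
```

This relies on a few groupby sums and a single concat rather than repeated unstack/stack round trips, so it may behave better on large frames, but that is worth timing on your own data before relying on it.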