Group by everything except one index column in pandas

Question

My data analysis keeps coming back to a simple but awkward pattern, namely "group by everything except one level". Take this MultiIndex example, df:

                      accuracy  velocity
name condition trial                    
john a         1     -1.403105  0.419850
               2     -0.879487  0.141615
     b         1      0.880945  1.951347
               2      0.103741  0.015548
hans a         1      1.425816  2.556959
               2     -0.117703  0.595807
     b         1     -1.136137  0.001417
               2      0.082444 -1.184703

What I want to do now, for example, is average over all available trials while keeping the name and condition information. That is easy to do:

average = df.groupby(level=('name', 'condition')).mean()

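In real-world conditions, however, far more metadata is stored in the MultiIndex. The index easily spans 8-10 columns per row, so the pattern above becomes very unwieldy. Ultimately, I am looking for a "discard" operation: I want to perform an operation that throws away, or reduces over, a single index column. In the case above, that is the trial number.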

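Should I bite the bullet, or is there a more idiomatic way to handle this? It may well be an anti-pattern! I would like to build a decent intuition for the "true pandas way"... Thanks in advance.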

Answer 1

unu*_*tbu 7

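You can define a helper function for this (it assumes that levels holds the index level names, as in the demo below):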

def allbut(*names):
    names = set(names)
    return [item for item in levels if item not in names]
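
A variant of the helper above, sketched under the assumption that reading the level names from df.index.names is acceptable, avoids relying on a module-level levels tuple. The name levels_except and the fixed random seed are hypothetical choices for illustration, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def levels_except(frame, *drop):
    """Return the index level names of `frame` minus those in `drop`.

    Hypothetical helper; the order of the remaining levels is preserved.
    """
    drop = set(drop)
    unknown = drop - set(frame.index.names)
    if unknown:
        # Fail loudly on a misspelled level instead of silently ignoring it.
        raise KeyError(f"not index levels: {sorted(unknown)}")
    return [name for name in frame.index.names if name not in drop]


idx = pd.MultiIndex.from_product(
    [("john", "hans"), list("ab"), range(1, 3)],
    names=("name", "condition", "trial"),
)
rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
df = pd.DataFrame(rng.standard_normal((len(idx), 2)),
                  index=idx, columns=("accuracy", "velocity"))

# Average over trials while keeping every other index level.
average = df.groupby(level=levels_except(df, "trial")).mean()
```

Because the helper looks the names up on the frame itself, the call stays the same length whether the index has 3 levels or 10.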

Demo:

import numpy as np
import pandas as pd

# Build a three-level MultiIndex: name x condition x trial
levels = ('name', 'condition', 'trial')
names = ('john', 'hans')
conditions = list('ab')
trials = range(1, 3)

idx = pd.MultiIndex.from_product(
    [names, conditions, trials], names=levels)

# Random data, so the numbers differ from run to run
df = pd.DataFrame(np.random.randn(len(idx), 2),
                  index=idx, columns=('accuracy', 'velocity'))

def allbut(*names):
    # Return every index level except the ones named in the call
    names = set(names)
    return [item for item in levels if item not in names]

In [40]: df.groupby(level=allbut('condition')).mean()
Out[40]: 
            accuracy  velocity
trial name                    
1     hans  0.086303  0.131395
      john  0.454824 -0.259495
2     hans -0.234961 -0.626495
      john  0.614730 -0.144183
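
As a sanity check, the sketch below compares the helper's result with a groupby whose levels are spelled out by hand. The fixed seed is an assumption added so the check is reproducible; the demo above uses unseeded random data.

```python
import numpy as np
import pandas as pd

levels = ("name", "condition", "trial")
idx = pd.MultiIndex.from_product(
    [("john", "hans"), list("ab"), range(1, 3)], names=levels)
rng = np.random.default_rng(42)  # fixed seed, an assumption for reproducibility
df = pd.DataFrame(rng.standard_normal((len(idx), 2)),
                  index=idx, columns=("accuracy", "velocity"))


def allbut(*names):
    names = set(names)
    return [item for item in levels if item not in names]


via_helper = df.groupby(level=allbut("condition")).mean()
explicit = df.groupby(level=["name", "trial"]).mean()

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(via_helper, explicit)
```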

You can also drop several levels at once:

In [53]: df.groupby(level=allbut('name', 'trial')).mean()
Out[53]: 
           accuracy  velocity
condition                    
a         -0.597178 -0.370377
b         -0.126996 -0.037003
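
To see what dropping several levels actually computes, the following sketch cross-checks one group by hand: it selects condition 'a' with DataFrame.xs and averages those rows directly. The seed and the variable names by_condition and manual_a are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

levels = ("name", "condition", "trial")
idx = pd.MultiIndex.from_product(
    [("john", "hans"), list("ab"), range(1, 3)], names=levels)
rng = np.random.default_rng(1)  # fixed seed, an assumption for reproducibility
df = pd.DataFrame(rng.standard_normal((len(idx), 2)),
                  index=idx, columns=("accuracy", "velocity"))


def allbut(*names):
    names = set(names)
    return [item for item in levels if item not in names]


# Dropping 'name' and 'trial' leaves 'condition' as the only grouping level.
by_condition = df.groupby(level=allbut("name", "trial")).mean()

# Cross-check one group: pick all rows with condition 'a' and average them.
manual_a = df.xs("a", level="condition").mean()
assert np.allclose(by_condition.loc["a"], manual_a)
```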

Archived:	11 years, 2 months ago
Viewed:	2686 times
Last activity:	11 years, 2 months ago