Pandas `DataFrameGroupBy` and `SeriesGroupBy`

Zhu*_*arb 4 python group-by pandas

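I admit I'm no Python guru, but I still find working with pandas DataFrameGroupBy and SeriesGroupBy objects remarkably counter-intuitive. (I have an R background.)

I have the following data frame: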

import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'code' : ['one', 'one', 'two', 'three',
                             'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white','white','white',
                           'black', 'black', 'white', 'white'],
                   'irrelevant1': ['foo', 'foo', 'foo','bar','bar',
                                     'foo','bar','bar'],
                   'irrelevant2': ['foo', 'foo', 'foo','bar','bar',
                                     'foo','bar','bar'],
                   'irrelevant3': ['foo', 'foo', 'foo','bar','bar',
                                     'foo','bar','bar'],
                   'amount' : np.random.randn(8)},  columns= ['id','code','colour', 'irrelevant1', 'irrelevant2', 'irrelevant3', 'amount'])

I would like to be able to group the ids by code and colour. The code below does the grouping, but keeps all the columns.

gb = df.groupby(['code','colour'])
gb.head(5)
                id   code colour irrelevant1 irrelevant2 irrelevant3    amount
code  colour                                                                  
one   black  0   1    one  black         foo         foo         foo -0.644170
      white  1   2    one  white         foo         foo         foo  0.912372
             6   7    one  white         bar         bar         bar  0.530575
three black  5   6  three  black         foo         foo         foo -0.123806
      white  3   4  three  white         bar         bar         bar -0.387080
two   black  4   5    two  black         bar         bar         bar -0.578107
      white  2   3    two  white         foo         foo         foo  0.768637
             7   8    two  white         bar         bar         bar -0.282577

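Questions:

1) How do I store only the id column in gb (without even any index) and get rid of the rest?

2) Once I have the desired DataFrameGroupBy gb, how do I access the ids for which {code = one and colour = white}? I tried gb.get_group('one','white') and gb.get_group(['one','white']), but neither works.

3) How do I access the entries for which {colour = white}, i.e. without specifying the code index?

4) Finally, the manual is not very helpful. Do you know of any sources with examples of creating and accessing these grouped objects?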

Tom*_*ger 7

For your questions, you don't even need to perform a groupby (though you should read more about it in the prose docs).

A better solution is a MultiIndex:

In [36]: df = df.set_index(['code', 'colour']).sort_index()

In [37]: df
Out[37]: 
              id irrelevant1 irrelevant2 irrelevant3    amount
code  colour                                                  
one   black    1         foo         foo         foo  0.103045
      white    2         foo         foo         foo  0.751824
      white    7         bar         bar         bar -1.275114
three black    6         foo         foo         foo  0.311305
      white    4         bar         bar         bar -0.416722
two   black    5         bar         bar         bar  1.534859
      white    3         foo         foo         foo -1.068399
      white    8         bar         bar         bar -0.243893

[8 rows x 5 columns]

That takes care of 1.
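
If only the id column is wanted, one possible sketch (not part of the original answer) is to select it after setting the index, and to call `.tolist()` when no index should survive at all. The frame `flat` below is a cut-down stand-in for the question's data, and the variable names are only illustrative.

```python
import pandas as pd

# Cut-down version of the question's frame
# (the 'irrelevant*' and 'amount' columns are omitted).
flat = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
})

# Series of ids, still labelled by (code, colour).
ids = flat.set_index(['code', 'colour']).sort_index()['id']

# Plain Python list of the ids for one (code, colour) pair, no index attached.
one_white_ids = ids.loc[('one', 'white')].tolist()

# The groupby route: selecting a column from a DataFrameGroupBy
# gives a SeriesGroupBy, which is collapsed here to one list per group.
ids_per_group = flat.groupby(['code', 'colour'])['id'].apply(list)
```

Here `ids` keeps the two-level index for later lookups, while `ids_per_group` holds one list of ids per (code, colour) pair.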

2: Use the familiar slicing syntax:

In [38]: df.loc['one', 'white']
Out[38]: 
             id irrelevant1 irrelevant2 irrelevant3    amount
code colour                                                  
one  white    2         foo         foo         foo  0.751824
     white    7         bar         bar         bar -1.275114

[2 rows x 5 columns]
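
For comparison, if you do stay with the groupby object, the `get_group` calls from the question most likely fail because the key for a multi-column grouping has to be passed as a single tuple. A minimal sketch, again on a cut-down stand-in for the question's frame:

```python
import pandas as pd

# Cut-down version of the question's frame
# (the 'irrelevant*' and 'amount' columns are omitted).
flat = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
})

gb = flat.groupby(['code', 'colour'])

# The group key is one tuple, in the same order as the grouping columns.
one_white = gb.get_group(('one', 'white'))

# Keep only the ids of that group as a plain list.
one_white_ids = one_white['id'].tolist()
```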

3: This is a cross-section; use .xs:

In [39]: df.xs('white', level='colour')
Out[39]: 
       id irrelevant1 irrelevant2 irrelevant3    amount
code                                                   
one     2         foo         foo         foo  0.751824
one     7         bar         bar         bar -1.275114
three   4         bar         bar         bar -0.416722
two     3         foo         foo         foo -1.068399
two     8         bar         bar         bar -0.243893

[5 rows x 5 columns]
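
Note that `.xs` drops the selected level by default. If the colour level should stay in the result, two options that should select the same rows are `drop_level=False` and `pd.IndexSlice`. This is a sketch on a cut-down stand-in for the question's frame, not part of the original answer:

```python
import pandas as pd

# Cut-down version of the question's frame
# (the 'irrelevant*' and 'amount' columns are omitted).
flat = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
})
indexed = flat.set_index(['code', 'colour']).sort_index()

# Cross-section that keeps the 'colour' level in the index.
white_xs = indexed.xs('white', level='colour', drop_level=False)

# Same rows via label slicing: every code, colour == 'white'.
# MultiIndex slicing is generally expected to run on a sorted index,
# hence the sort_index() call above.
white_slice = indexed.loc[pd.IndexSlice[:, 'white'], :]
```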

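4: There are examples all over the place. Check the pandas/groupby tags here, the section of the docs that is being worked on right now, and the prose docs linked above.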