How do I get pandas groupby to return a DataFrame instead of a Series?

use*_*262 3 python pandas

I don't understand the output of pandas' groupby. I started with a DataFrame (df0) with 5 fields/columns (zip code, city, location, population, state).

 >>> df0.info()
 <class 'pandas.core.frame.DataFrame'>
 RangeIndex: 29467 entries, 0 to 29466
 Data columns (total 5 columns):
 zip      29467 non-null object
 city     29467 non-null object
 loc      29467 non-null object
 pop      29467 non-null int64
 state    29467 non-null object
 dtypes: int64(1), object(4)
 memory usage: 1.1+ MB

I want to get the total population of each city, but since several cities have multiple zip codes, I thought I would use groupby.sum as follows:

  df6 = df0.groupby(['city','state'])['pop'].sum()

However, this returned a Series instead of a DataFrame:

 >>> df6.info()
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 2672, in __getattr__
     return object.__getattribute__(self, name)
  AttributeError: 'Series' object has no attribute 'info'
 >>> type(df6)
 <class 'pandas.core.series.Series'>

I would like to be able to look up the population of any city with something like

 df0[df0['city'].isin(['ALBANY'])]

but since I have a Series instead of a DataFrame, I can't. I haven't been able to force a conversion into a DataFrame either.

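What I'm wondering now is:

  1. Why didn't I get a DataFrame back instead of a Series?
  2. How can I get a table that lets me look up the population of a city? Can I use the Series I got from groupby, or should I have taken a different approach?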

jez*_*ael 5

You need the parameter as_index=False in groupby, or reset_index to convert the MultiIndex to columns:

df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()

Or:

df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
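
On the "why" part of the question: the sketch below is an illustration added here, using the same toy data as the sample that follows rather than the asker's real df0. Selecting a single column label after groupby yields a Series, while selecting a list of labels yields a DataFrame. Series.to_frame() is another way to convert an existing result.

```python
import pandas as pd

# Toy data mirroring the answer's sample (not the asker's real df0)
df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})

# A single column label selects one column per group, so sum() returns a Series
s = df0.groupby(['city', 'state'])['pop'].sum()
print(type(s).__name__)      # Series

# A list of labels keeps the result two-dimensional, so sum() returns a DataFrame
# (city and state stay in the MultiIndex here)
df_a = df0.groupby(['city', 'state'])[['pop']].sum()
print(type(df_a).__name__)   # DataFrame

# An existing Series can also be converted after the fact
df_b = s.to_frame()
print(type(df_b).__name__)   # DataFrame
```

In both DataFrame variants city and state are still index levels, so as_index=False or reset_index() is what turns them into ordinary columns.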

Sample:

df0 = pd.DataFrame({'city':['a','a','b'],
                   'state':['t','t','n'],
                   'pop':[7,8,9]})

print (df0)
  city  pop state
0    a    7     t
1    a    8     t
2    b    9     n

df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()
print (df6)
  city state  pop
0    a     t   15
1    b     n    9
# the reset_index variant gives the same result
df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
print (df6)
  city state  pop
0    a     t   15
1    b     n    9

Finally, select with loc; to get a scalar, add item():

print (df6.loc[df6.state == 't', 'pop'])
0    15
Name: pop, dtype: int64

print (df6.loc[df6.state == 't', 'pop'].item())
15
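
The isin filter from the question also works on the flattened result. This is a minimal sketch on the toy sample data, where city 'a' stands in for a real name such as 'ALBANY'; the variable names mask and total are only illustrative.

```python
import pandas as pd

# Toy data mirroring the answer's sample
df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})

# city and state become ordinary columns thanks to as_index=False
df6 = df0.groupby(['city', 'state'], as_index=False)['pop'].sum()

# Same style of filter as in the question, applied to the aggregated frame
print(df6[df6['city'].isin(['a'])])

# Combine conditions with & when the same city name occurs in several states
mask = (df6['city'] == 'a') & (df6['state'] == 't')
total = df6.loc[mask, 'pop'].item()   # item() requires exactly one matching row
print(total)   # 15
```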

But if you only need a lookup table, you can use the Series with its MultiIndex:

s = df0.groupby(['city','state'])['pop'].sum()
print (s)
city  state
a     t        15
b     n         9
Name: pop, dtype: int64

# select all cities with : and the state with a string like 't'
# the output is a Series of length 1
print (s.loc[:, 't'])
city
a    15
Name: pop, dtype: int64

# if you need the output as a scalar, add item()
print (s.loc[:, 't'].item())
15
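
A few other lookups on the MultiIndex Series, sketched on the same toy data. These are alternatives added for illustration, not part of the original answer: a full (city, state) tuple returns a scalar directly, while a partial key returns a smaller Series.

```python
import pandas as pd

# Toy data mirroring the answer's sample
df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})

s = df0.groupby(['city', 'state'])['pop'].sum()

# Full key (city, state): returns the scalar, no item() needed
print(s.loc[('a', 't')])          # 15

# Only the first level (city): returns a Series indexed by state
print(s.loc['a'])

# Only the second level (state): xs selects by level name, result is indexed by city
print(s.xs('t', level='state'))
```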