Groupby 和 Pivot Pandas 表

Question

Groupby 和 Pivot Pandas 表

这应该很快，但是我正在做的枢轴/分组工作都没有达到我的需要。

我有一个这样的表：

        Letter  Period  Amount
YrMnth
2014-12      B       6       0
2014-12      C       8       1
2014-12      C       9       2
2014-12      C      10       3
2014-12      C       6       4
2014-12      C      12       5
2014-12      C       7       6
2014-12      C      11       7
2014-12      D       9       8
2014-12      D      10       9
2014-12      D       1      10
2014-12      D       8      11
2014-12      D       6      12
2014-12      D      12      13
2014-12      D       7      14
2014-12      D      11      15
2014-12      D       4      16
2014-12      D       3      17
2015-01      B       7      18
2015-01      B       8      19
2015-01      B       1      20
2015-01      B      10      21
2015-01      B      11      22
2015-01      B       6      23
2015-01      B       9      24
2015-01      B       3      25
2015-01      B       5      26
2015-01      C      10      27

Run Code Online (Sandbox Code Playgroud)

我想对其进行透视，以便索引基本上是 YrMonth 和 Letter，Period 是列，Amount 是值。

我总体上了解数据透视表，但当我尝试使用多个索引进行数据透视表时会出现错误。我将索引设为一列，并尝试了以下操作：

In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

Run Code Online (Sandbox Code Playgroud)

但我出现了这个错误：

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-fc2a4c5f244d> in <module>()
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values)
   3761         """
   3762         from pandas.core.reshape import pivot
-> 3763         return pivot(self, index=index, columns=columns, values=values)
   3764
   3765     def stack(self, level=-1, dropna=True):

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values)
    331         indexed = Series(self[values].values,
    332                          index=MultiIndex.from_arrays([index,
--> 333                                                        self[columns]]))
    334         return indexed.unstack(columns)
    335

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    225                                        raise_cast_failure=True)
    226
--> 227                 data = SingleBlockManager(data, index, fastpath=True)
    228
    229         generic.NDFrame.__init__(self, data, fastpath=True)

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   3734             block = make_block(block,
   3735                                placement=slice(0, len(axis)),
-> 3736                                ndim=1, fastpath=True)
   3737
   3738         self.blocks = [block]

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2452
   2453     return klass(values, ndim=ndim, fastpath=fastpath,
-> 2454                  placement=placement)
   2455
   2456

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
     85             raise ValueError('Wrong number of items passed %d,'
     86                              ' placement implies %d' % (
---> 87                                  len(self.values), len(self.mgr_locs)))
     88
     89     @property

ValueError: Wrong number of items passed 138, placement implies 2

Run Code Online (Sandbox Code Playgroud)

Answer 1

Pad*_*ham 4

如果我理解正确的话，pivot_table可能更接近您的需要：

df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

Run Code Online (Sandbox Code Playgroud)

这给你：

Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

Run Code Online (Sandbox Code Playgroud)

正如评论中所建议的：

 df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")


Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

Run Code Online (Sandbox Code Playgroud)

也产生相同的结果，如果有人想澄清前者将如何失败，那就太好了。

归档时间：	9 年，11 月前
查看次数：	2550 次
最近记录：	9 年，11 月前