This should be quick, but none of the pivot/groupby approaches I've tried give me what I need.
I have a table like this:
         Letter  Period  Amount
YrMnth
2014-12       B       6       0
2014-12       C       8       1
2014-12       C       9       2
2014-12       C      10       3
2014-12       C       6       4
2014-12       C      12       5
2014-12       C       7       6
2014-12       C      11       7
2014-12       D       9       8
2014-12       D      10       9
2014-12       D       1      10
2014-12       D       8      11
2014-12       D       6      12
2014-12       D      12      13
2014-12       D       7      14
2014-12       D      11      15
2014-12       D       4      16
2014-12       D       3      17
2015-01       B       7      18
2015-01       B       8      19
2015-01       B       1      20
2015-01       B      10      21
2015-01       B      11      22
2015-01       B       6      23
2015-01       B       9      24
2015-01       B       3      25
2015-01       B       5      26
2015-01       C      10      27
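I want to pivot it so that the index is essentially YrMnth and Letter, Period becomes the columns, and Amount supplies the values.
I understand pivot tables in general, but I get an error when I try to pivot with more than one index column. I turned the index into a regular column and tried the following: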
In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')
But I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-76-fc2a4c5f244d> in <module>()
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')
/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values)
3761 """
3762 from pandas.core.reshape import pivot
-> 3763 return pivot(self, index=index, columns=columns, values=values)
3764
3765 def stack(self, level=-1, dropna=True):
/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values)
331 indexed = Series(self[values].values,
332 index=MultiIndex.from_arrays([index,
--> 333 self[columns]]))
334 return indexed.unstack(columns)
335
/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
225 raise_cast_failure=True)
226
--> 227 data = SingleBlockManager(data, index, fastpath=True)
228
229 generic.NDFrame.__init__(self, data, fastpath=True)
/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
3734 block = make_block(block,
3735 placement=slice(0, len(axis)),
-> 3736 ndim=1, fastpath=True)
3737
3738 self.blocks = [block]
/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
2452
2453 return klass(values, ndim=ndim, fastpath=fastpath,
-> 2454 placement=placement)
2455
2456
/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
85 raise ValueError('Wrong number of items passed %d,'
86 ' placement implies %d' % (
---> 87 len(self.values), len(self.mgr_locs)))
88
89 @property
ValueError: Wrong number of items passed 138, placement implies 2
If I understand correctly, pivot_table may be closer to what you need:
df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")
This gives you:
Period            1    3    4    5    6    7    8    9   10   11   12
YrMnth  Letter
2014-12 B       NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN  NaN  NaN
        C       NaN  NaN  NaN  NaN    4    6    1    2    3    7    5
        D        10   17   16  NaN   12   14   11    8    9   15   13
2015-01 B        20   25  NaN   26   23   18   19   24   21   22  NaN
        C       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   27  NaN  NaN
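For anyone who wants to try this without the original data, here is a minimal sketch on a handful of the question's rows. The small `df` and the extra duplicate row in `dup` are made up for illustration. One thing worth keeping in mind is that pivot_table aggregates when an (index, column) pair occurs more than once, and the default aggregation is the mean.

```python
import pandas as pd

# A few rows taken from the question; YrMnth is a regular column here.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

# Two index columns, Period spread across the columns, Amount as the values.
wide = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")
print(wide)

# Hypothetical duplicate: a second (2014-12, B, 6) row with Amount 10.
# pivot_table does not raise here; it averages the two amounts.
dup = pd.concat([df, df.iloc[[0]].assign(Amount=10)], ignore_index=True)
wide_dup = dup.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")
print(wide_dup.loc[("2014-12", "B"), 6])
```

If averaging is not what you want for repeated combinations, passing a different `aggfunc` (for example `aggfunc="sum"`) changes how they are combined.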
As suggested in the comments:
df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")
Period            1    3    4    5    6    7    8    9   10   11   12
YrMnth  Letter
2014-12 B       NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN  NaN  NaN
        C       NaN  NaN  NaN  NaN    4    6    1    2    3    7    5
        D        10   17   16  NaN   12   14   11    8    9   15   13
2015-01 B        20   25  NaN   26   23   18   19   24   21   22  NaN
        C       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   27  NaN  NaN
also yields the same result. If anyone can clarify how the former would fail, that would be great.
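As far as I can tell, the method form and the function form are the same operation, so I would expect them to agree. The sketch below checks that on a small made-up subset of the question's rows. It also shows one alternative that is not from the answer above: when every (YrMnth, Letter, Period) combination is unique, set_index followed by unstack reshapes the data without any aggregation.

```python
import pandas as pd

# A few rows taken from the question; YrMnth is a regular column here.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

keys = ["YrMnth", "Letter"]

# Method form and module-level function form of the same call.
via_method = df.pivot_table(index=keys, columns="Period", values="Amount")
via_function = pd.pivot_table(df, index=keys, columns="Period", values="Amount")
print(via_method.equals(via_function))

# Alternative without aggregation: move all three keys into the index,
# then unstack Period into the columns. This raises on duplicate keys.
via_unstack = df.set_index(keys + ["Period"])["Amount"].unstack("Period")
print(via_unstack)
```

The unstack route is the stricter of the two: it complains about duplicate key combinations instead of silently averaging them, which can be useful as a sanity check on the data.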