Pandas pivot table: using the values column as columns or index

Question


How can I use the same column for "columns" or "index" that I am already using for "values"?

For example:

pd.pivot_table(data, values='Survived', index=['Survived', 'Sex', 'Pclass'],
               aggfunc=len, margins=True)


Here values and index both use the same column, Survived. When I try to run the code above, I get

ValueError: Grouper for 'Survived' not 1-dimensional


However, if I use a different column in place of values='Survived', pivot_table works fine.

Answer 1

Rog*_*ien 4

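One issue I see is that you are not setting the columns parameter when calling pivot_table (it tells pandas which values to use as the column headers of the pivot_table output).

A pivot table operation is really a sequence of groupby -> aggregate -> unstack. Suppose you have this DataFrame: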

    survived sex pclass  other
0      False   f      a     29
1       True   f      b      6
2       True   f      b     22
3      False   m      b     55
4      False   f      a     59
..       ...  ..    ...    ...
95     False   f      a     66
96     False   f      c     42
97      True   m      c     93
98      True   m      c     59
99     False   f      b     93


You can pivot this table using pivot_table:

pd.pivot_table(df, index='sex', columns='pclass', values='other', aggfunc='sum')


pclass     a    b     c
sex                    
f       1000  840   306
m        728  851  1247


Or you can get the same result using groupby and unstack:

df.groupby(['sex', 'pclass'])['other'].sum().unstack()


pclass     a    b     c
sex                    
f       1000  840   306
m        728  851  1247
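
To make the equivalence concrete, here is a minimal, self-contained sketch. The eight rows of toy data are made up for illustration (they are not the 100-row DataFrame shown above), so the sums differ from the output printed earlier; only the column names are reused.

```python
import pandas as pd

# Hypothetical toy data reusing the column names from the example above.
df = pd.DataFrame({
    "survived": [False, True, True, False, False, False, True, True],
    "sex":      ["f", "f", "f", "m", "f", "f", "m", "m"],
    "pclass":   ["a", "b", "b", "b", "a", "c", "c", "a"],
    "other":    [29, 6, 22, 55, 59, 42, 93, 10],
})

# Route 1: pivot_table groups, aggregates and reshapes in a single call.
pivoted = pd.pivot_table(df, index="sex", columns="pclass",
                         values="other", aggfunc="sum")

# Route 2: the same three steps written out by hand.
manual = df.groupby(["sex", "pclass"])["other"].sum().unstack()

print(pivoted)
print(manual)
```

Both printed tables should contain the same numbers, with sex as the row index and the pclass values as column headers.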


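The point of this little story is that a pivot table is really a groupby operation. In your case, you are trying to group by ['Survived', 'Sex', 'Pclass'] and then aggregate 'Survived' again with len. That doesn't make much sense, because 'Survived' is already part of the output table's index (which is why pivot_table gives you an error).

If you really want to make this work, you can use groupby: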

df.groupby(['survived', 'sex', 'pclass', 'other'])['survived'].apply(len).unstack()


However, I think you are actually trying to achieve something else, though I'm not sure what.
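
If the goal is simply to count the rows in each (survived, sex, pclass) combination, which is only a guess at the intent, a sketch along these lines may be closer to what is needed. It uses groupby(...).size(), which needs no values column at all, and, for comparison, pivot_table with a different column as values (as the question notes, that variant works). The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names follow the example in this answer.
df = pd.DataFrame({
    "survived": [False, True, True, False, False, False, True, True],
    "sex":      ["f", "f", "f", "m", "f", "f", "m", "m"],
    "pclass":   ["a", "b", "b", "b", "a", "c", "c", "a"],
    "other":    [29, 6, 22, 55, 59, 42, 93, 10],
})

# size() counts the rows per group, so the grouping column is never
# aggregated against itself.
counts = df.groupby(["survived", "sex", "pclass"]).size()

# Optionally move pclass into the columns; absent combinations become 0.
table = counts.unstack("pclass", fill_value=0)

# Same counts through pivot_table, counting a column that is not a group key.
# This matches size() only because 'other' has no missing values here.
via_pivot = pd.pivot_table(df, index=["survived", "sex"], columns="pclass",
                           values="other", aggfunc="count", fill_value=0)

print(table)
print(via_pivot)
```

size() counts every row in a group, whereas the count aggregation skips missing values, so the two can differ on real data that contains NaN in the counted column.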
