pandas - pivot_table with non-numeric values? (DataError: No numeric types to aggregate)

Paw*_*ian 10 python pivot-table dataframe pandas

I am trying to pivot a table so that the result contains strings as its values.

import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})

df1.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

But I get: DataError: No numeric types to aggregate.

When I change the result values to numbers, it works fine:

df2 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': [1,0,0,1,1,0,0,1]})

df2.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

And I get what I need:

variable1   A               B    
variable2   a       b       a   b
variable3   x   y   x   y   x   y
index                            
0           1 NaN NaN NaN NaN NaN
1         NaN NaN   0 NaN NaN NaN
2         NaN NaN NaN NaN   0 NaN
3         NaN NaN NaN NaN NaN   1
4         NaN   1 NaN NaN NaN NaN
5         NaN NaN NaN NaN NaN   0
6         NaN NaN NaN NaN   0 NaN
7         NaN NaN NaN   1 NaN NaN

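I know I could map the strings to numeric values and then reverse the operation afterwards, but maybe there is a more elegant solution?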

Ran*_*win 24

My original reply was based on Pandas 0.14.1, and since then many things have changed in the pivot_table function (rows -> index, cols -> columns, ...).
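As an illustration of the renamed keywords, here is a minimal sketch of the numeric example from the question written for a recent pandas release, with `index=` and `columns=` in place of `rows=` and `cols=`. The variable name `numeric_pivot` is only illustrative and is not part of the original post.

```python
import pandas as pd

df2 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': [1, 0, 0, 1, 1, 0, 0, 1]})

# Same call as in the question, but with the newer keyword names:
# rows= is now index=, cols= is now columns=.
# The default aggfunc is the mean, which is why numeric data works here.
numeric_pivot = df2.pivot_table(values='result',
                                index='index',
                                columns=['variable1', 'variable2', 'variable3'])
print(numeric_pivot)
```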

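Also, the original lambda trick I posted no longer seems to work on Pandas 0.18. You have to provide a reducing function (even if it is min, max or mean). But even that seemed improper, because we are not reducing the data set, just transforming it. So I took a closer look at unstack.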
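For reference, the reducing-function route could look roughly like the sketch below when applied to the string data. It assumes a pandas version that accepts the string name `'first'` as `aggfunc`. Every combination of `index` and the three variables occurs exactly once in this data, so `'first'` simply passes each string through. The name `string_pivot` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

# The default aggfunc (mean) cannot handle strings, so pass a reducer that can.
# 'first' keeps the first value seen in each group; each group here has one row.
string_pivot = df1.pivot_table(values='result',
                               index='index',
                               columns=['variable1', 'variable2', 'variable3'],
                               aggfunc='first')
print(string_pivot)
```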

import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})

# these are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']

First, set an index on the data using the index column plus the columns you want to unstack, then call unstack with the level argument.

df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

The resulting dataframe is shown below.

[image: the resulting dataframe]
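Putting the steps together, here is a self-contained sketch that can be run to inspect the result directly. Selecting `['result']` at the end is an extra step not in the code above: `unstack` keeps the original column name as the outermost column level, and the selection drops it. The name `wide` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

# Columns that should end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']

# Move everything except 'result' into the row index, then pivot the
# three variable levels from the row index into the column index.
wide = df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

# Extra step (assumption: only the values are wanted): drop the outer
# 'result' column level so the columns are just the three variables.
wide = wide['result']
print(wide)
```

No aggregation takes place here, so the strings are carried over unchanged and combinations without a value are filled with NaN.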