How to make a pivot table in Pandas using non-numeric data?

CCo*_*der 1 python pandas

I have a dataframe as follows:

name    window_num  channel
----------------------------
Alice   1           cnn
Bob     2           fox
Alice   3           msnbc

I want the data in the following format:

name    1       2       3 
------------------------------
Alice   cnn     nan     msnbc           
Bob     nan     fox     nan

I tried the pandas pivot_table method:

df.pivot_table(index=['name'],columns=['window_num'],values=['channel'])

But this expects the values column to be numeric so that it can be aggregated.

jez*_*ael 5

If all values are needed and name, window_num pairs may be duplicated, use the join function as aggfunc:

print (df)
    name  window_num channel
0  Alice           1     cnn <- duplicates name, window_num
1  Alice           1   msnbc <- duplicates name, window_num
2    Bob           2     fox
3  Alice           3   msnbc


df1 = df.pivot_table(index='name',columns='window_num',values='channel', aggfunc=','.join)
print (df1)
window_num          1    2      3
name                             
Alice       cnn,msnbc  NaN  msnbc <- joined data
Bob               NaN  fox    NaN

If only the first or last value is needed, the other duplicated values are dropped:

df2 = df.pivot_table(index='name',columns='window_num',values='channel', aggfunc='first')
print (df2)
window_num    1    2      3
name                       
Alice       cnn  NaN  msnbc <- first value, the duplicate is lost
Bob         NaN  fox    NaN


df3 = df.pivot_table(index='name',columns='window_num',values='channel', aggfunc='last')
print (df3)
window_num      1    2      3
name                         
Alice       msnbc  NaN  msnbc <- last value, the duplicate is lost
Bob           NaN  fox    NaN

If you are sure there are no duplicates, use DataFrame.pivot:

df.pivot(index='name',columns='window_num',values='channel')
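
For reference, here is a minimal self-contained sketch of the no-duplicates case, rebuilding the three-row frame from the question. The variable name `wide` is only illustrative.

```python
import pandas as pd

# The sample data from the question: each (name, window_num) pair is unique.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice"],
    "window_num": [1, 2, 3],
    "channel": ["cnn", "fox", "msnbc"],
})

# pivot only reshapes and never aggregates, so string values are fine.
wide = df.pivot(index="name", columns="window_num", values="channel")
print(wide)
```

Cells with no matching row (for example Bob in window 1) come back as NaN.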

If you are not sure whether there are duplicates, note that pivot fails when name, window_num pairs are duplicated:

print (df)
    name  window_num channel
0  Alice           1     cnn
1  Alice           1   msnbc
2    Bob           2     fox
3  Alice           3   msnbc

df4 = df.pivot(index='name',columns='window_num',values='channel')
print (df4)

>ValueError: Index contains duplicate entries, cannot reshape
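
One way to avoid hitting that error is to test for duplicated key pairs first and pick the method accordingly. This is only a sketch under the assumption that joining duplicates with a comma is the desired fallback. The names `keys`, `has_duplicates` and `wide` are illustrative.

```python
import pandas as pd

# The four-row frame from above, with (Alice, 1) appearing twice.
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Alice"],
    "window_num": [1, 1, 2, 3],
    "channel": ["cnn", "msnbc", "fox", "msnbc"],
})

keys = ["name", "window_num"]

# True if any (name, window_num) pair occurs more than once.
has_duplicates = df.duplicated(subset=keys).any()

if has_duplicates:
    # Duplicates present: aggregate the strings instead of failing.
    wide = df.pivot_table(index="name", columns="window_num",
                          values="channel", aggfunc=",".join)
else:
    # No duplicates: a plain reshape is enough.
    wide = df.pivot(index="name", columns="window_num", values="channel")

print(wide)
```

Swapping `",".join` for `"first"` or `"last"` would keep a single value per cell instead, as in the examples earlier in this answer.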