I have a data frame like this:
name window_num channel
----------------------------
Alice 1 cnn
Bob 2 fox
Alice 3 msnbc
I want the data in the following format:
name 1 2 3
------------------------------
Alice cnn nan msnbc
Bob nan fox nan
I tried the pandas pivot_table method:
df.pivot_table(index=['name'],columns=['window_num'],values=['channel'])
But this expects the values column to be numeric so that it can be aggregated (the default aggfunc is mean).
If you need to keep all values and name, window_num pairs may be duplicated, pass ','.join as the aggfunc:
print (df)
name window_num channel
0 Alice 1 cnn <- duplicates name, window_num
1 Alice 1 msnbc <- duplicates name, window_num
2 Bob 2 fox
3 Alice 3 msnbc
df1 = df.pivot_table(index='name',columns='window_num',values='channel', aggfunc=','.join)
print (df1)
window_num 1 2 3
name
Alice cnn,msnbc NaN msnbc <- joined data
Bob NaN fox NaN
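For reference, here is a self-contained sketch of the same idea that builds the sample frame explicitly. The lambda that casts to str before joining is an assumption added here, in case the values column is not purely strings. It is not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with a duplicated (name, window_num) pair.
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Alice"],
    "window_num": [1, 1, 2, 3],
    "channel": ["cnn", "msnbc", "fox", "msnbc"],
})

# Assumption: cast each value to str so the join also works for non-string data.
wide = df.pivot_table(
    index="name",
    columns="window_num",
    values="channel",
    aggfunc=lambda s: ",".join(s.astype(str)),
)
print(wide)
```

For purely string columns, passing ','.join directly as in the snippet above should behave the same way.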
If you only need the first or last value, so that the other duplicates are dropped, use aggfunc='first' or aggfunc='last':
df2 = df.pivot_table(index='name',columns='window_num',values='channel', aggfunc='first')
print (df2)
window_num 1 2 3
name
Alice cnn NaN msnbc <- first value, duplicate is lost
Bob NaN fox NaN
df3 = df.pivot_table(index='name',columns='window_num',values='channel', aggfunc='last')
print (df3)
window_num 1 2 3
name
Alice msnbc NaN msnbc <- last value, duplicate is lost
Bob NaN fox NaN
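A possible alternative for the "keep only the first value" case is to drop the duplicated rows explicitly and then reshape with pivot. This is a sketch under the assumption that the values contain no missing entries. The 'first' aggregation skips missing values, while drop_duplicates simply keeps the first row, so the two can differ when NaN is present.

```python
import pandas as pd

# Sample data with a duplicated (name, window_num) pair.
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Alice"],
    "window_num": [1, 1, 2, 3],
    "channel": ["cnn", "msnbc", "fox", "msnbc"],
})

# Keep the first row for each (name, window_num) pair, then reshape.
# Use keep="last" instead to mirror aggfunc='last'.
deduped = df.drop_duplicates(subset=["name", "window_num"], keep="first")
out = deduped.pivot(index="name", columns="window_num", values="channel")
print(out)
```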
If you are sure there are no duplicates, use DataFrame.pivot:
df.pivot(index='name',columns='window_num',values='channel')
If you are not sure whether there are duplicates, note that pivot fails when name, window_num pairs are duplicated:
print (df)
name window_num channel
0 Alice 1 cnn
1 Alice 1 msnbc
2 Bob 2 fox
3 Alice 3 msnbc
df4 = df.pivot(index='name',columns='window_num',values='channel')
print (df4)
>ValueError: Index contains duplicate entries, cannot reshape
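One way to avoid hitting this error is to check for duplicated key pairs up front with DataFrame.duplicated and pick the reshaping method accordingly. The sketch below is only an illustration. The fallback to aggfunc='first' is an assumption, and ','.join or 'last' may suit your data better.

```python
import pandas as pd

# Sample data with a duplicated (name, window_num) pair.
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Alice"],
    "window_num": [1, 1, 2, 3],
    "channel": ["cnn", "msnbc", "fox", "msnbc"],
})

keys = ["name", "window_num"]

# True if any (name, window_num) pair occurs more than once.
has_dupes = bool(df.duplicated(subset=keys).any())

if has_dupes:
    # Assumption: keeping the first value per pair is acceptable here.
    wide = df.pivot_table(index="name", columns="window_num",
                          values="channel", aggfunc="first")
else:
    # No duplicates, so a plain pivot is safe.
    wide = df.pivot(index="name", columns="window_num", values="channel")

print(has_dupes)
print(wide)
```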