RSH*_*HAP 2 python pivot-table pandas
I have a dataframe like this:
import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']
test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})
         actfig          car   name     pet
0      superman  lamborghini   fred     cat
1        batman      ferrari   fred     dog
2         flash      bugatti   fred    bird
3  greenlantern      ferrari  james     cat
4         flash     corvette  james     dog
5        batman      bugatti   rick     dog
6         joker          bmw   rick    fish
7      superman          bmw   jeff  marmet
Forgive me if my terminology is off, but I want to pivot the data so that, for each name, I get a count of every value in the ['actfig','car','pet'] columns.
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
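I thought test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gave me some strange multi-level columns.
I figured I could run get_dummies on each column, concatenate the results, then group by name and sum, but it feels like pandas probably has a better way.
How can this be done?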
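For readers wondering where the multi-level columns come from, here is a minimal sketch. It does not call pivot_table itself; it assumes that passing several keys as columns behaves like grouping on all keys and unstacking them, and reproduces that with groupby, size and unstack. The variable name wide is only for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Count rows per (name, actfig, car, pet) combination, then move the three
# value columns onto the column axis. Each output column is one combination
# of the three keys, so the column index has three levels.
wide = (test.groupby(['name', 'actfig', 'car', 'pet'])
            .size()
            .unstack(['actfig', 'car', 'pet'], fill_value=0))

print(wide.columns.nlevels)
print(wide.columns[:3].tolist())
```

Each column here is a (actfig, car, pet) tuple rather than a single value, which is why the result does not look like the desired table. Getting one column per value requires stacking the three columns into one first (melt) or encoding them separately (get_dummies).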
melt and pivot:
test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]:
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0

value  greenlantern  joker  lamborghini  marmet  superman
name
fred            0.0    0.0          1.0     0.0       1.0
james           1.0    0.0          0.0     0.0       0.0
jeff            0.0    0.0          0.0     1.0       1.0
rick            0.0    1.0          0.0     0.0       0.0
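One caveat with pivot: it raises an error if the same (name, value) pair occurs more than once, and it marks presence rather than counting. As a hedged alternative, not part of the original answer, pd.crosstab on the melted frame counts occurrences directly and returns integers. The names long_df and counts are illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# melt stacks actfig/car/pet into a single 'value' column,
# with one row per (name, original column) pair.
long_df = test.melt('name')

# crosstab counts how often each value appears for each name;
# combinations that never occur are filled with 0.
counts = pd.crosstab(long_df['name'], long_df['value'])
print(counts)
```

The columns come out in alphabetical order, as in the pivot output above, rather than grouped by source column.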
Or get_dummies:
pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]:
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name
fred               1             1                    0             0
james              0             1                    1             0
jeff               0             0                    0             0
rick               1             0                    0             1

       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name
fred                 1        0            1             0            1
james                0        0            0             1            1
jeff                 1        1            0             0            0
rick                 0        1            1             0            0

       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet
name
fred                 1         1        1        1         0           0
james                0         0        1        1         0           0
jeff                 0         0        0        0         0           1
rick                 0         0        0        1         1           0
Edit: following PiR's suggestion, to drop the column prefixes:
pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
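For anyone who wants to run the prefix-free variant end to end, here is a self-contained sketch. It is written with groupby(level=0).sum(); older answers often use sum(level=0), which pandas 2.0 removed. The names dummies and counts are illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# One indicator column per (source column, value) pair, named like 'actfig|batman'.
dummies = pd.get_dummies(test.set_index('name'), prefix_sep='|')

# Sum the indicators per name, then keep only the part after the separator.
counts = (dummies.groupby(level=0).sum()
                 .rename(columns=lambda c: c.rsplit('|', 1)[1]))
print(counts)
```

This keeps the columns grouped by source column (actfig, then car, then pet), matching the layout asked for in the question. If the same value appeared in two source columns, stripping the prefix would produce duplicate column names, so in that case it may be safer to keep the prefixes.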