Pandas - Pivot multiple categorical columns

Asked by RSH*_*HAP (score 2). Tags: python, pivot-table, pandas

I have a DataFrame like this:

import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']

test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})

    actfig       car                name    pet
0   superman     lamborghini        fred    cat
1   batman       ferrari            fred    dog
2   flash        bugatti            fred    bird
3   greenlantern ferrari            james   cat
4   flash        corvette           james   dog
5   batman       bugatti            rick    dog
6   joker        bmw                rick    fish
7   superman     bmw                jeff    marmet

Forgive me if my terminology is off, but I want to pivot the data so that, for each name, I get a count of every value in the ['actfig','car','pet'] columns:

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0

I thought something like test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some weird multi-level columns.

I figured I could call get_dummies on each column, concatenate the results, then group by name and sum, but it feels like pandas probably has a better way.
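The approach described above (get_dummies per column, concatenate, group by name, sum) can be sketched as follows. This is a minimal sketch, assuming a reasonably recent pandas; the variable names dummies and counts are illustrative, not from the question.

```python
import pandas as pd

# The question's data
test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# One 0/1 indicator frame per categorical column, placed side by side.
# get_dummies on a Series names the columns after the values (no prefix).
dummies = pd.concat(
    [pd.get_dummies(test[col], dtype=int) for col in ['actfig', 'car', 'pet']],
    axis=1,
)

# Group the indicator rows by name and add them up to get per-name counts.
counts = dummies.groupby(test['name']).sum()
print(counts)
```

Because get_dummies sorts the distinct values of each column, the columns should come out in the same order as the desired table above. dtype=int is passed because pandas 2.x otherwise returns boolean indicator columns.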

How would this be done?

Answer by WeN*_*Ben (score 5)

melt, then pivot:

test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]: 
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name                                                                          
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0   
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0   
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0   
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0   
value  greenlantern  joker  lamborghini  marmet  superman  
name                                                       
fred            0.0    0.0          1.0     0.0       1.0  
james           1.0    0.0          0.0     0.0       0.0  
jeff            0.0    0.0          0.0     1.0       1.0  
rick            0.0    1.0          0.0     0.0       0.0  
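pivot only reshapes, so it raises a ValueError if a (name, value) pair occurs more than once after melting. That cannot happen with the sample data, but it could if, say, a name owned two batman figures. A minimal sketch of a counting variant, using a hypothetical extra row (not part of the question's data) to create such duplicates, swaps pivot for pd.crosstab, which counts each pair and fills absent pairs with 0. The variable names are illustrative.

```python
import pandas as pd

# The question's data
test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Hypothetical extra row (not in the question): fred gets a second batman
# and a second cat, so (fred, batman) and (fred, cat) repeat after melting.
extra = pd.DataFrame({'name': ['fred'], 'actfig': ['batman'], 'car': ['bmw'], 'pet': ['cat']})
test_dup = pd.concat([test, extra], ignore_index=True)

# Wide -> long: one row per (name, source column, value).
long = test_dup.melt(id_vars='name')

# pivot cannot put two rows into the same (name, value) cell.
pivot_failed = False
try:
    long.assign(new=1).pivot(index='name', columns='value', values='new')
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# crosstab counts each (name, value) pair; pairs that never occur become 0.
counts = pd.crosstab(long['name'], long['value'])
print(counts)
```

With the question's original data, crosstab and melt/pivot give the same 0/1 values; crosstab just returns integers instead of floats and does not need fillna.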

Or with get_dummies:

pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]: 
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name                                                                    
fred               1             1                    0             0   
james              0             1                    1             0   
jeff               0             0                    0             0   
rick               1             0                    0             1   
       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name                                                                      
fred                 1        0            1             0            1   
james                0        0            0             1            1   
jeff                 1        1            0             0            0   
rick                 0        1            1             0            0   
       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet  
name                                                                      
fred                 1         1        1        1         0           0  
james                0         0        1        1         0           0  
jeff                 0         0        0        0         0           1  
rick                 0         0        0        1         1           0

Edit: following PiR's suggestion, strip the column-name prefixes:

pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
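Stripping the prefixes assumes that no value appears in more than one source column; otherwise rename leaves two columns with the same label. A sketch of one way to merge such labels, assuming a hypothetical extra row in which jeff drives a car named flash (not in the question's data): transpose, group on the label, sum, and transpose back. The variable names are illustrative.

```python
import pandas as pd

# The question's data
test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Hypothetical extra row (not in the question): a car named 'flash'
# collides with the action figure 'flash'.
extra = pd.DataFrame({'name': ['jeff'], 'actfig': ['joker'], 'car': ['flash'], 'pet': ['cat']})
test2 = pd.concat([test, extra], ignore_index=True)

# Prefixed dummies, summed per name, then prefixes stripped.
wide = (
    pd.get_dummies(test2.set_index('name'), prefix_sep='|', dtype=int)
    .groupby(level=0).sum()
    .rename(columns=lambda c: c.rsplit('|', 1)[1])
)
print(wide.columns.is_unique)  # False: two columns are now labelled 'flash'

# Sum same-named columns: transpose, group rows by label, transpose back.
merged = wide.T.groupby(level=0).sum().T
print(merged)
```

With the question's own data the stripped labels are already unique, so the merge step would only sort the columns alphabetically.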