How to count categorical data by subgroup in a pandas DataFrame?

Sha*_*ang 3 python dataframe pandas

I have the following pandas DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)

   shops    franchise items
0  shop1  franchise_A   dog
1  shop2  franchise_A   cat
2  shop3  franchise_A   dog
3  shop4  franchise_A   dog
4  shop5  franchise_B  bird
5  shop6  franchise_B  fish

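Here, each row is a unique sample (shop1, shop2, etc.), and each sample belongs to a subgroup (franchise_A, franchise_B, franchise_C, etc.). The items column can take only four categorical values: dog, cat, fish, bird. My goal is to create a bar plot of the number of dog, cat, fish, and bird items for each franchise.

I would like the output to be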

franchise        dogs    cats    birds    fish
franchise_A      3       1       0        0
franchise_B      0       0       1        1

I believe I first have to use groupby(), e.g.

df.groupby("franchise").count()
             shops  items
franchise                
franchise_A      4      4
franchise_B      2      2

But this only counts the rows per franchise; I'm not sure how to count each kind of item within each franchise.

jez*_*ael 5

You can use value_counts with unstack (thanks to Nickil Maveli):

print (df.groupby("franchise")['items'].value_counts().unstack(fill_value=0))
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
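To make the one-liner above easier to follow, here is a minimal sketch that splits it into its two steps. The variable names `counts` and `table` are my own and only illustrative; the DataFrame is the one from the question.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A",
                  "franchise_A", "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Step 1: count each item within each franchise.
# The result is a Series indexed by (franchise, items) pairs.
counts = df.groupby("franchise")["items"].value_counts()

# Step 2: move the inner index level (items) into columns.
# Combinations that never occur are filled with 0 instead of NaN.
table = counts.unstack(fill_value=0)
print(table)
```

Without `fill_value=0`, the missing combinations (for example bird in franchise_A) would show up as NaN and the columns would become floats.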

Other solutions use crosstab and pivot_table:

print (pd.crosstab(df["franchise"], df['items']))
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
print (df.pivot_table(index="franchise", columns='items', aggfunc='size', fill_value=0))
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
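All three approaches return the item columns in alphabetical order. If the column order from the question (dog, cat, bird, fish) matters, for instance for the bar plot, one possible sketch is to reindex the columns afterwards. The list `item_order` is an assumption of mine, taken from the desired output in the question.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A",
                  "franchise_A", "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Assumed column order, copied from the desired output in the question
item_order = ["dog", "cat", "bird", "fish"]

# Build the count table, then force the column order.
# reindex also adds a zero-filled column for any listed item
# that happens to be absent from the data.
table = pd.crosstab(df["franchise"], df["items"]).reindex(
    columns=item_order, fill_value=0
)
print(table)
```

Calling `table.plot.bar()` on the result should then draw one group of bars per franchise, provided matplotlib is installed.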

  • Indeed, using value_counts() instead of Counter makes the whole approach more robust. (2 upvotes)