Sha*_*ang 3 python dataframe pandas
I have the following pandas DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)
shops franchise items
0 shop1 franchise_A dog
1 shop2 franchise_A cat
2 shop3 franchise_A dog
3 shop4 franchise_A dog
4 shop5 franchise_B bird
5 shop6 franchise_B fish
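Each row is a unique sample (shop1, shop2, etc.), and each sample belongs to a subgroup (franchise_A, franchise_B, franchise_C, etc.). The items column can take only four categorical values: dog, cat, fish, bird. My goal is to create a bar plot of the number of dogs, cats, fish and birds for each franchise.
I would like the output to be: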
franchise dogs cats birds fish
franchise_A 3 1 0 0
franchise_B 0 0 1 1
I believe I first have to use groupby(), for example:
df.groupby("franchise").count()
shops items
franchise
franchise_A 4 4
franchise_B 2 2
But I'm not sure how to count the number of each item per franchise.
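For reference, groupby("franchise").count() returns the number of non-null values in each remaining column per franchise, which is why shops and items both show 4 and 2: every row is counted once per column, regardless of which item it holds. A minimal sketch of counting each item instead groups by both columns and takes size(); the names per_column, pair_counts and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# count(): number of non-null values in each remaining column, per franchise
per_column = df.groupby("franchise").count()

# size() on a two-key groupby: number of rows for each observed (franchise, item) pair
pair_counts = df.groupby(["franchise", "items"]).size()
print(pair_counts)

# Move the inner index level (items) into columns; pairs that never occur become 0
wide = pair_counts.unstack(fill_value=0)
print(wide)
```

Only pairs that actually occur appear in pair_counts (four here), so the zeros in the wide table come entirely from fill_value=0.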
You can use value_counts with unstack (thanks to Nickil Maveli):
print (df.groupby("franchise")['items'].value_counts().unstack(fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
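It may help to look at the intermediate result. value_counts on the grouped column returns a Series indexed by (franchise, items) pairs, and unstack turns the items level into columns. A small sketch (variable names are illustrative) shows why fill_value=0 matters: without it, the missing pairs become NaN and the counts are upcast to float.

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Series with a (franchise, items) MultiIndex, one entry per observed pair
s = df.groupby("franchise")["items"].value_counts()
print(s)

# Without fill_value, unobserved pairs are NaN, so the columns become float64
no_fill = s.unstack()
print(no_fill)

# With fill_value=0, unobserved pairs are 0 and the counts stay integers
filled = s.unstack(fill_value=0)
print(filled)
```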
Other solutions use crosstab and pivot_table:
print (pd.crosstab(df["franchise"], df['items']))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
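crosstab sorts the item columns alphabetically and only includes items that actually occur in the data, whereas the desired output in the question uses the order dogs, cats, birds, fish. A minimal sketch, assuming you want exactly that layout, reindexes the columns against the full list of four allowed items (filling any absent item with 0) and then renames them. item_order and labels are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# All four allowed item values, in the column order used by the desired output
item_order = ["dog", "cat", "bird", "fish"]
# Column labels as written in the desired output
labels = {"dog": "dogs", "cat": "cats", "bird": "birds", "fish": "fish"}

counts = (
    pd.crosstab(df["franchise"], df["items"])
    .reindex(columns=item_order, fill_value=0)  # fixed order; absent items get 0
    .rename(columns=labels)
)
counts.columns.name = None  # drop the "items" header above the columns
print(counts)
```

Because of the reindex, an item that never appears in a given dataset still gets a column of zeros rather than disappearing.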
print (df.pivot_table(index="franchise", columns='items', aggfunc='size', fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
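Since the end goal in the question is a bar plot, any of these count tables can be plotted directly. A minimal sketch, assuming matplotlib is installed (pandas uses it for DataFrame.plot): each franchise (row) becomes a group on the x axis with one bar per item (column). The Agg backend is selected only so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

counts = pd.crosstab(df["franchise"], df["items"])

# One group of bars per franchise, one bar (and legend entry) per item
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("franchise")
ax.set_ylabel("count")
plt.tight_layout()
# In an interactive session, call plt.show() here to display the figure
```

Passing stacked=True to plot draws a single stacked bar per franchise instead of side-by-side bars.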