sud*_*nym 4 python group-by dataframe pandas pandas-groupby
I am working with a pandas DataFrame, df1, that holds item prices.
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
I created Minimum using:
df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(min)
How can I create Most_Common_Price?
df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(value_counts()) # Doesn't work
At the moment I use a multi-step approach:
for item in df1.Item.unique().tolist():  # Pseudocode
    subset = df1[df1.Item == item]  # rows belonging to this item
    subset.Price.value_counts().idxmax()  # most frequent price for this item
This is overkill. There must be a simpler way, ideally a one-liner.
How do I use value_counts() with groupby().transform() in pandas?
A nice approach is to use pd.Series.mode, if what you want is the most common element (i.e. the mode).
In [32]: df
Out[32]:
Item Price Minimum
0 Coffee 1 1
1 Coffee 2 1
2 Coffee 2 1
3 Tea 3 3
4 Tea 4 3
5 Tea 4 3
In [33]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(pd.Series.mode)
In [34]: df
Out[34]:
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
As @Wen pointed out, pd.Series.mode can return a pd.Series with more than one value when there is a tie, so just take the first one:
In [67]: df
Out[67]:
Item Price Minimum
0 Coffee 1 1
1 Coffee 2 1
2 Coffee 2 1
3 Tea 3 3
4 Tea 4 3
5 Tea 4 3
6 Tea 3 3
In [68]: df[df.Item =='Tea'].Price.mode()
Out[68]:
0 3
1 4
dtype: int64
In [69]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(lambda S: S.mode()[0])
In [70]: df
Out[70]:
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 3
4 Tea 4 3 3
5 Tea 4 3 3
6 Tea 3 3 3
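For anyone who wants to reproduce the tied case outside an IPython session, here is a minimal self-contained sketch. It rebuilds the seven-row frame by hand (the construction is an assumption; the answer only shows the printed frame) and uses .iloc[0] rather than [0] so the lookup is positional. Since Series.mode() returns tied values in sorted order, this picks the smallest of the tied prices.

```python
import pandas as pd

# Same data as the tied example: Tea has prices 3 and 4 twice each.
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4, 3],
})

# Series.mode() returns all tied values in ascending order,
# so .iloc[0] selects the smallest of the tied modes for each group.
df["Most_Common_Price"] = (
    df.groupby("Item")["Price"].transform(lambda s: s.mode().iloc[0])
)

print(df)
```

If a different tie-breaking rule is needed (for example the largest tied price), .iloc[-1] on the same mode() result should do it.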
You can use groupby + transform + value_counts + idxmax:
df['Most_Common_Price'] = (
df.groupby('Item')['Price'].transform(lambda x: x.value_counts().idxmax()))
df
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
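To make each step of that one-liner visible, here is a sketch that pulls the lambda out into a named function. The name most_common is hypothetical and the frame is rebuilt by hand from the question's table. value_counts() produces a Series whose index holds the distinct prices and whose values are their counts; idxmax() then returns the index label (the price) with the largest count.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})


def most_common(s: pd.Series):
    """Return the most frequent value in s (hypothetical helper name)."""
    counts = s.value_counts()  # index = distinct prices, values = counts
    return counts.idxmax()     # label (price) carrying the largest count


# transform broadcasts the scalar returned per group back to every row.
df["Minimum"] = df.groupby("Item")["Price"].transform("min")
df["Most_Common_Price"] = df.groupby("Item")["Price"].transform(most_common)

print(df)
```

When two prices are tied within a group, which one comes back depends on how value_counts() orders equal counts, so it is probably safer not to rely on a particular winner there.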
An improvement (thanks, Vaishali!) involves using pd.Series.map:
# Thanks, Vaishali!
# Aggregate once per item, then map the result back onto each row.
df['Most_Common_Price'] = df['Item'].map(
    df.groupby('Item')['Price'].agg(lambda x: x.value_counts().idxmax()))
df
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
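The lookup step is perhaps easier to follow when the per-item result is kept in its own variable. The sketch below does that; per_item is a name introduced here for illustration, and the frame is again rebuilt by hand from the question's table. agg computes one value per item, and map then looks up each row's Item in that small Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

# One entry per item, indexed by Item: Coffee -> 2, Tea -> 4.
per_item = df.groupby("Item")["Price"].agg(lambda x: x.value_counts().idxmax())

# Look up each row's Item in per_item to fill the new column.
df["Most_Common_Price"] = df["Item"].map(per_item)

print(per_item)
print(df)
```

Keeping per_item around can also be handy if the same per-item value is needed elsewhere, for example in a summary table.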