Ste*_*zzi 5 python dataframe pandas pandas-groupby
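I'm new to pandas. I need to aggregate the rows that share the same 'Name', then take the mean of 'Rating' and 'NumsHelpful' (ignoring NaN). 'Review' should be concatenated, while 'Weight(Pounds)' should stay unchanged: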
col names: ['Brand', 'Name', 'NumsHelpful', 'Rating', 'Weight(Pounds)', 'Review']
Name 'Brand' 'Name'
1534 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1535 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1536 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1537 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1538 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1539 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1540 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
'NumsHelpful' 'Rating' 'Weight'
1534 NaN 2 4.5
1535 NaN 2 4.5
1536 NaN NaN 4.5
1537 NaN NaN 4.5
1538 2 NaN 4.5
1539 3 5 4.5
1540 5 NaN 4.5
'Review'
1534 Yummy - Delish
1535 The best Bloody Mary mix! - The best Bloody Ma...
1536 Best Taste by far - I've tried several if not ...
1537 Best bloody mary mix ever - This is also good ...
1538 Outstanding - Has a small kick to it but very ...
1539 OMG! So Good! - Spicy, terrific Bloody Mary mix!
1540 Good stuff - This is the best
So the output should look like this:
'Brand' 'Name' 'NumsHelpful' 'Rating'
Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz 3.33 3
'Weight' 'Review'
4.5 Review1 / Review2 / ... / ReviewN
How should I proceed? Thanks.
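Use DataFrameGroupBy.agg with a dictionary that maps each column to an aggregation function. The columns Weight and Brand are aggregated with first, which takes the first value of each group: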
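# Map each column to the aggregation applied within every 'Name' group
d = {'NumsHelpful': 'mean',   # mean skips NaN by default
     'Review': '/'.join,      # concatenate the review strings
     'Weight': 'first',
     'Brand': 'first',
     'Rating': 'mean'}
df = df.groupby('Name').agg(d).reset_index()
print(df)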
Name NumsHelpful \
0 Zing Zang Bloody Mary Mix, 32 fl oz 3.333333
Review Weight Brand \
0 Yummy - Delish/The best Bloody Mary mix! - The... 4.5 Zing Zang
Rating
0 3.0
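The approach above can be tried end to end with a small frame. This is a minimal sketch: the sample rows are modelled on the question's data but trimmed, the second product ("Acme Tonic, 1 l") is hypothetical, and the separator is changed to ' / ' only to match the output format the question asks for.

```python
import numpy as np
import pandas as pd

zz = "Zing Zang Bloody Mary Mix, 32 fl oz"

# Illustrative data: three rows from the question plus one hypothetical product
df = pd.DataFrame({
    "Brand": ["Zing Zang", "Zing Zang", "Zing Zang", "Acme"],
    "Name": [zz, zz, zz, "Acme Tonic, 1 l"],
    "NumsHelpful": [np.nan, 3, 5, 1],
    "Rating": [2, 5, np.nan, 4],
    "Weight": [4.5, 4.5, 4.5, 2.2],
    "Review": [
        "Yummy - Delish",
        "OMG! So Good! - Spicy, terrific Bloody Mary mix!",
        "Good stuff - This is the best",
        "Fine",
    ],
})

# One aggregation per column; 'mean' ignores NaN, 'first' keeps the first value
spec = {
    "Brand": "first",
    "NumsHelpful": "mean",
    "Rating": "mean",
    "Weight": "first",
    "Review": " / ".join,
}

result = df.groupby("Name").agg(spec).reset_index()
print(result)
```

Each distinct Name ends up as one row, with the numeric columns averaged over the non-missing values only.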
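Also, pandas 0.23.1 emits this warning:
FutureWarning: 'Name' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
The fix is to remove the index name Name: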
df.index.name = None
Or:
df = df.rename_axis(None)
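A minimal sketch of the index-name fix, assuming a frame whose index happens to be named 'Name' while a column with the same label also exists (the data here is hypothetical). On newer pandas versions the same ambiguity is, as far as I know, reported as an error rather than a warning, so clearing the index name first is the safer order.

```python
import pandas as pd

# Hypothetical frame: the index is named 'Name' and a 'Name' column exists too
df = pd.DataFrame(
    {"Name": ["A", "A", "B"], "Rating": [2.0, 4.0, 5.0]},
    index=pd.Index(["A", "A", "B"], name="Name"),
)

# Drop the index name so that 'Name' refers only to the column
clean = df.rename_axis(None)

means = clean.groupby("Name")["Rating"].mean()
print(clean.index.name)
print(means)
```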
Another possible solution is to skip aggregating by first and instead add those columns to the groupby:
d = {'NumsHelpful':'mean', 'Review':'/'.join, 'Rating':'mean'}
df = df.groupby(['Name', 'Weight','Brand']).agg(d).reset_index()
Both solutions return the same output when the values in those columns are the same within each group.
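The equivalence, and the case where it stops holding, can be checked on a small frame. This is a sketch with hypothetical data: `by_first` and `by_keys` are illustrative names for the two variants, and the second half deliberately changes one Weight value to show how the results diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Brand and Weight are constant within each Name
df = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Brand": ["X", "X", "Y"],
    "Weight": [4.5, 4.5, 1.0],
    "Rating": [2.0, np.nan, 4.0],
})

# Variant 1: aggregate the constant columns with 'first'
by_first = (df.groupby("Name")
              .agg({"Brand": "first", "Weight": "first", "Rating": "mean"})
              .reset_index())

# Variant 2: treat the constant columns as additional group keys
by_keys = (df.groupby(["Name", "Weight", "Brand"])
             .agg({"Rating": "mean"})
             .reset_index())

cols = ["Name", "Brand", "Weight", "Rating"]
# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(by_first[cols], by_keys[cols])

# If Weight varies inside a group, the key-based variant splits that group
df2 = df.copy()
df2.loc[1, "Weight"] = 5.0
rows_first = len(df2.groupby("Name").agg({"Weight": "first"}))
rows_keys = len(df2.groupby(["Name", "Weight", "Brand"]).agg({"Rating": "mean"}))
print(rows_first, rows_keys)
```

Note also that groupby drops rows whose key is NaN by default, so the key-based variant would silently lose rows with a missing Weight or Brand.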
EDIT:
If you need to convert string (object) columns to numeric, first try converting with astype:
df['Weight(Pounds)'] = df['Weight(Pounds)'].astype(float)
If that fails, use to_numeric with the parameter errors='coerce' to convert unparseable strings to NaN:
df['Weight(Pounds)'] = pd.to_numeric(df['Weight(Pounds)'], errors='coerce')
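The difference between the two conversions shows up as soon as a value cannot be parsed. A small sketch with a hypothetical object column containing the placeholder string 'n/a':

```python
import pandas as pd

# Hypothetical object column with one unparseable entry
weights = pd.Series(["4.5", "4.5", "n/a"], dtype=object, name="Weight(Pounds)")

# astype(float) is strict: it raises on the first string it cannot parse
try:
    weights.astype(float)
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print("astype failed:", exc)

# to_numeric with errors='coerce' turns unparseable strings into NaN instead
converted = pd.to_numeric(weights, errors="coerce")
print(converted)
```

Because the mean aggregation skips NaN, the coerced values simply drop out of the later averages.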