Find the average length of text in each group using pandas groupby

Question


mad*_*aks 2 python pandas pandas-groupby

I'm working with a Shakespeare corpus. The first rows of the DataFrame look like this:

    act literature_type scene   scene_text  scene_title speaker title
0   1   Comedy  1   In delivering my son from me, I bury a second ...   Rousillon. The COUNT's palace.  COUNTESS    All's Well That Ends Well
1   1   Comedy  1   And I in going, madam, weep o'er my father's d...   Rousillon. The COUNT's palace.  BERTRAM All's Well That Ends Well
2   1   Comedy  1   You shall find of the king a husband, madam; y...   Rousillon. The COUNT's palace.  LAFEU   All's Well That Ends Well
3   1   Comedy  1   What hope is there of his majesty's amendment?  Rousillon. The COUNT's palace.  COUNTESS    All's Well That Ends Well
4   1   Comedy  1   He hath abandoned his physicians, madam; under...   Rousillon. The COUNT's palace.  LAFEU   All's Well That Ends Well


I want to find the average length of `scene_text` for each title.

What I tried was:

all_works_by_speaker_df.groupby('title').apply(lambda x: np.mean(len(x)))


This only returns the number of rows (entries) for each title, not the average text length.
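The reason is that inside `groupby(...).apply`, each `x` is the whole sub-DataFrame for one title, so `len(x)` is its row count, and `np.mean` of a single number is just that number. A small sketch reproducing this on made-up toy data (the titles and strings below are hypothetical, not taken from the corpus):

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for all_works_by_speaker_df
df = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["abcd", "ab", "xyz"],
})

# Each group x is a DataFrame; len(x) counts its rows, not characters,
# and np.mean of that single count returns the count unchanged
row_counts = df.groupby("title").apply(lambda x: np.mean(len(x)))
print(row_counts)
```

Title "A" yields 2.0 and "B" yields 1.0: row counts, regardless of how long the strings are. Recent pandas versions may also emit a deprecation warning here about the grouping column being passed to `apply`; it does not change the result.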

Answer 1

jez*_*ael 5

If you need the length in characters:

import numpy as np

# str.len() gives the character count of each scene_text; average it per title
df = (all_works_by_speaker_df.groupby('title')['scene_text']
                            .apply(lambda x: np.mean(x.str.len()))
                            .reset_index(name='mean_len_text'))
print(df)

                       title  mean_len_text
0  All's Well That Ends Well           48.4
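An equivalent variant that avoids the per-group `lambda` is to compute the character length of every row once with `Series.str.len()`, then group that Series by the title column and take the mean. A minimal sketch on hypothetical toy data (column names match the question; the values are made up):

```python
import pandas as pd

# Hypothetical toy data standing in for all_works_by_speaker_df
df = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["abcd", "ab", "xyzxyz"],
})

# Character length of each row, grouped by title, then averaged
mean_len = (df["scene_text"].str.len()
              .groupby(df["title"])
              .mean()
              .reset_index(name="mean_len_text"))
print(mean_len)
```

Here "A" averages (4 + 2) / 2 = 3.0 and "B" is 6.0. Grouping a Series by another aligned Series lets pandas use its built-in `mean` aggregation instead of calling Python code once per group.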


If you need the length measured in other units, use Vaishali's solution.
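If the length should be counted in words rather than characters, one possible sketch (an assumption about what "length" means here, and not necessarily what Vaishali's answer does) is to split each row on whitespace with `str.split()` and count the tokens with `str.len()`. The data below is hypothetical:

```python
import pandas as pd

# Hypothetical toy data standing in for all_works_by_speaker_df
df = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be or not", "to be", "what hope is there of his"],
})

# str.split() with no argument splits on runs of whitespace;
# str.len() on the resulting lists gives the word count per row
word_counts = df["scene_text"].str.split().str.len()

# Average word count per title
mean_words = word_counts.groupby(df["title"]).mean()
print(mean_words)
```

Whitespace splitting treats punctuation attached to a word (e.g. "madam;") as part of that word, which is usually acceptable for a rough average.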

Archived: 8 years, 1 month ago
Views: 2860
Last activity: 8 years, 1 month ago