i.n*_*n.m · Tags: split, group-by, python-3.x, pandas
I have a df like this:
Owner Messages
AAA (YY) Duplicates
AAA Missing Number; (VV) Corrected Value; (YY) Duplicates
AAA (YY) Duplicates
BBB (YY) Duplicates
BBB Missing Measure; Missing Number
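To make the problem reproducible, the sample frame can be rebuilt roughly as follows. This is a sketch: it assumes that multiple messages in one cell are separated by exactly '; ' and that the column names carry no stray whitespace.

```python
import pandas as pd

# Sample data from the question; one cell may hold several messages joined by '; '
df = pd.DataFrame({
    'Owner': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Messages': [
        '(YY) Duplicates',
        'Missing Number; (VV) Corrected Value; (YY) Duplicates',
        '(YY) Duplicates',
        '(YY) Duplicates',
        'Missing Measure; Missing Number',
    ],
})
print(df)
```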
When I do a normal groupby like this,
df_grouped = df.groupby(['Owner', 'Messages']).size().reset_index(name='count')
df_grouped
I get this, as expected:
Owner Messages count
0 AAA (YY) Duplicates 2
1 AAA Missing Number; (VV) Corrected Value; (YY) Duplicates 1
2 BBB (YY) Duplicates 1
3 BBB Missing Measure; Missing Number 1
However, what I need (the desired output) is something like this, with the values inside the Messages column split on ';':
Owner Messages count
0 AAA (YY) Duplicates 3
1 AAA Missing Number 1
2 AAA (VV) Corrected Value 1
3 BBB (YY) Duplicates 1
4 BBB Missing Measure 1
5 BBB Missing Number 1
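So far, following @LeoRochael's answer in this post, I can split the values of the Messages column on ';' and put them into lists. However, after splitting I can't get the individual counts.
Any ideas how to get my desired output?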
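One way to get from the lists to the counts, sketched here as an alternative rather than the linked answer's code: split each cell into a list with str.split('; '), then use DataFrame.explode (available since pandas 0.25) to give every list element its own row before grouping. The '; ' separator is assumed from the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Owner': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Messages': [
        '(YY) Duplicates',
        'Missing Number; (VV) Corrected Value; (YY) Duplicates',
        '(YY) Duplicates',
        '(YY) Duplicates',
        'Missing Measure; Missing Number',
    ],
})

# Replace each cell with a list of individual messages
lists = df.assign(Messages=df['Messages'].str.split('; '))
# explode() gives each list element its own row, repeating the Owner value
long_df = lists.explode('Messages')
# Count each (Owner, message) pair and return a flat frame with a 'count' column
counts = long_df.groupby(['Owner', 'Messages']).size().reset_index(name='count')
print(counts)
```

The rows come out in groupby's default sorted key order, so within each Owner the parenthesised messages sort before the "Missing ..." ones.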
You need to unnest the original dataframe first; after that, a groupby with size is all that's needed:
# split each message list into columns, then stack into long form (one row per message)
s = df.set_index('Owner').Messages.str.split('; ', expand=True).stack().to_frame('Messages').reset_index()
s.groupby(['Owner', 'Messages']).size()
Out[1213]:
Owner Messages
AAA (VV) Corrected Value 1
(YY) Duplicates 3
Missing Number 1
BBB (YY) Duplicates 1
Missing Measure 1
Missing Number 1
dtype: int64
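The answer returns a Series indexed by (Owner, Messages). A minimal sketch of getting the flat Owner/Messages/count layout shown in the question, assuming the same sample data, is to finish with reset_index(name='count'). The explicit dropna() after stack() is an addition not in the answer: it removes the empty cells that expand=True pads shorter rows with, which may matter on newer pandas versions whose stack() keeps missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'Owner': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Messages': [
        '(YY) Duplicates',
        'Missing Number; (VV) Corrected Value; (YY) Duplicates',
        '(YY) Duplicates',
        '(YY) Duplicates',
        'Missing Measure; Missing Number',
    ],
})

# Same unnesting as the answer: one column per message, then stack into long form
s = (df.set_index('Owner')['Messages']
       .str.split('; ', expand=True)
       .stack()
       .dropna()  # drop padding cells for rows with fewer messages
       .to_frame('Messages')
       .reset_index())

# reset_index(name='count') turns the size() Series into a flat frame
result = s.groupby(['Owner', 'Messages']).size().reset_index(name='count')
print(result)
```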