Split column values on a special character and group by with pandas

i.n*_*n.m 1 split group-by python-3.x pandas

I have a df like this:

Owner   Messages
AAA     (YY) Duplicates
AAA     Missing Number; (VV) Corrected Value; (YY) Duplicates
AAA     (YY) Duplicates
BBB     (YY) Duplicates
BBB     Missing Measure; Missing Number

When I do a normal groupby like this,

df_grouped = df.groupby(['Owner', 'Messages']).size().reset_index(name='count')
df_grouped

I get this, as expected:

    Owner  Messages                                               count
0   AAA   (YY) Duplicates                                           2
1   AAA   Missing Number; (VV) Corrected Value; (YY) Duplicates     1
2   BBB   (YY) Duplicates                                           1
3   BBB   Missing Measure; Missing Number                           1

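However, I need something like this (the desired output), with the values in the Messages column split on ';' and each message counted separately: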

   Owner    Messages             count
0   AAA    (YY) Duplicates       3
1   AAA    Missing Number        1
2   AAA    (VV) Corrected Value  1
3   BBB    (YY) Duplicates       1
4   BBB    Missing Measure       1
5   BBB    Missing Number        1

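So far, following @LeoRochael's answer in this post, I can split the values of the Messages column on ';' and put them into lists. However, after splitting I can't get the count of each individual message.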

Any ideas on how to get my desired output?

WeN*_*Ben 6

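You need to unnest the original DataFrame first, so that each message gets its own row; after that it is just a groupby followed by size: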

s=df.set_index('Owner').Messages.str.split('; ',expand=True).stack().to_frame('Messages').reset_index()
s.groupby(['Owner','Messages']).size()
Out[1213]: 
Owner  Messages            
AAA    (VV) Corrected Value    1
       (YY) Duplicates         3
       Missing Number          1
BBB    (YY) Duplicates         1
       Missing Measure         1
       Missing Number          1
dtype: int64
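For reference, the same unnest-then-count idea can also be sketched with `DataFrame.explode`, assuming pandas 0.25 or later (where `explode` is available). The DataFrame below is rebuilt from the sample in the question; the name `exploded` is only illustrative, and stripping whitespace after splitting on ';' is an assumption that the separator may or may not be followed by a space.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Owner": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Messages": [
        "(YY) Duplicates",
        "Missing Number; (VV) Corrected Value; (YY) Duplicates",
        "(YY) Duplicates",
        "(YY) Duplicates",
        "Missing Measure; Missing Number",
    ],
})

# Split each Messages string into a list, then give every list item its own row.
exploded = (
    df.assign(Messages=df["Messages"].str.split(";"))
      .explode("Messages")
      .reset_index(drop=True)
)

# Remove the whitespace left around each message by the split.
exploded["Messages"] = exploded["Messages"].str.strip()

# Count the occurrences of each (Owner, Messages) pair.
df_grouped = (
    exploded.groupby(["Owner", "Messages"])
            .size()
            .reset_index(name="count")
)
print(df_grouped)
```

Using `reset_index(name='count')` returns a flat DataFrame with a `count` column, as in the desired output of the question, instead of a Series with a MultiIndex.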