Split summary data and re-aggregate

use*_*331 5 python pandas

I have a summary df that looks like this:

Apples             100
Bananas            34
Kumquats           54
Greengages         101
Apples;Kumquats    5
Bananas;Greengages 7

I want to simplify it by distributing the counts of the combined fruits to the individual items:

Apples             105
Bananas            41
Kumquats           59
Greengages         108

That is, I drop rows like Apples;Kumquats, but increase both Apples and Kumquats by 5.

Is there a good way to do this in pandas?

jez*_*ael 1

You can split the values by ;, reshape with stack, and aggregate with sum:

print (df)
                    a    b
0              Apples  100
1             Bananas   34
2            Kumquats   54
3          Greengages  101
4     Apples;Kumquats    5
5  Bananas;Greengages    7

df1 = (df.set_index('b')['a']
         .str.split(';', expand=True)
         .stack()
         .reset_index(name='c')
         .groupby('c', as_index=False)['b'].sum())
print (df1)
            c    b
0      Apples  105
1     Bananas   41
2  Greengages  108
3    Kumquats   59
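For comparison, here is a minimal self-contained sketch of the same split-then-sum idea using `DataFrame.explode` in place of `stack`. This is a different technique from the one above, and it assumes pandas 0.25 or newer, where `explode` was introduced. The sample frame is rebuilt from the printed data so the snippet runs on its own; the names `exploded` and `result` are only illustrative.

```python
import pandas as pd

# Sample data matching the frame printed above (columns 'a' and 'b').
df = pd.DataFrame({
    'a': ['Apples', 'Bananas', 'Kumquats', 'Greengages',
          'Apples;Kumquats', 'Bananas;Greengages'],
    'b': [100, 34, 54, 101, 5, 7],
})

# Split each label into a list, then give every list element its own row;
# the count in 'b' is repeated for each piece.
exploded = df.assign(a=df['a'].str.split(';')).explode('a')

# Sum the counts per individual fruit (groupby sorts the keys by default).
result = exploded.groupby('a', as_index=False)['b'].sum()
print(result)
```

Because `explode` repeats the count for every piece of a combined label, each fruit in `Apples;Kumquats` receives the full 5, which is what the question asks for.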

Or a solution with defaultdict:

from collections import defaultdict

import pandas as pd

# accumulate each row's count under every fruit named in that row
d = defaultdict(int)
for a, b in zip(df['a'], df['b']):
    for x in a.split(';'):
        d[x] += b

df = pd.DataFrame({'a':list(d.keys()), 'b':list(d.values())})
print (df)
            a    b
0      Apples  105
1     Bananas   41
2    Kumquats   59
3  Greengages  108  
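If you would rather not overwrite `df`, the loop can be wrapped in a small function. The sketch below is only an illustration: `split_and_sum` and its parameters are hypothetical names, not part of pandas, and the sample frame is rebuilt from the data in the question.

```python
from collections import defaultdict

import pandas as pd


def split_and_sum(df, key='a', value='b', sep=';'):
    """Hypothetical helper: add each row's value to every label in its key."""
    totals = defaultdict(int)
    for label, count in zip(df[key], df[value]):
        for part in label.split(sep):
            totals[part] += count
    # dicts keep insertion order (Python 3.7+), so rows come out in
    # first-seen order rather than sorted order.
    return pd.DataFrame({key: list(totals.keys()),
                         value: list(totals.values())})


df = pd.DataFrame({
    'a': ['Apples', 'Bananas', 'Kumquats', 'Greengages',
          'Apples;Kumquats', 'Bananas;Greengages'],
    'b': [100, 34, 54, 101, 5, 7],
})

out = split_and_sum(df)
print(out)
```

Unlike the groupby version, this keeps the fruits in the order they first appear in the input, and the original `df` is left untouched.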