paw*_*lty 6 python python-3.x pandas
I have the following DataFrame in pandas:
+---------+--------+--------------------+
| keyword | weight | other keywords |
+---------+--------+--------------------+
| dog | 0.12 | [cat, horse, pig] |
| cat | 0.5 | [dog, pig, camel] |
| horse | 0.07 | [dog, camel, cat] |
| dog | 0.1 | [cat, horse] |
| dog | 0.2 | [cat, horse, pig] |
| horse | 0.3 | [camel] |
+---------+--------+--------------------+
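I want to group by keyword and, in the same step, count how often each keyword occurs, average the weights, and concatenate the other keywords lists. The result should look like this: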
+---------+-----------+------------+------------------------------------------------+
| keyword | frequency | avg weight | sum other keywords |
+---------+-----------+------------+------------------------------------------------+
| dog | 3 | 0.14 | [cat, horse, pig, cat, horse, cat, horse, pig] |
| cat | 1 | 0.5 | [dog, pig, camel] |
| horse | 2 | 0.185 | [dog, camel, cat, camel] |
+---------+-----------+------------+------------------------------------------------+
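I know how to do this as several separate operations: value_counts, groupby.sum(), groupby.mean(), and then merging the results. However, that is very inefficient and requires a lot of manual adjustment.
Is it possible to do it in a single operation?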
jez*_*ael 10
You can use agg:
df = df.groupby('keyword').agg({'keyword': 'size', 'weight': 'mean', 'other keywords': 'sum'})
# set the column order (reindex_axis is deprecated and gone from pandas 1.0+; use reindex)
df = df.reindex(columns=['keyword', 'weight', 'other keywords'])
# drop the index name, then turn the group keys into a regular column
df = df.rename_axis(None).reset_index()
# set new column names
df.columns = ['keyword', 'frequency', 'avg weight', 'sum other keywords']
print(df)
keyword frequency avg weight \
0 cat 1 0.500
1 dog 3 0.140
2 horse 2 0.185
sum other keywords
0 [dog, pig, camel]
1 [cat, horse, pig, cat, horse, cat, horse, pig]
2 [dog, camel, cat, camel]
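On pandas 0.25 or later, the same result can probably be reached with named aggregation, which sets the output column names inside agg and so avoids the reordering and renaming steps. The following is a minimal sketch, not a definitive implementation. It rebuilds the sample data from the question, and the helper name concat_lists is made up for illustration. It flattens the lists explicitly instead of relying on 'sum' for list columns.

```python
from itertools import chain

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'keyword': ['dog', 'cat', 'horse', 'dog', 'dog', 'horse'],
    'weight': [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    'other keywords': [
        ['cat', 'horse', 'pig'],
        ['dog', 'pig', 'camel'],
        ['dog', 'camel', 'cat'],
        ['cat', 'horse'],
        ['cat', 'horse', 'pig'],
        ['camel'],
    ],
})


def concat_lists(lists):
    """Hypothetical helper: flatten one group's lists into a single list, keeping duplicates."""
    return list(chain.from_iterable(lists))


# sort=False keeps the groups in order of first appearance (dog, cat, horse).
# The ** unpacking is only needed because the output names contain spaces.
result = (
    df.groupby('keyword', sort=False)
      .agg(**{
          'frequency': ('weight', 'size'),
          'avg weight': ('weight', 'mean'),
          'sum other keywords': ('other keywords', concat_lists),
      })
      .reset_index()
)
print(result)
```

Each output column is given as a (source column, aggregation) pair, so the frequency can be counted on any column of the group rather than on the grouping key itself.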