I am trying to group by the values in the "value_1" column, but my last column consists of lists. When I group by "value_1", the column of lists disappears.
DataFrame:
value_1   value_2          value_3         list
american  california, nyc  walmart, kmart  [supermarket, connivence]
canadian  toronto          dunkinDonuts    [coffee]
american  texas                            [state]
canadian                   walmart         [supermarket]
...       ...              ...             ...
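For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. This is only a sketch: it assumes the blank cells are NaN (in the real data they may just as well be empty strings), and it contains only the four rows shown.

```python
import numpy as np
import pandas as pd

# Assumption: blank cells are NaN; use "" instead if the real data holds empty strings.
df = pd.DataFrame({
    "value_1": ["american", "canadian", "american", "canadian"],
    "value_2": ["california, nyc", "toronto", "texas", np.nan],
    "value_3": ["walmart, kmart", "dunkinDonuts", np.nan, "walmart"],
    "list": [["supermarket", "connivence"], ["coffee"], ["state"], ["supermarket"]],
})
print(df)
```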
My expected output is:
value_1   value_2                 value_3                list
american  california, nyc, texas  walmart, kmart         [supermarket, connivence, state]
canadian  toronto                 dunkinDonuts, walmart  [coffee, supermarket]
Thanks!
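You can group by value_1 and aggregate the columns that contain strings with the following function:
def str_cat(x):
    # Join the non-missing strings of a group, separated by ', '.
    return x.str.cat(sep=', ')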
Then use GroupBy.sum to concatenate the lists in the "list" column:
import numpy as np

# Mark empty strings as missing so that str_cat skips them.
df.replace('', np.nan).groupby('value_1').agg({'list': 'sum', 'value_2': str_cat,
                                               'value_3': str_cat})
                                      list                 value_2                value_3
value_1
american  [supermarket, connivence, state]  california, nyc, texas         walmart, kmart
canadian             [coffee, supermarket]                 toronto  dunkinDonuts, walmart
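To see what each aggregator does on its own, here is a small sketch on made-up values (the Series and the demo frame are illustrative and not taken from the question). Called without other arguments, Series.str.cat joins the values and omits missing ones, while summing a group of lists concatenates them with +.

```python
import numpy as np
import pandas as pd

def str_cat(x):
    return x.str.cat(sep=', ')

# Illustrative values: one group of strings with a missing entry.
cities = pd.Series(["california, nyc", np.nan, "texas"])
joined = str_cat(cities)  # the NaN is omitted from the result
print(joined)             # california, nyc, texas

# Illustrative frame: summing a group of lists concatenates them.
demo = pd.DataFrame({"key": ["a", "b", "a"],
                     "tags": [["x", "y"], ["z"], ["w"]]})
combined = demo.groupby("key")["tags"].sum()
print(combined["a"])      # ['x', 'y', 'w']
```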
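Build the aggregation dictionary dynamically: use a lambda function for every column other than "value_1" and "list", and for "list" use a list comprehension that flattens the lists: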
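# f1 joins the non-missing strings of a group
f1 = lambda x: ', '.join(x.dropna())
# alternative: join only the string values
# f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
# f2 flattens the lists of a group into a single list
f2 = lambda x: [z for y in x for z in y]
# map f1 to every column except the group key and the list column
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2
df = df.groupby('value_1', as_index=False).agg(d)
print(df)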
    value_1                 value_2                value_3                              list
0  american  california, nyc, texas         walmart, kmart  [supermarket, connivence, state]
1  canadian                 toronto  dunkinDonuts, walmart             [coffee, supermarket]
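The dictionary d is what lets this approach pick up additional string columns automatically. The sketch below uses a hypothetical extra column, value_4, which is not part of the question, to show which columns end up mapped to the string joiner. Note that Index.difference returns the remaining names in sorted order.

```python
import pandas as pd

# Empty frame with the question's columns plus a hypothetical value_4.
cols = ["value_1", "value_3", "value_2", "value_4", "list"]
frame = pd.DataFrame(columns=cols)

f1 = lambda x: ', '.join(x.dropna())
f2 = lambda x: [z for y in x for z in y]

# Every column except the group key and the list column gets f1.
string_cols = frame.columns.difference(["value_1", "list"])
d = dict.fromkeys(string_cols, f1)
d["list"] = f2

print(list(d))  # ['value_2', 'value_3', 'value_4', 'list']
```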
Explanation:
f1 and f2 are lambda functions.
The first version of f1 removes missing values (if there are any) and joins the strings with a separator:
f1 = lambda x: ', '.join(x.dropna())
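Alternatively, f1 can keep only the string values (this skips missing values, because NaN is a float, not a string) and join them with a separator: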
f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
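A third option takes all the values, filters out the empty strings, and joins the rest with a separator (this assumes the column holds only strings, with no NaN):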
f1 = lambda x: ', '.join([y for y in x if y != ''])
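The difference between the three variants is easiest to see side by side. In this sketch the names f1_dropna, f1_strings and f1_nonempty are only illustrative labels for the three versions of f1, and the two Series are made-up inputs.

```python
import numpy as np
import pandas as pd

# Illustrative labels for the three f1 variants described above.
f1_dropna = lambda x: ', '.join(x.dropna())
f1_strings = lambda x: ', '.join([y for y in x if isinstance(y, str)])
f1_nonempty = lambda x: ', '.join([y for y in x if y != ''])

with_nan = pd.Series(["toronto", np.nan, "texas"])
with_empty = pd.Series(["toronto", "", "texas"])

print(f1_dropna(with_nan))      # toronto, texas
print(f1_strings(with_nan))     # toronto, texas
print(f1_dropna(with_empty))    # toronto, , texas  (empty string is not "missing")
print(f1_nonempty(with_empty))  # toronto, texas
```

If a column can contain both NaN and empty strings, the two filters would have to be combined in a single comprehension.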
The function f2 flattens the lists, because the aggregation receives each group as a nested list such as [['a','b'], ['c']]:
f2 = lambda x: [z for y in x for z in y]
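As a quick check of the flattening step, the sketch below applies f2 to one made-up group of lists. It also shows itertools.chain.from_iterable, a standard-library alternative that is not part of the answer above but should give the same result.

```python
from itertools import chain

import pandas as pd

f2 = lambda x: [z for y in x for z in y]

# One group's worth of lists, roughly as the aggregation function receives it.
group = pd.Series([["supermarket", "connivence"], ["state"]])

flat = f2(group)
flat_chain = list(chain.from_iterable(group))  # standard-library equivalent
print(flat)  # ['supermarket', 'connivence', 'state']
```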