Remove duplicates within rows and columns (cells) of a dataframe in Python

PAs*_*loE 3 python dataframe pandas

I have a dataframe with two columns, and each cell holds a lot of duplicated items. Something like this:

Index   x    y  
  1     1    ec, us, us, gbr, lst
  2     5    ec, us, us, us, us, ec, ec, ec, ec
  3     8    ec, us, us, gbr, lst, lst, lst, lst, gbr
  4     5    ec, ec, ec, us, us, ir, us, ec, ir, ec, ec
  5     7    chn, chn, chn, ec, ec, us, us, gbr, lst

I need to eliminate all the duplicated items and end up with a dataframe like this:

Index   x    y  
  1     1    ec, us, gbr, lst
  2     5    ec, us
  3     8    ec, us, gbr, lst
  4     5    ec, us, ir
  5     7    chn, ec, us, gbr, lst

Thanks!!

Flo*_*oor 8

Split, apply set, and join:

df['y'].str.split(', ').apply(set).str.join(', ')

0         us, ec, gbr, lst
1                   us, ec
2         us, ec, gbr, lst
3               us, ec, ir
4    us, lst, ec, gbr, chn
Name: y, dtype: object
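Note that set does not keep the original order of the items, which is why the output above is ordered differently from the result asked for in the question. If the order matters, one option is dict.fromkeys, which keeps the first occurrence of each key in insertion order. This is a minimal sketch, assuming every cell is a plain string with items separated by ', '; the helper name dedupe_cell is hypothetical and not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'x': [1, 5, 8, 5, 7],
    'y': [
        'ec, us, us, gbr, lst',
        'ec, us, us, us, us, ec, ec, ec, ec',
        'ec, us, us, gbr, lst, lst, lst, lst, gbr',
        'ec, ec, ec, us, us, ir, us, ec, ir, ec, ec',
        'chn, chn, chn, ec, ec, us, us, gbr, lst',
    ],
})

def dedupe_cell(cell):
    # Hypothetical helper: dict.fromkeys drops repeats but keeps
    # the first occurrence of each item in its original position
    return ', '.join(dict.fromkeys(cell.split(', ')))

result = df['y'].apply(dedupe_cell)
print(result.tolist())
```

Assigning the result back with df['y'] = result would update the column in place.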

Update based on comments:

df['y'].str.replace(r'nan|[{}\s]', '', regex=True).str.split(',').apply(set).str.join(',').str.strip(',').str.replace(r',{2,}', ',', regex=True)

# Replace all the braces and nan with `''`, then split and apply set and join
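To see what each step of that chain does, here is a small sketch on made-up data. The sample cells are an assumption about what the commenter's column looked like (braces, stray whitespace, and literal 'nan' entries), not data from the question. regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default:

```python
import pandas as pd

# Hypothetical messy cells: braces, stray whitespace and literal 'nan' entries
s = pd.Series(['{ec, us, us, nan, gbr}', 'nan, ec, ec, {us}'])

cleaned = (
    s.str.replace(r'nan|[{}\s]', '', regex=True)  # drop 'nan', braces, whitespace
     .str.split(',')                              # one list of items per cell
     .apply(set)                                  # remove duplicates (order is lost)
     .str.join(',')                               # back to a single string
     .str.strip(',')                              # trim commas left by empty items
     .str.replace(r',{2,}', ',', regex=True)      # collapse runs of commas
)

# Sort the items before printing, since set() does not guarantee their order
print([sorted(cell.split(',')) for cell in cleaned])
```

One caveat with this pattern: it removes the letters 'nan' anywhere they appear, so an item such as 'banana' would be turned into 'baa'.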