Python - Remove all substrings that are not in a list

Question

I want to remove every substring in a df column that does not appear in a predefined list. For example:

mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}

Current:                                         Desired:
index      content                               index        content                                          
0          a very good idea, I like it           0            good like
1          was the bad thing to do               1            bad
2          I hated it, it was terrible           2            hated terrible
...                                              ...
k          Why do you think she liked it         k            liked

I have managed to write code that keeps all the words that are not in the list, but I don't know how to invert it to get what I want:

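pat = r'\b(?:{})\b'.format('|'.join(mylist))
# Removes the listed words and keeps everything else (the opposite of the desired result)
df['column1'] = df['column1'].str.replace(pat, '', regex=True)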

Any help would be appreciated.

Answer 1

jez*_*ael 5

Use str.findall with str.join:

df['column1'] = df['content'].str.findall('(' + pat + ')').str.join(' ')
print (df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
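
For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample DataFrame is rebuilt from the rows in the question. Two details are my own assumptions rather than part of the answer: each word is passed through re.escape in case the list ever contains regex metacharacters, and the words are sorted only to make the pattern string deterministic. Because the pattern uses a non-capturing group, str.findall already returns the whole match, so the extra outer parentheses are optional.

```python
import re

import pandas as pd

mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}

# Sample data rebuilt from the question
df = pd.DataFrame({'content': [
    'a very good idea, I like it',
    'was the bad thing to do',
    'I hated it, it was terrible',
    'Why do you think she liked it',
]})

# Whole-word alternation; re.escape is a precaution (assumption), not required for these words
pat = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in sorted(mylist)))

# findall returns a list of matches per row; join turns each list back into a string
df['column1'] = df['content'].str.findall(pat).str.join(' ')
print(df)
```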

Or use a list comprehension that splits, filters, and joins:

df['column1'] = df['content'].apply(lambda x: ' '.join([y for y in x.split() if y in mylist]))
print (df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
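
One caveat with the split-based version: split() only breaks on whitespace, so a listed word with punctuation attached (for example "good,") would not be found in mylist and would be dropped. The sample rows happen to avoid this. A minimal sketch of one way to handle it, assuming it is acceptable to strip leading and trailing punctuation from each token; keep_listed_words is a hypothetical helper name and the sample sentence is made up for illustration:

```python
import string

import pandas as pd

mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}


def keep_listed_words(text, allowed=mylist):
    # Hypothetical helper: strip surrounding punctuation from each token,
    # then keep only the tokens that appear in the allowed set
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return ' '.join(tok for tok in tokens if tok in allowed)


# Made-up sample row where a listed word is followed by a comma
df = pd.DataFrame({'content': ['It was good, but I hated the ending.']})
df['column1'] = df['content'].apply(keep_listed_words)
print(df['column1'].tolist())
```

The regex approach with \b word boundaries does not have this issue, since the boundaries match next to punctuation as well as whitespace.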

You, my friend, are a gentleman. Thank you for your help. (3 upvotes)
