Lau*_*ber 4 python string pandas
I want to remove every substring in a df column that is not in a predefined list. For example:
mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']
Current:                                  Desired:
index  content                            index  content
0      a very good idea, I like it        0      good like
1      was the bad thing to do            1      bad
2      I hated it, it was terrible        2      hated terrible
...                                       ...
k      Why do you think she liked it      k      liked
I have managed to write code that keeps all the words that are not in the list, but I don't know how to invert it to get what I want:
pat = r'\b(?:{})\b'.format('|'.join(mylist))
df['column1'] = df['column1'].str.replace(pat, '', regex=True)
Any help would be appreciated.
df['column1'] = df['content'].str.findall('(' + pat + ')').str.join(' ')
print (df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
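For anyone who wants to try the findall approach end to end, here is a minimal self-contained sketch. It assumes the sample rows from the question and the column name `content`. The `re.escape` call is an added precaution that is not part of the original answer; it only matters if a word in the list contains regex metacharacters. Because the pattern has no capturing group here, `findall` returns the whole matched words.

```python
import re
import pandas as pd

mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']

# Sample data copied from the question.
df = pd.DataFrame({'content': [
    'a very good idea, I like it',
    'was the bad thing to do',
    'I hated it, it was terrible',
    'Why do you think she liked it',
]})

# Whole-word alternation; re.escape is an assumption added for safety.
pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, mylist)))

# findall returns a list of matches per row; join them back with spaces.
df['column1'] = df['content'].str.findall(pat).str.join(' ')
result = df['column1'].tolist()
print(df)
```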
Or use a list comprehension that splits, filters, and joins:
df['column1'] = df['content'].apply(lambda x: ' '.join([y for y in x.split() if y in mylist]))
print (df)
content column1
0 a very good idea, I like it good like
1 was the bad thing to do bad
2 I hated it, it was terrible hated terrible
3 Why do you think she liked it liked
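The two approaches can give different results when a listed word is directly followed by punctuation, because `split()` only breaks on whitespace while the regex uses word boundaries. The sketch below illustrates this on a hypothetical sentence that is not in the question's data.

```python
import pandas as pd

mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']
words = set(mylist)  # set membership test for the split-based filter

# Hypothetical input: 'terrible' is followed by a comma.
s = pd.Series(['It was terrible, I hated it.'])

# Split-based filter: the token 'terrible,' is not in the list, so it is dropped.
split_result = s.apply(
    lambda x: ' '.join(y for y in x.split() if y in words)
).tolist()

# Regex-based filter: \b matches between 'terrible' and the comma.
pat = r'\b(?:{})\b'.format('|'.join(mylist))
regex_result = s.str.findall(pat).str.join(' ').tolist()

print(split_result)
print(regex_result)
```

If the text contains punctuation, the regex version is probably the safer choice; on clean whitespace-separated text, such as the sample rows above, both produce the same output.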