Removing stop words from a file

Question


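I want to remove stop words from a data column in my file. I filtered the rows where the end user is speaking, but usertext.apply(lambda x: [word for word in x if word not in stop_words]) does not filter out the stop words. What am I doing wrong?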

import pandas as pd
from stop_words import get_stop_words
df = pd.read_csv("F:/textclustering/data/cleandata.csv", encoding="iso-8859-1")
usertext = df[df.Role.str.contains("End-user", na=False)][['Data', 'chatid']]
stop_words = get_stop_words('dutch')
clean = usertext.apply(lambda x: [word for word in x if word not in stop_words])
print(clean)

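A likely cause, sketched on made-up data: by default DataFrame.apply passes each column (a Series) to the function, not each word, so the list comprehension iterates over whole cell values. The sample rows and the four-word stop-word list below are hypothetical stand-ins for the CSV and for get_stop_words('dutch').

```python
import pandas as pd

# Hypothetical rows standing in for the filtered end-user messages.
usertext = pd.DataFrame({
    "Data": ["ik heb een vraag", "de printer werkt niet"],
    "chatid": [1, 2],
})

# Tiny hypothetical stand-in for get_stop_words('dutch').
stop_words = ["de", "een", "ik", "niet"]

# By default (axis=0) DataFrame.apply calls the function once per column,
# so x is a whole column (a Series), not a single word.
column_names = list(usertext.apply(lambda x: x.name))
print(column_names)  # ['Data', 'chatid']

# Iterating over such a Series yields complete cell values, so each
# "word" is really a whole message and never equals a single stop word.
kept = [word for word in usertext["Data"] if word not in stop_words]
print(kept)  # ['ik heb een vraag', 'de printer werkt niet']
```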

Answer 1

gal*_*yan 0

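# Apply to the Data column (a Series) so that x is one cell rather than a whole column;
# this blanks only cells whose entire value is a stop word.
clean = usertext['Data'].apply(lambda x: x if x not in stop_words else '')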

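If each Data cell holds a whole message, filtering has to happen per word. This is a minimal sketch of that idea: split each message on whitespace, drop tokens that appear in the stop-word set, and join the rest back together. The sample data, the short stop-word set, the helper name remove_stop_words, and the lowercase, whitespace-only tokenization are all assumptions. Punctuation attached to a word (e.g. "niet?") is not stripped.

```python
import pandas as pd

# Tiny hypothetical stop-word set; with the stop_words package this would
# be built from get_stop_words('dutch').
stop_words = {"de", "een", "ik", "niet", "heb"}

# Hypothetical rows standing in for the filtered end-user messages.
usertext = pd.DataFrame({
    "Data": ["Ik heb een vraag", "De printer werkt niet", None],
    "chatid": [1, 2, 3],
})

def remove_stop_words(text):
    """Hypothetical helper: split on whitespace, drop stop words (case-insensitive)."""
    if not isinstance(text, str):  # leave missing (None/NaN) cells untouched
        return text
    return " ".join(w for w in text.split() if w.lower() not in stop_words)

# Work on a copy and transform only the Data column; chatid is kept as-is.
clean = usertext.copy()
clean["Data"] = clean["Data"].apply(remove_stop_words)
print(clean)
```

Using a set rather than a list makes each membership test constant time on average, which helps on large chat logs.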

Archived: 9 years ago
Views: 709
Last activity: 9 years ago