Removing stop words from a list

Question


I have this variable:

sent=[('include', 'details', 'about', 'your performance'),
('show', 'the', 'results,', 'which', 'you\'ve', 'got')]


I need to remove the stop words from it. I tried:

output = [w for w in sent if not w in stop_words]


but it didn't work. What's wrong?
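A minimal sketch of why the attempt above filters nothing. It uses a small hand-picked set as a stand-in for NLTK's English stop-word list (an assumption, so it runs without NLTK installed): iterating over `sent` yields whole tuples, and a tuple never equals a single word.

```python
# Hypothetical stand-in for NLTK's English stop-word set.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Each `w` here is an entire tuple, not an individual word.
for w in sent:
    print(type(w).__name__, w)

# A tuple is never equal to any string in the set, so the membership
# test is always False and every tuple is kept unchanged.
output = [w for w in sent if w not in stop_words]
print(output == sent)  # True
```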

Answer 1

Sy *_*Ker 8

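import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # one-time download of the stop-word corpus, if not already present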

stop_words = {w.lower() for w in stopwords.words('english')}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]


If you want a single list of words with the stop words removed:

>>> no_stop_words = [word for sentence in sent for word in sentence if word not in stop_words]
>>> no_stop_words
['include', 'details', 'performance', 'show', 'results,', 'got']

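The double loop in that comprehension can also be written with `itertools.chain.from_iterable`, which flattens the tuples first. This is a sketch of an alternative, not the answer's code; the small `stop_words` set is a hypothetical stand-in for the NLTK list.

```python
from itertools import chain

# Hypothetical stand-in for the NLTK English stop-word set.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# chain.from_iterable yields the words of each tuple in turn,
# so the comprehension needs only a single loop.
no_stop_words = [word for word in chain.from_iterable(sent)
                 if word not in stop_words]
print(no_stop_words)
# ['include', 'details', 'performance', 'show', 'results,', 'got']
```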

If you want to keep the sentences separate:

>>> sent_no_stop = [[word for word in sentence if word not in stop_words] for sentence in sent]
>>> sent_no_stop
[['include', 'details', 'performance'], ['show', 'results,', 'got']]

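If the input's shape matters downstream, each filtered sentence can be rebuilt as a tuple instead of a list. A small sketch, again assuming a hypothetical stand-in `stop_words` set:

```python
# Hypothetical stand-in for the NLTK English stop-word set.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# tuple(...) over a generator keeps each sentence as a tuple,
# matching the structure of the original `sent`.
sent_no_stop = [tuple(word for word in sentence if word not in stop_words)
                for sentence in sent]
print(sent_no_stop)
# [('include', 'details', 'performance'), ('show', 'results,', 'got')]
```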

However, most of the time you will be working with a flat list of words (without the inner parentheses):

sent = ['include', 'details', 'about', 'your', 'performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']

>>> no_stopwords = [word for word in sent if word not in stop_words]
>>> no_stopwords
['include', 'details', 'performance', 'show', 'results,', 'got']

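One more pitfall in the question's data: `'your performance'` is a single string, so the comparison never sees `'your'` on its own. A sketch that splits every entry on whitespace before filtering (the small `stop_words` set is a hypothetical stand-in for the NLTK list):

```python
# Hypothetical stand-in for the NLTK English stop-word set.
stop_words = {"about", "your", "the", "which", "you've"}

# 'your performance' is one string, as in the question's data.
sent = ['include', 'details', 'about', 'your performance', 'show',
        'the', 'results,', 'which', "you've", 'got']

# Filtering whole strings keeps 'your performance' intact.
unsplit = [w for w in sent if w not in stop_words]
print(unsplit)
# ['include', 'details', 'your performance', 'show', 'results,', 'got']

# Splitting each entry on whitespace first lets 'your' be removed.
words = [token for entry in sent for token in entry.split()]
no_stopwords = [word for word in words if word not in stop_words]
print(no_stopwords)
# ['include', 'details', 'performance', 'show', 'results,', 'got']
```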

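Note that for anything but a tiny input, ``stop_words`` should be a ``set`` rather than a ``list``, so each membership test is a hash lookup instead of a linear scan. Use ``stop_words = {w.lower() for w in stopwords.words('english')}`` to achieve this. (3 upvotes)
Note that a set comprehension ``{... for ... in ...}`` always creates a set, even if the iterable is empty. Only a dict comprehension ``{...: ... for ... in ...}`` creates a dict. (2 upvotes)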
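A quick check of the two comments above, using a hypothetical list of words: a set comprehension yields a set even when empty, while a dict comprehension (or a bare `{}` literal) yields a dict.

```python
# Hypothetical input words with mixed case.
words = ['The', 'a', 'AN', 'the']

# Set comprehension: lowercases and de-duplicates into a set.
stop_words = {w.lower() for w in words}
# Still a set when the iterable is empty.
empty_set = {w for w in []}
# Dict comprehension: note the key: value pair.
lengths = {w: len(w) for w in words}

print(type(stop_words).__name__, sorted(stop_words))  # set ['a', 'an', 'the']
print(type(empty_set).__name__)                       # set
print(type(lengths).__name__)                         # dict
print(type({}).__name__)                              # dict (a bare {} is an empty dict)

# Membership on a set is an average O(1) hash lookup.
print('the' in stop_words)  # True
```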

Answer 2

Ann*_*Zen 6

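It's the parentheses that get in the way of the iteration: `sent` is a list of tuples, so each `w` is a whole tuple rather than a single word. If you can remove them: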

sent = ['include', 'details', 'about', 'your performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']
output = [w for w in sent if w not in stop_words]


If not, you can do this:

sent = [('include', 'details', 'about', 'your performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
# Filter each tuple into a list, then flatten the lists into one list of words
output = [i for s in [[w for w in l if w not in stop_words] for l in sent] for i in s]

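The nested comprehension in this answer builds an intermediate list of lists before flattening it. A sketch showing that a single double-loop comprehension gives the same result in one pass (the `stop_words` set is a hypothetical stand-in for the NLTK list):

```python
# Hypothetical stand-in for the NLTK English stop-word set.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# This answer's approach: filter each sentence, then flatten.
two_step = [i for s in [[w for w in l if w not in stop_words] for l in sent] for i in s]

# Same result in one pass, with no intermediate list of lists.
one_pass = [w for sentence in sent for w in sentence if w not in stop_words]

print(two_step == one_pass)  # True
print(one_pass)
# ['include', 'details', 'your performance', 'show', 'results,', 'got']
```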

Asked:	5 years, 7 months ago
Viewed:	264 times
Last activity:	5 years, 7 months ago