NLTK and stopwords fail #lookuperror

Question

Fac*_*ndo 56 python nltk stop-words sentiment-analysis

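I am trying to start a sentiment analysis project, and I plan to use the stop words method. I did some research and found that NLTK provides a stopwords corpus, but when I run the command I get an error.

What I do is the following, in order to see which words NLTK uses (as shown in section 4.1 of http://www.nltk.org/book/ch02.html):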

from nltk.corpus import stopwords
stopwords.words('english')

But when I press Enter, I get:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\\Users\\Meru/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\lib\\nltk_data'
- 'C:\\Users\\Meru\\AppData\\Roaming\\nltk_data'
**********************************************************************

Because of this problem, code like the following does not run either (it raises the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print [i for i in sentence.split() if i not in stop]

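Do you know what the problem might be? I need to work with Spanish text; would you recommend another method? I have also thought about using the Goslate package with English datasets.

Thanks for reading!

P.S.: I am using Anaconda.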

Answer 1

ttt*_*sss 133

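It looks like you don't have the stopwords corpus on your computer.

You need to start the NLTK Downloader and download all the data you need.

Open a Python console and do the following: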

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/

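In the GUI window that opens, simply press the "Download" button to download all corpora, or go to the "Corpora" tab and download only the ones you need or want.

Alternatively, if you want to avoid the GUI and know what you want to download: `nltk.download("stopwords")` (74 upvotes)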
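The non-GUI route from the comment above can be wrapped so that the download only happens when the corpus is actually missing. This is a minimal sketch, not part of the original answer: `ensure_nltk_resource` is a hypothetical helper name, and the `finder` and `downloader` parameters are an assumption added so the logic can be tried without NLTK installed. By default they resolve to `nltk.data.find` and `nltk.download`.

```python
def ensure_nltk_resource(resource_path, package_id, finder=None, downloader=None):
    """Download an NLTK data package only if it is not already available.

    Returns True if a download was triggered, False if the resource was found.
    `finder` and `downloader` are hypothetical injection points; when omitted,
    nltk.data.find and nltk.download are used.
    """
    if finder is None or downloader is None:
        import nltk  # imported lazily so stand-ins can be used instead
        finder = finder or nltk.data.find
        downloader = downloader or nltk.download
    try:
        finder(resource_path)    # raises LookupError when the resource is missing
        return False
    except LookupError:
        downloader(package_id)   # same call as nltk.download("stopwords")
        return True
```

Calling `ensure_nltk_resource("corpora/stopwords", "stopwords")` before `from nltk.corpus import stopwords` should avoid the `LookupError` shown in the question, assuming the machine can reach the NLTK data server.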

Answer 2

Abu*_*oeb 11

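I tried this from an Ubuntu terminal and, for some reason, the GUI did not show up as described in tttthomasssss's answer. So I followed KLDavenport's comment instead. Here is a summary:

Open your terminal/command line and type python, then:

>>> import nltk
>>> nltk.download("stopwords")

This stores the stopwords corpus under nltk_data. In my case it ended up at /home/myusername/nltk_data/corpora/stopwords.

If you need another corpus, visit the NLTK data page and find the corpus along with its ID. Then download it by ID, the same way we did for stopwords.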
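If several corpora are needed, the same download-by-ID call can be looped. The sketch below is an illustration under assumptions: `download_packages` is a hypothetical helper, the `downloader` parameter exists only so the loop can be exercised without network access, and the IDs you pass must be ones listed on the NLTK data page.

```python
def download_packages(package_ids, download_dir=None, downloader=None):
    """Download each NLTK package ID in turn and report success per ID.

    download_dir=None lets NLTK pick its default data directory
    (for example ~/nltk_data on Linux).
    """
    if downloader is None:
        import nltk  # imported lazily; nltk.download is the default downloader
        downloader = nltk.download
    results = {}
    for package_id in package_ids:
        # Record the truthiness of whatever the downloader returns for this ID
        results[package_id] = bool(downloader(package_id, download_dir=download_dir))
    return results
```

For example, `download_packages(["stopwords", "punkt"])` would fetch both packages into the default location.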

Answer 3

Has*_*eeb 9

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
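Once the corpus is available, filtering a sentence against the set looks roughly like this. It is a sketch only: `remove_stopwords` is a hypothetical helper, and the Spanish list below is a tiny hand-written sample used so the example runs without NLTK data. It is not the NLTK list.

```python
def remove_stopwords(sentence, stop_words):
    """Return the whitespace-separated tokens of `sentence` that are not stop words."""
    stop_set = {word.lower() for word in stop_words}  # set gives fast membership tests
    return [token for token in sentence.split() if token.lower() not in stop_set]


# Hand-written sample for illustration only (assumption, not NLTK's Spanish list)
SAMPLE_SPANISH_STOPWORDS = ["el", "la", "de", "que", "y", "es", "una"]

print(remove_stopwords("esta es una frase de ejemplo", SAMPLE_SPANISH_STOPWORDS))
# prints ['esta', 'frase', 'ejemplo']
```

With NLTK installed and the corpus downloaded, `stopwords.words('spanish')` can be passed as `stop_words` instead of the sample list. The result will then differ from the output above, since the real list is much longer.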

