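I am trying to run a webapp on Heroku using Flask. The webapp is written in Python and uses NLTK (the Natural Language Toolkit library).
One of the files has the following header: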
import nltk, json, operator
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
When the web page that runs the stopword code is called, it produces the following error:
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
The exact code used:
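The error message points to the NLTK Downloader. Below is a minimal sketch of one way to fetch the corpus when the app starts, assuming the Heroku dyno has network access at that point. The names `ensure_stopwords` and `NLTK_DATA_DIR` are hypothetical, and `/app/nltk_data` is used only because it is the first path listed in the error above.

```python
import os

# Assumption: use NLTK_DATA if set, otherwise the first path from the error message.
NLTK_DATA_DIR = os.environ.get("NLTK_DATA", "/app/nltk_data")


def ensure_stopwords(data_dir=NLTK_DATA_DIR):
    """Download the stopwords corpus into data_dir if NLTK cannot find it."""
    # Imported here so this module still loads if NLTK is not installed yet.
    import nltk

    # Make sure NLTK searches the directory we download into.
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)

    try:
        # Raises LookupError when the resource is missing (the error shown above).
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords", download_dir=data_dir)
```

Calling `ensure_stopwords()` once before the first `stopwords.words('english')` call should be enough. Because the dyno filesystem is ephemeral, the download would repeat after each restart unless the corpus is fetched during the build instead.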
#remove punctuation
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)
#remove stop words and digits
stopword = stopwords.words('english')
data = [w for w in data if w not in stopword and not w.isdigit()]
On Heroku, the webapp at the stopword …
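To check that the filtering logic is not itself the problem, the stop-word step can be tried in isolation with the word list passed in, so that no corpus lookup is involved. This is only a sketch: `remove_stopwords` is a hypothetical helper, and the short word list in the usage line is made up for illustration.

```python
def remove_stopwords(tokens, stopword_list):
    """Drop stop words and purely numeric tokens, as in the list comprehension above."""
    # A set makes each membership test O(1) instead of scanning a list.
    stopword_set = set(stopword_list)
    return [w for w in tokens if w not in stopword_set and not w.isdigit()]


# Illustrative input only; in the webapp the list would come from stopwords.words('english').
filtered = remove_stopwords(["the", "cat", "42", "sat"], ["the", "a", "an"])
```

If this runs without error while the web page still fails, the failure is in loading `corpora/stopwords`, not in the filter.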