UnicodeWarning when using NLTK stop words with scikit-learn's TfidfVectorizer

I am trying to use the TfidfVectorizer from scikit-learn with the Spanish stop words from NLTK:

from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words=stopwords.words("spanish"))

The problem is that I get the following warning:

/home/---/.virtualenvs/thesis/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.py:122: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
tokens = [w for w in tokens if w not in stop_words]

Is there an easy way to solve this?
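
One possible approach, sketched under an assumption: the warning suggests that the stop words arrive as byte strings (Python 2 str) while the tokens being filtered are unicode, so the comparison in `w not in stop_words` cannot decode the accented words. If that is the cause, decoding the stop words to unicode before handing them to the vectorizer should avoid it. The helper name `to_unicode` is hypothetical, the UTF-8 encoding is an assumption about how the word list is stored, and the small byte-string list stands in for stopwords.words("spanish") so the snippet runs without NLTK data.

```python
def to_unicode(words, encoding="utf-8"):
    """Decode any byte strings in `words` to text; leave text entries as they are.

    Hypothetical helper; `encoding` is assumed to be UTF-8.
    """
    return [w.decode(encoding) if isinstance(w, bytes) else w for w in words]


# Stand-in for stopwords.words("spanish"): byte strings, as assumed above.
raw_stop_words = [b"de", b"la", u"est\u00e1".encode("utf-8")]
stop_words = to_unicode(raw_stop_words)

# Tokens are unicode, so they are now compared against unicode stop words.
tokens = [u"la", u"casa", u"est\u00e1", u"aqu\u00ed"]
filtered = [t for t in tokens if t not in stop_words]
print(filtered)
```

With NLTK and scikit-learn installed, the same helper could be applied as TfidfVectorizer(stop_words=to_unicode(stopwords.words("spanish"))). If the list already contains unicode strings, the helper returns them unchanged.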

python unicode nltk python-2.7 scikit-learn

3 votes · 1 answer · 2637 views
