Post by Saq*_*lam

word_tokenize TypeError: expected string or buffer

When I call word_tokenize, I get the following error:

File "C:\Python34\lib\site-packages\nltk\tokenize\punkt.py", line 1322, in _slices_from_text
    for match in self._lang_vars.period_context_re().finditer(text):
TypeError: expected string or buffer

I have a large text file (1500.txt) from which I want to remove stop words. My code is as follows:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

with open('E:\\Book\\1500.txt', "r", encoding='ISO-8859-1') as File_1500:
    stop_words = set(stopwords.words("english"))
    words = word_tokenize(File_1500)
    filtered_sentence = [w for w in words if not w in stop_words]
    print(filtered_sentence)
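
The traceback shows the TypeError being raised inside finditer(text), which suggests word_tokenize was handed something other than a string. In the code above, the file object File_1500 is passed rather than the file's contents. Below is a minimal sketch of one possible fix, assuming the file fits in memory: read the text with .read() before tokenizing. The helper names remove_stop_words and run_with_nltk are hypothetical and introduced only for illustration. The tokenizer and stop-word set are passed in as parameters so the helper can be tried without the NLTK data installed.

```python
def remove_stop_words(path, tokenize, stop_words, encoding="ISO-8859-1"):
    """Read the file at `path` and return its tokens minus the stop words.

    `tokenize` is any callable that maps a str to a list of tokens
    (for example nltk.tokenize.word_tokenize).
    """
    with open(path, "r", encoding=encoding) as handle:
        # .read() returns the contents as a str; the tokenizer needs
        # this string, not the file object itself.
        text = handle.read()
    words = tokenize(text)
    return [w for w in words if w not in stop_words]


def run_with_nltk(path):
    """Hypothetical wrapper that plugs in the NLTK pieces from the question.

    Assumes NLTK is installed and its 'punkt' and 'stopwords' data
    have already been downloaded.
    """
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = set(stopwords.words("english"))
    return remove_stop_words(path, word_tokenize, stop_words)
```

With this in place, print(run_with_nltk('E:\\Book\\1500.txt')) would correspond to the original script. For a file too large to hold in memory, tokenizing line by line inside the with block would be one alternative, though sentences that span line breaks may then be split differently.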

python nlp tokenize nltk python-3.x

