Lim*_*nut 0 python nltk punctuation
I can't figure out why this doesn't work:
import nltk
from nltk.corpus import stopwords
import string
with open('moby.txt', 'r') as f:
    moby_raw = f.read()
stop = set(stopwords.words('english'))
moby_tokens = nltk.word_tokenize(moby_raw)
text_no_stop_words_punct = [t for t in moby_tokens if t not in stop or t not in string.punctuation]
print(text_no_stop_words_punct)
Looking at the output, I get this:
[...';', 'surging', 'from', 'side', 'to', 'side', ';', 'spasmodically', 'dilating', 'and', 'contracting',...]
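It seems the punctuation is still there. What am I doing wrong?
It has to be "and", not "or". With "or", a token is dropped only when it is both a stop word and punctuation, so nearly everything is kept: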
if t not in stop and t not in string.punctuation
Or:
if not (t in stop or t in string.punctuation):
Or:
all_stops = stop | set(string.punctuation)
if t not in all_stops:
The last solution is the fastest, since it needs only a single set lookup per token.
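To see the difference side by side, here is a minimal sketch that compares the original "or" condition with the combined-set version. So that it runs without the NLTK data, it assumes a small hand-written stop set and reuses the token list from the output in the question; in the real script these would come from stopwords.words('english') and nltk.word_tokenize.

```python
import string

# Hypothetical stand-ins (assumptions for illustration only):
# in the question, `stop` comes from stopwords.words('english')
# and `moby_tokens` from nltk.word_tokenize(moby_raw).
stop = {"from", "to", "and"}
moby_tokens = [";", "surging", "from", "side", "to", "side", ";",
               "spasmodically", "dilating", "and", "contracting"]

# Original condition: a token is dropped only if it is BOTH a stop word
# and punctuation, so nothing in this sample is filtered out.
with_or = [t for t in moby_tokens
           if t not in stop or t not in string.punctuation]

# Corrected condition: merge both collections into one set,
# then do a single membership test per token.
all_stops = stop | set(string.punctuation)
with_set = [t for t in moby_tokens if t not in all_stops]

print(with_or == moby_tokens)  # the "or" filter removed nothing
print(with_set)
```

Note that set(string.punctuation) contains single characters only, so a multi-character token such as '--' would still pass through this filter and would need separate handling.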