我有以下代码
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
def sanitize(wordList):
answer = [word.translate(None, string.punctuation) for word in wordList]
answer = [lmtzr.lemmatize(word.lower()) for word in answer]
return answer
words = []
for filename in json_list:
words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
for tweet in json.load(open(filename,READ))])))])
Run Code Online (Sandbox Code Playgroud)
我写的时候,我在一个单独的testing.py文件中测试过2-4行
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
wordList= ['\'the', 'the', '"the']
print wordList
wordList2 = [word.translate(None, string.punctuation) for word in wordList]
print wordList2
answer = [lmtzr.lemmatize(word.lower()) for word …Run Code Online (Sandbox Code Playgroud)