I have the following code:
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
def sanitize(wordList):
    answer = [word.translate(None, string.punctuation) for word in wordList]
    answer = [lmtzr.lemmatize(word.lower()) for word in answer]
    return answer
words = []
for filename in json_list:
    words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
        for tweet in json.load(open(filename,READ))])))])
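For anyone reading this on Python 3: the two-argument call `word.translate(None, string.punctuation)` only works on Python 2 `str` objects. A minimal sketch of the same punctuation-stripping step for Python 3 is below. The helper name `strip_punctuation` is made up for illustration, and the lemmatizing step is left out so the snippet runs without NLTK.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# str.maketrans with three arguments maps nothing and deletes the third.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def strip_punctuation(word_list):
    """Return a new list with ASCII punctuation removed from each word."""
    return [word.translate(_PUNCT_TABLE) for word in word_list]

if __name__ == '__main__':
    # Same sample words as in the testing.py snippet.
    print(strip_punctuation(["'the", 'the', '"the']))
```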
While writing it, I tested lines 2-4 in a separate testing.py file:
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
wordList= ['\'the', 'the', '"the']
print wordList
wordList2 = [word.translate(None, string.punctuation) for word in wordList]
print wordList2
answer = [lmtzr.lemmatize(word.lower()) for word …

I have been given the task of removing all non-numeric characters (including spaces) from a text file or string, and then printing the new result next to the original text, for example:
Before:
sd67637 8
After:
sd67637 8 = 676378
Since I am a beginner, I don't know where to start with this task. Please help.
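One possible starting point, sketched in Python on the assumption that this is the language in use: keep only the characters for which `str.isdigit()` is true, then print the original text next to the result. The names `digits_only` and `format_line` are made up for illustration and are not part of the assignment.

```python
def digits_only(text):
    """Return text with every non-digit character (including spaces) removed."""
    return ''.join(ch for ch in text if ch.isdigit())

def format_line(text):
    """Build the 'original = digits' output shown in the example."""
    return '{} = {}'.format(text, digits_only(text))

if __name__ == '__main__':
    sample = 'sd67637 8'  # sample input taken from the question
    print(format_line(sample))
```

For a text file, the same function could be applied to each line after stripping the trailing newline, for example with `line.rstrip('\n')` inside a `for line in f:` loop. Note that `str.isdigit()` also accepts some non-ASCII digit characters; if only `0`-`9` should count, a test such as `ch in '0123456789'` could be used instead.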