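I wrote a text classification program. When I run it, it crashes with the error shown in the screenshot below:
ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Here is my code: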
from sklearn.model_selection import train_test_split
from gensim.models.word2vec import Word2Vec
from sklearn.preprocessing import scale
from sklearn.linear_model import SGDClassifier
import nltk, string, json
import numpy as np
def cleanText(corpus):
    reviews = []
    for dd in corpus:
        #for d in dd:
        try:
            words = nltk.word_tokenize(dd['description'])
            words = [w.lower() for w in words]
            reviews.append(words)
            #break
        except:
            pass
    return reviews
with open('C:\\NLP\\bad.json') as fin:
    text = json.load(fin)
    neg_rev = cleanText(text)
with open('C:\\NLP\\good.json') as fin:
    text = json.load(fin)
    pos_rev = …
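
For context, n_samples=0 in the error means the data passed to train_test_split was empty. Since cleanText swallows every exception with a bare except, a failure on each record (for example a missing 'description' key, a JSON layout that is not a list of records, or an error raised by the tokenizer) would leave reviews empty without any message. The following is only a diagnostic sketch under stated assumptions: clean_text_with_report, its tokenize parameter, and the sample data are hypothetical names introduced here, and str.split stands in for nltk.word_tokenize so the sketch runs without NLTK.

```python
def clean_text_with_report(corpus, tokenize=str.split):
    """Tokenize each record's 'description' and report records that fail.

    Hypothetical helper: `tokenize` defaults to str.split as a stand-in
    for nltk.word_tokenize (an assumption made so the sketch is self-contained).
    """
    reviews, skipped = [], []
    for index, record in enumerate(corpus):
        try:
            words = tokenize(record['description'])
        except (LookupError, TypeError) as exc:
            # Record the failure instead of silently passing over it.
            skipped.append((index, repr(exc)))
            continue
        reviews.append([w.lower() for w in words])
    return reviews, skipped


# Hypothetical data: a dict keyed by id, so iterating yields the string
# keys rather than the records, and indexing each key with 'description' fails.
corpus = {
    "a1": {"description": "Bad product"},
    "a2": {"description": "Broke fast"},
}

reviews, skipped = clean_text_with_report(corpus)
print(len(reviews), "kept,", len(skipped), "skipped")
for index, reason in skipped:
    print(index, reason)

# Iterating over the records themselves tokenizes them as intended.
ok_reviews, ok_skipped = clean_text_with_report(list(corpus.values()))
print(ok_reviews)
```

If the skipped list is as long as the input, the reported exception points at what is going wrong before train_test_split is ever reached.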