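I trained a classifier on a set of short documents and pickled it after getting reasonable F1 and accuracy scores on a binary classification task.
During training, I reduced the number of features with a scikit-learn CountVectorizer, cv: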
cv = CountVectorizer(min_df=1, ngram_range=(1, 3), max_features = 15000)
I then used the fit_transform() and transform() methods to get the transformed training and test sets:
transformed_feat_train = numpy.zeros((0,0,))
transformed_feat_test = numpy.zeros((0,0,))
transformed_feat_train = cv.fit_transform(trainingTextFeat).toarray()
transformed_feat_test = cv.transform(testingTextFeat).toarray()
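For illustration, here is a minimal sketch of what the two calls do, using hypothetical toy documents in place of trainingTextFeat and testingTextFeat: fit_transform() learns the vocabulary from the training text, and transform() reuses that same vocabulary, so both matrices end up with the same columns.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for trainingTextFeat / testingTextFeat.
train_docs = ["good movie", "bad movie", "good plot"]
new_docs = ["good film", "terrible plot"]

cv = CountVectorizer(min_df=1, ngram_range=(1, 3), max_features=15000)

# fit_transform() learns the vocabulary from the training text and encodes it.
train_matrix = cv.fit_transform(train_docs)

# transform() reuses the learned vocabulary; n-grams it has never seen are ignored.
new_matrix = cv.transform(new_docs)

# Both matrices have one column per learned n-gram, so the widths match.
print(train_matrix.shape, new_matrix.shape)
```

A classifier trained on train_matrix expects exactly this column layout, which is why the vectorizer's fitted state matters at prediction time.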
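All of this worked well for training and testing the classifier. However, I can't figure out how to use fit_transform() and transform() with the pickled version of the trained classifier to predict labels for unseen, unlabeled data.
I extract features from the unlabeled data in exactly the same way as when training and testing the classifier: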
## load the pickled classifier for labeling
pickledClassifier = joblib.load(pickledClassifierFile)
## transform data
cv = CountVectorizer(min_df=1, ngram_range=(1, 3), max_features = 15000)
cv.fit_transform(NOT_SURE)
transformed_feat_unlabeled = numpy.zeros((0,0,))
transformed_feat_unlabeled = cv.transform(unlabeled_text_feat).toarray()
## predict label on unseen, unlabeled data
l_predLabel = pickledClassifier.predict(transformed_feat_unlabeled)
Error message:
Traceback (most recent call last): …
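The traceback above is truncated, so the exact failure is not visible. One pattern that is often used in this situation, sketched here under assumptions, is to save the fitted CountVectorizer next to the classifier at training time and, when labeling, load both and call only transform(). MultinomialNB, the toy documents, and the file names vectorizer.joblib and classifier.joblib are illustrative stand-ins, not taken from the setup described above.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; the real texts and labels are not shown in the post.
train_docs = ["good movie", "bad movie", "good plot", "bad plot"]
train_labels = [1, 0, 1, 0]
unlabeled_docs = ["good film", "bad acting"]

# Training time: fit the vectorizer and the classifier on the same features.
cv = CountVectorizer(min_df=1, ngram_range=(1, 3), max_features=15000)
clf = MultinomialNB()  # assumed classifier, chosen only for illustration
clf.fit(cv.fit_transform(train_docs), train_labels)

# Persist BOTH fitted objects (the file names are placeholders).
out_dir = tempfile.mkdtemp()
joblib.dump(cv, os.path.join(out_dir, "vectorizer.joblib"))
joblib.dump(clf, os.path.join(out_dir, "classifier.joblib"))

# Labeling time: load both objects and call transform() only, without refitting.
loaded_cv = joblib.load(os.path.join(out_dir, "vectorizer.joblib"))
loaded_clf = joblib.load(os.path.join(out_dir, "classifier.joblib"))
predicted = loaded_clf.predict(loaded_cv.transform(unlabeled_docs))
print(predicted)
```

Refitting a fresh CountVectorizer on the unlabeled text would learn a different vocabulary, so the resulting columns would no longer line up with what the classifier saw during training.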