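I am trying to train an SVM model on some training and test data. If I combine the test and training data, the program runs fine, but if I keep them separate and test the model's accuracy, it fails with: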
Traceback (most recent call last):
  File "/home/PycharmProjects/analysis.py", line 160, in <module>
    main()
  File "/home/PycharmProjects/analysis.py", line 156, in main
    learn_model(tf_idf_train,target,tf_idf_test)
  File "/home/PycharmProjects/analysis.py", line 113, in learn_model
    predicted = classifier.predict(data_test)
  File "/home/.local/lib/python3.4/site-packages/sklearn/svm/base.py", line 573, in predict
    y = super(BaseSVC, self).predict(X)
  File "/home/.local/lib/python3.4/site-packages/sklearn/svm/base.py", line 310, in predict
    X = self._validate_for_predict(X)
  File "/home/.local/lib/python3.4/site-packages/sklearn/svm/base.py", line 479, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 19137 should be equal to 4888, the number of features at training time
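Here the test set is larger than the training set, so the test set naturally ends up with more features (19137) than the training set (4888), and that is why the ValueError is raised.
Here is my code: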
def load_train_file():
    with open('~1k comments.csv',encoding='ISO-8859-1',) as …
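The mismatch in the traceback (19137 columns at prediction time versus 4888 at training time) suggests that the training and test matrices were built from two different vocabularies. The posted code is truncated, so this is an assumption rather than a confirmed diagnosis. The sketch below uses only the standard library and hypothetical toy sentences (the names `build_vocabulary` and `vectorize` are made up for illustration) to show the idea: learn the vocabulary from the training text only, then reuse it for the test text so both matrices have the same number of columns.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token in the given documents to a column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(documents, vocabulary):
    """Turn documents into count vectors using a fixed vocabulary.

    Tokens missing from the vocabulary are ignored, so every row has
    len(vocabulary) columns no matter which documents are passed in.
    """
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        row = [0] * len(vocabulary)
        for tok, n in counts.items():
            if tok in vocabulary:
                row[vocabulary[tok]] = n
        rows.append(row)
    return rows


# Hypothetical toy data, not the comments from the question.
train_docs = ["good movie", "bad movie"]
test_docs = ["good plot but bad acting", "great movie overall"]

vocab = build_vocabulary(train_docs)    # learned from the training data only
X_train = vectorize(train_docs, vocab)
X_test = vectorize(test_docs, vocab)    # reuses the training vocabulary

# Building a second vocabulary from the test set reproduces the mismatch.
X_test_wrong = vectorize(test_docs, build_vocabulary(test_docs))

print(len(X_train[0]), len(X_test[0]), len(X_test_wrong[0]))
```

In scikit-learn the same pattern is to call `fit_transform` on the training text and then `transform` (not `fit_transform`) on the test text, using the same `TfidfVectorizer` instance, so the classifier sees the feature count it was trained on.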