Scikit-learn SVC predict probability doesn't work as expected

Kev*_*len 10 python svc scikit-learn

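I built a sentiment analyzer using an SVM classifier. I trained the model with probability=True, and it gives me probabilities. But when I pickle the model and load it again later, the probabilities no longer work.

The model: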

import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

pipeline_svm = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True)),
])

# pipeline parameters to automatically explore and tune
param_svm = [
    {'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
    {'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)

svm_detector_reloaded = pickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict(["Today is awesome day"])[0])

This gives me:

AttributeError: predict_proba is not available when probability=False

小智 8

Adding (probability=True) when initializing the classifier, as suggested above, fixed my error:

clf = SVC(kernel='rbf', C=1e9, gamma=1e-07, probability=True).fit(xtrain,ytrain)
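To illustrate why the flag has to be set when the classifier is created, here is a minimal sketch. It assumes toy data from make_classification in place of the real xtrain / ytrain, and it pickles in memory rather than to a file. A model fitted without probability=True does not gain predict_proba after reloading, while one fitted with it keeps the method.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for xtrain / ytrain (an assumption, not the original data).
xtrain, ytrain = make_classification(n_samples=60, n_features=5, random_state=0)

# probability defaults to False, so this model is fitted without probability estimates.
without_proba = SVC(kernel='rbf').fit(xtrain, ytrain)
with_proba = SVC(kernel='rbf', probability=True, random_state=0).fit(xtrain, ytrain)

# Round-trip both models through pickle (in memory, for brevity).
without_reloaded = pickle.loads(pickle.dumps(without_proba))
with_reloaded = pickle.loads(pickle.dumps(with_proba))

# Pickling preserves the model exactly as it was fitted; it cannot add the flag.
print(hasattr(without_reloaded, 'predict_proba'))
# One row per sample, one column per class.
print(with_reloaded.predict_proba(xtrain[:3]))
```

If a reloaded model raises the AttributeError, the pickle file was most likely written by a model created before probability=True was added, so it needs to be retrained and saved again.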


小智 6

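Use: SVC(probability=True)

or, when going through the grid search, set it on the pipeline's classifier step before building the search (GridSearchCV itself has no probability argument):

pipeline_svm.set_params(classifier__probability=True)
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)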
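As a rough sketch of how to confirm that the flag survives a grid search, the example below fits a small pipeline and inspects the refitted classifier step. The corpus, labels, and reduced parameter grid are made up purely so the code runs. Only the pipeline step names come from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Tiny made-up corpus (hypothetical data, only to make the sketch runnable).
positive = ["what an awesome day", "I love this", "great and wonderful",
            "really good experience", "fantastic work"]
negative = ["what an awful day", "I hate this", "terrible and horrible",
            "really bad experience", "dreadful work"]
texts = (positive + negative) * 4
labels = ([1] * len(positive) + [0] * len(negative)) * 4

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True, random_state=0)),
])

grid = GridSearchCV(
    pipeline,
    param_grid={'classifier__C': [1, 10], 'classifier__kernel': ['linear']},
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=2),
    refit=True,
)
grid.fit(texts, labels)

# The refitted best pipeline keeps the flag that was set on its classifier step.
flag = grid.best_estimator_.named_steps['classifier'].probability
proba = grid.predict_proba(["Today is awesome day"])
print(flag, proba.shape)
```

Running the same check on a reloaded object (best_estimator_.named_steps['classifier'].probability) is a quick way to tell whether the file on disk holds a model trained with the flag.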


Dav*_*cco 1

If it helps, pickle the model with the following:

import pickle
pickle.dump(grid_svm, open('svm_sentiment_analyzer.pkl', 'wb'))

then load the model and predict:

svm_detector_reloaded = pickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict_proba(["Today is an awesome day"])[0])

This gave me back two probabilities, after I reworked the code so it could be rerun and trained the model on a pandas DataFrame named sents:

grid_svm.fit(sents.Sentence.values, sents.Positive.values)

Best practices for model serialization (for example, using joblib) can be found at https://scikit-learn.org/stable/modules/model_persistence.html
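For reference, a minimal sketch of the joblib route mentioned above. It assumes toy data and a hypothetical file name inside a temporary directory. It is one possible way to persist the model, not the only one.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data (an assumption; substitute the real training set).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
model = SVC(probability=True, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'svm_model.joblib')  # hypothetical file name
    joblib.dump(model, path)
    reloaded = joblib.load(path)

# The reloaded model exposes predict_proba because the flag was set before fitting.
print(reloaded.predict_proba(X[:5]))
```

As with pickle, the file should be loaded with the same scikit-learn version that wrote it, and only from a trusted source.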