Can't understand sklearn's SVM predict_proba function

Question

use*_*126 5 python classification machine-learning probability scikit-learn

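I'm having trouble understanding an sklearn function and would appreciate some clarification. At first I thought that the predict_proba function of sklearn's SVM gives the classifier's confidence in its prediction, but after using it in my emotion recognition program, I'm starting to have doubts and feel that I've misunderstood what predict_proba is for and how it works.

For example, my code is set up like this: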

# Split the data into training and test sets (hold-out validation),
# train the model, and report its accuracy on the test data
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

features_train, features_test, labels_train, labels_test = train_test_split(main, target, test_size=0.4)

model = SVC(probability=True)
model.fit(features_train, labels_train)
pred = model.predict(features_test)

accuracy = accuracy_score(labels_test, pred)
print(accuracy)

# Code that records a video of 17 frames and forms a matrix known as
# sub_main with features that will be fed into the SVM

# A few lines of code later...

prob = model.predict_proba(sub_main)

prob_s = np.around(prob, decimals=5)
prob_s = prob_s * 100
pred = model.predict(sub_main)

print('')
print('Prediction: ')
print(pred)
print('Probability: ')
print('Neutral: ', prob_s[0, 0])
print('Smiling: ', prob_s[0, 1])
print('Shocked: ', prob_s[0, 2])
print('Angry: ', prob_s[0, 3])
print('')

When I test it, it gives me something like this:

Prediction: 
['Neutral']
Probability: 
Neutral:  66.084
Smiling:  17.875
Shocked:  11.883
Angry:  4.157

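It was 66% confident that the correct classification is 'Neutral'. The 66 appears next to 'Neutral' and happens to be the highest number. The highest number matched the actual prediction, and I was happy with that.

But then eventually...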

Prediction: 
['Angry']
Probability: 
Neutral:  99.309
Smiling:  0.16
Shocked:  0.511
Angry:  0.02

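It predicted 'Angry' (which is the correct classification, by the way) and assigned 99.3% confidence to 'Neutral'. The highest confidence (the highest number) was assigned to Neutral even though the prediction was completely different.

Sometimes it even does this: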

Prediction: 
['Smiling']
Probability: 
Neutral:  0.0
Smiling:  0.011
Shocked:  0.098
Angry:  99.891

Prediction: 
['Angry']
Probability: 
Neutral:  99.982
Smiling:  0.0
Shocked:  0.016
Angry:  0.001

I don't think I understand how the SVM's predict_proba function works, and I would like clarification on how it works and on what is happening in my code.

Answer 1

RPr*_*sle 1

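I don't know much about how SVC works, so you may want to consider what is said in the comments to complete this answer.

You have to keep in mind that predict_proba gives you the classes in lexicographical order, as they appear in the classes_ attribute. This is stated in the documentation.

You have to take this into account when you print your results. We can see in your examples that 'Angry' is at the first index, so your results are fine except for the first one.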
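To see that ordering in isolation, here is a minimal sketch on made-up data. The four toy samples and their feature values are assumptions for illustration only, not the asker's data. The labels are passed in the order Neutral, Smiling, Shocked, Angry, and classes_ is then inspected after fitting:

```python
from sklearn.svm import SVC

# Toy data: one made-up 1-D sample per emotion (hypothetical, for illustration only).
X = [[0.0], [1.0], [2.0], [3.0]]
y = ["Neutral", "Smiling", "Shocked", "Angry"]

model = SVC()
model.fit(X, y)

# classes_ holds the unique labels in sorted order, not in order of appearance.
class_order = [str(c) for c in model.classes_]
print(class_order)
```

With these labels the sorted order is Angry, Neutral, Shocked, Smiling, so column 0 of predict_proba would correspond to 'Angry' rather than 'Neutral'.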

Try this:

print('Neutral: ', prob_s[0, 1])
print('Smiling: ', prob_s[0, 3])
print('Shocked: ', prob_s[0, 2])
print('Angry: ', prob_s[0, 0])

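A way to avoid hard-coding the indices at all is to pair each probability column with its label from classes_. The sketch below is only an illustration under assumptions: the data comes from make_blobs as a synthetic stand-in for the emotion features, and the variable names (names, sample) are hypothetical rather than taken from the question.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic stand-in for the emotion features (hypothetical data, 4 clusters).
X, y_idx = make_blobs(n_samples=200, centers=4, random_state=0)
names = np.array(["Neutral", "Smiling", "Shocked", "Angry"])
y = names[y_idx]

model = SVC(probability=True, random_state=0)
model.fit(X, y)

sample = X[:1]  # a single row, playing the role of sub_main
pred = model.predict(sample)
prob = model.predict_proba(sample)

print("Prediction:", pred[0])
print("Probability:")
# Column j of prob belongs to model.classes_[j], so iterate over both together
# instead of assuming a fixed label order.
for label, p in zip(model.classes_, prob[0]):
    print(f"{label}: {p * 100:.3f}")
```

As far as I know, the scikit-learn documentation also notes that SVC's probability estimates are fitted separately with an internal cross-validation, so the largest probability is not guaranteed to agree with predict in every case. That may be what happened in the first example of the question.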

Archived: 10 years, 4 months ago
Views: 1465
Last activity: 10 years, 4 months ago