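I am using the SVM from the e1071 package for binary classification. I am comparing results using both the "probabilities" attribute and the class predicted by the SVM. What confuses me is that the class returned by predict (0 or 1) does not seem consistent with the probabilities listed in the attribute. For some very high probabilities of level 1, the SVM predicts level 0, and for some low probabilities of level 1, it predicts level 1.
Here is the sample code and the results: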
svm_model <- svm(as.factor(CHURNED) ~ .
, scale = FALSE
, data = train
, cost = 1
, gamma = 0.1
, kernel = "radial"
, probability = TRUE
)
test$Pred_Class <- predict(svm_model, test, probability = TRUE)
test$Pred_Prob <- attr(test$Pred_Class, "probabilities")[,1]
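The results are below (rows have been rearranged to show a variety of cases):
CHURNED: the response variable being predicted
Pred_Class: the class predicted by the SVM
Pred_Prob: the predicted probability. Is this what the SVM bases its classification on?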
CHURNED Pred_Class Pred_Prob
1 0 0.03968526 # --> makes sense
1 0 0.03968526
1 0 0.07033222
1 0 0.11711195
1 0 0.12477983
1 0 0.12827296
1 0 0.12829345
1 0 0.12829345
1 0 0.12829345
1 0 0.12829444
1 0 0.12829927
1 0 0.12829927
1 0 0.12831169
1 0 0.12831169
1 0 0.12831428
1 1 0.13053475 # --> doesn't make sense. Prob is less than 0.5
1 1 0.13053475
1 1 0.13053475
1 1 0.1305348
1 1 0.1305348
1 1 0.1305348
1 1 0.1690807
1 1 0.2206993
1 1 0.2321171
0 0 0.998289 # --> doesn't make sense. Prob is almost 1!
0 0 0.9982887
0 0 0.993133
0 0 0.9898889
1 0 0.9849951
0 0 0.9849951
1 0 0.546427
0 0 0.5440994 # --> doesn't make sense. Prob is more than 0.5
0 0 0.5437889
1 0 0.5417848
0 0 0.5284112
0 0 0.5252177
0 1 0.5180776 # --> makes sense but is not consistent with above example
0 1 0.5180704
1 1 0.5180436
1 1 0.5180436
0 1 0.518043
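These results make no sense to me. The predicted class and the predicted probability do not match. I have checked to make sure I am referencing the correct column of the "probabilities" attribute matrix: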
test$Pred_Class
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[98] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attr(,"probabilities")
1 0
6442 0.2369796 0.7630204
6443 0.2520246 0.7479754
6513 0.2322581 0.7677419
6801 0.2309437 0.7690563
6802 0.2244768 0.7755232
6954 0.2322450 0.7677550
6968 0.2537544 0.7462456
6989 0.2352477 0.7647523
7072 0.2322308 0.7677692
...
...
...
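For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch of the same check. It is only an illustration under assumptions: it uses two species from the built-in iris data as a stand-in for my churn data (which I cannot share), and the names fit, pred, prob, implied and agreement are made up for this example. It picks the probability column by class label rather than by position, then cross-tabulates the predicted class against the class implied by a 0.5 cutoff on that probability.

```r
library(e1071)

# Stand-in two-class problem (assumption: NOT the churn data from the question)
set.seed(42)
dat   <- droplevels(subset(iris, Species != "setosa"))
idx   <- sample(nrow(dat), 70)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Same kind of model as in the question, with probability estimates enabled
fit <- svm(Species ~ ., data = train, kernel = "radial",
           cost = 1, gamma = 0.1, probability = TRUE)

pred <- predict(fit, test, probability = TRUE)
prob <- attr(pred, "probabilities")

# Select the column by class label instead of relying on column position
p_virginica <- prob[, "virginica"]

# Class implied by a 0.5 cutoff on the estimated probability
implied <- factor(ifelse(p_virginica > 0.5, "virginica", "versicolor"),
                  levels = levels(pred))

# Off-diagonal counts are cases where predicted class and probability disagree
agreement <- table(predicted = pred, implied = implied)
print(agreement)
```

Any non-zero off-diagonal cell in the table is a row of the kind I flagged above as "doesn't make sense". Selecting the column by name also rules out a column mix-up, since the printed matrix above lists the columns in the order 1, 0.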
Perhaps I am interpreting the probabilities incorrectly?