Why do my ROC plot and AUC value look good when the confusion matrix from my random forest shows that the model does not predict disease well?

Question


Ali*_*cia · score 2 · tags: r, machine-learning, random-forest, roc, auc

I am using the R package randomForest to build a model that classifies cases as disease (1) or no disease (0):

```
library(randomForest)

classify_BV_100t <- randomForest(bv.disease ~ ., data = RF_input_BV_clean, ntree = 100, localImp = TRUE)

print(classify_BV_100t)
```

```
Call:
 randomForest(formula = bv.disease ~ ., data = RF_input_BV_clean,      ntree = 100, localImp = TRUE) 
               Type of random forest: classification
                     Number of trees: 100
No. of variables tried at each split: 53

        OOB estimate of  error rate: 8.04%
Confusion matrix:
    0  1 class.error
0 510  7  0.01353965
1  39 16  0.70909091
```


My confusion matrix shows that the model is good at classifying 0 (no disease) but very poor at classifying 1 (disease).
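The `class.error` column and the 8.04% OOB error rate can be reproduced by hand from the printed counts, which makes the asymmetry explicit. A minimal base-R sketch; the matrix is simply typed in from the output above (rows are the true class, columns the predicted class), and the variable names are illustrative only:

```r
# OOB confusion matrix typed in from the randomForest output above
# (rows = true class, columns = predicted class; matrix() fills column by column)
cm <- matrix(c(510, 39, 7, 16), nrow = 2,
             dimnames = list(true = c("0", "1"), predicted = c("0", "1")))

# Per-class error: share of each true class that was misclassified
class_error <- 1 - diag(cm) / rowSums(cm)
print(class_error)  # 7/517 for class 0, 39/55 for class 1 (the class.error column)

# Overall OOB error: off-diagonal cases divided by all cases
oob_error <- (cm["0", "1"] + cm["1", "0"]) / sum(cm)
print(oob_error)    # 46/572, i.e. the reported 8.04%
```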

But when I plot the ROC curve, it gives the impression that the model is quite good.

Here are the two different ways I plotted the ROC curve:

Method 1 (based on https://stats.stackexchange.com/questions/188616/how-can-we-calculate-roc-auc-for-classification-algorithm-such-as-random-forest):
```
library(pROC)
# Score = OOB fraction of trees voting for the second class level ("1")
rf.roc <- roc(RF_input_BV_clean$bv.disease, classify_BV_100t$votes[, 2])
plot(rf.roc)
auc(rf.roc)
```
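For intuition about what `auc(rf.roc)` reports: the AUC equals the probability that a randomly chosen diseased case gets a higher class-1 vote fraction than a randomly chosen healthy case, with ties counted as one half. A minimal base-R sketch of that rank-based (Mann-Whitney) calculation; `rank_auc` is a hypothetical helper and the scores are made-up toy values, not the question's data:

```r
# Hypothetical helper: AUC via the Mann-Whitney rank-sum identity.
# scores: predicted score for class 1 (e.g. a vote fraction); labels: 0/1
rank_auc <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks, so tied scores contribute one half
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (made up): two positives, three negatives
scores <- c(0.9, 0.4, 0.1, 0.5, 0.3)
labels <- c(1,   1,   0,   0,   0)

# Same quantity by brute force over all positive/negative pairs
pos <- scores[labels == 1]
neg <- scores[labels == 0]
pairwise_auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

rank_auc(scores, labels)  # 5/6: five of the six positive/negative pairs are ordered correctly
pairwise_auc              # also 5/6
```

Assuming `bv.disease` is a factor with levels "0" and "1", `rank_auc(classify_BV_100t$votes[, 2], as.numeric(as.character(RF_input_BV_clean$bv.disease)))` should agree with `auc(rf.roc)`. Note that no cutoff appears anywhere in this calculation; only the ordering of the scores matters.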

Method 2 (based on "How to compute ROC and AUC under ROC after training using caret in R?"):

```
library(ROCR)
# Score = OOB fraction of trees voting for class "1"
predictions <- as.vector(classify_BV_100t$votes[, 2])
pred <- prediction(predictions, RF_input_BV_clean$bv.disease)

perf_AUC <- performance(pred, "auc")  # Calculate the AUC value
AUC <- perf_AUC@y.values[[1]]

perf_ROC <- performance(pred, "tpr", "fpr")  # Plot the actual ROC curve
plot(perf_ROC, main = "ROC plot")
text(0.5, 0.5, paste("AUC = ", format(AUC, digits = 5, scientific = FALSE)))
```
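Roughly what `performance(pred, "tpr", "fpr")` traces: for every candidate cutoff, apply the rule "predict 1 when the score is at least the cutoff" and record the true positive rate and false positive rate. A base-R sketch under that assumption (`roc_points` is a hypothetical helper; the toy data are made up); integrating the resulting curve with the trapezoidal rule gives the same AUC as the rank-based definition:

```r
# Hypothetical helper: ROC points for the rule "predict 1 if score >= cutoff"
roc_points <- function(scores, labels) {
  # Inf first so the curve starts at (0, 0); the lowest score ends it at (1, 1)
  cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(cutoffs, function(k) mean(scores[labels == 1] >= k))
  fpr <- sapply(cutoffs, function(k) mean(scores[labels == 0] >= k))
  data.frame(cutoff = cutoffs, fpr = fpr, tpr = tpr)
}

# Toy data (made up): two positives, three negatives
scores <- c(0.9, 0.4, 0.1, 0.5, 0.3)
labels <- c(1,   1,   0,   0,   0)

pts <- roc_points(scores, labels)
print(pts)

# Trapezoidal area under the (fpr, tpr) polyline
trap_auc <- sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)
trap_auc  # 5/6 for this toy data
```

Each row of `pts` is one possible confusion matrix summarised as a single (FPR, TPR) point; the curve and its area describe all of them at once.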


These are the ROC plots from methods 1 and 2:

[Figure: ROC plot 1]

[Figure: ROC plot 2]

Both methods give me an AUC of 0.8621593.

Does anyone know why the random forest's confusion matrix does not seem to add up with the ROC/AUC results?

Answer 1

小智 · score 5

I don't think anything is wrong with your ROC plots, and your assessment of the discrepancy is correct.

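The high AUC is driven largely by the high true negative rate. An ROC curve plots sensitivity (the true positive rate) against 1 − specificity (the false positive rate; specificity is the true negative rate), and it does so across every possible cutoff on the vote fraction, whereas the confusion matrix printed by randomForest reports only one cutoff, the majority vote. Because your specificity is high, it offsets much of the model's low sensitivity and keeps the AUC relatively high. So yes, the AUC is high, but as you noted, at the default cutoff the model is only good at predicting negatives.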
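The point about cutoffs can be checked with a small sketch: tabulate the same scores at two different cutoffs and compare. For the question's OOB matrix, the majority-vote rule (roughly a cutoff of 0.5 on the class-1 vote fraction) corresponds to the single ROC point FPR = 7/517 ≈ 0.014, TPR = 16/55 ≈ 0.29. The base-R code below uses made-up toy scores and a hypothetical helper `confusion_at`; the AUC stays fixed while sensitivity and specificity move in opposite directions as the cutoff changes:

```r
# Hypothetical helper: 2x2 table of true vs predicted class at one cutoff
confusion_at <- function(scores, labels, cutoff) {
  predicted <- factor(as.integer(scores >= cutoff), levels = c(0, 1))
  table(true = factor(labels, levels = c(0, 1)), predicted = predicted)
}

# Toy data (made up): class-1 scores for 4 diseased and 6 healthy cases
scores <- c(0.45, 0.60, 0.30, 0.20, 0.35, 0.10, 0.05, 0.15, 0.40, 0.25)
labels <- c(1,    1,    1,    1,    0,    0,    0,    0,    0,    0)

# Ranking-based AUC: share of diseased/healthy pairs ordered correctly (no ties here)
toy_auc <- mean(outer(scores[labels == 1], scores[labels == 0], ">"))

# Sensitivity and specificity at a strict and a more lenient cutoff
results <- sapply(c(0.5, 0.3), function(k) {
  cm <- confusion_at(scores, labels, k)
  c(cutoff      = k,
    sensitivity = cm["1", "1"] / sum(cm["1", ]),
    specificity = cm["0", "0"] / sum(cm["0", ]))
})
toy_auc            # 19/24, about 0.79, whichever cutoff is used
round(results, 2)  # 0.5: sens 0.25, spec 1.00; 0.3: sens 0.75, spec 0.67
```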

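I suggest computing other metrics as well (sensitivity, specificity, true positive rate, false positive rate, ...) and judging the model on the combination of all of them. AUC is one quality metric, but the metrics behind it tell you more.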
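Most of these metrics can be read directly off the OOB confusion matrix in the question. A base-R sketch; the matrix is typed in from the printed output, and the metric names are just illustrative labels:

```r
# OOB confusion matrix from the question (rows = true class, columns = predicted)
cm <- matrix(c(510, 39, 7, 16), nrow = 2,
             dimnames = list(true = c("0", "1"), predicted = c("0", "1")))

tp <- cm["1", "1"]; fn <- cm["1", "0"]
tn <- cm["0", "0"]; fp <- cm["0", "1"]

metrics <- c(
  sensitivity       = tp / (tp + fn),             # true positive rate, 16/55
  specificity       = tn / (tn + fp),             # true negative rate, 510/517
  fpr               = fp / (fp + tn),             # false positive rate = 1 - specificity
  precision         = tp / (tp + fp),             # positive predictive value, 16/23
  f1                = 2 * tp / (2 * tp + fp + fn),  # 32/78
  balanced_accuracy = (tp / (tp + fn) + tn / (tn + fp)) / 2
)
round(metrics, 3)
```

Balanced accuracy (about 0.64 here) and F1 (about 0.41) expose the weak class-1 performance that the 8.04% overall error rate and the 0.86 AUC both hide.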

Archived: 6 years, 6 months ago
Views: 60
Last activity: 6 years, 6 months ago