Why is the ROC from geom_roc in ggplot so different from the one from plot.roc?

Question

I trained a model as follows.

library(caret)
library(mlbench)
library(plotROC)
library(pROC)

data(Sonar)
ctrl <- trainControl(method="cv", 
                     summaryFunction=twoClassSummary, 
                     classProbs=T,
                     savePredictions = T)
rfFit <- train(Class ~ ., data=Sonar, 
               method="rf", preProc=c("center", "scale"), 
               trControl=ctrl)
    
# Select a parameter setting
selectedIndices <- rfFit$pred$mtry == 2

I want to plot the ROC curve.

plot.roc(rfFit$pred$obs[selectedIndices],
         rfFit$pred$M[selectedIndices])

However, when I try the ggplot2 approach, it gives me something completely different.

g <- ggplot(rfFit$pred[selectedIndices, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) + 
  geom_roc(n.cuts=0) + 
  coord_equal() +
  style_roc()

g + annotate("text", x=0.75, y=0.25, label=paste("AUC =", round((calc_auc(g))$AUC, 4)))

I am doing something very wrong here, but I can't figure out what it is. Thanks.

Answer 1

All*_*ron 5

geom_roc ignores the order of the factor levels. Whichever way you specify levels = c('R', 'M'), you get this warning:

#> Warning message:
#> In verify_d(data$d) : D not labeled 0/1, assuming M = 0 and R = 1!

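This means you are getting the ROC of the "anti-prediction" (that is, the opposite of the prediction the model actually made), so the curve is the mirror image of the actual ROC.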
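To see why swapping the event class mirrors the curve, here is a minimal base R sketch. It does not use plotROC or pROC; `auc_rank` is a hypothetical helper written for this illustration, and the toy scores and labels are made up. It computes the AUC from ranks (the Mann-Whitney statistic) and shows that flipping the 0/1 labels turns an AUC of a into 1 - a.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC, i.e. the probability
# that a random positive gets a higher score than a random negative.
# rank() assigns mean ranks to ties, so ties count as half.
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (made up): score plays the role of the predicted probability of M
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
is_m  <- c(1,   1,   0,   1,   0,   1,   0,   0)

auc_correct <- auc_rank(score, is_m)      # M treated as the event
auc_flipped <- auc_rank(score, 1 - is_m)  # R treated as the event by mistake

print(c(correct = auc_correct, flipped = auc_flipped))
```

The two values sum to 1, which is the numeric counterpart of the mirrored curve described above.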

You need to explicitly convert the observed class labels into a numeric column of 1s and 0s:

g <- ggplot(rfFit$pred[selectedIndices, ], 
       aes(m=M, d= as.numeric(factor(obs, levels = c("R", "M"))) - 1)) + 
  geom_roc(n.cuts=0) + 
  coord_equal() +
  style_roc()

g + annotate("text", x=0.75, y=0.25, 
           label=paste("AUC =", round((calc_auc(g))$AUC, 4)))
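Before plotting, it can help to check that the recoding really maps M to 1 and R to 0. The sketch below uses a small made-up factor as a stand-in for rfFit$pred$obs (the values are hypothetical) and compares the recoding used above with a direct comparison against the event class, which some readers may find easier to audit.

```r
# Toy stand-in for rfFit$pred$obs (hypothetical values)
obs <- factor(c("M", "R", "R", "M"), levels = c("M", "R"))

# Recoding used above: list R first so that R maps to 0 and M to 1
d_levels <- as.numeric(factor(obs, levels = c("R", "M"))) - 1

# Alternative: compare against the event class directly
d_direct <- as.integer(obs == "M")

# Cross-tabulate to confirm that every M is coded 1 and every R is coded 0
print(table(obs, d_direct))
```

Either column can be passed to the d aesthetic; because the values are already 0/1, geom_roc has no reason to guess the coding.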
