How do I plot ROC curves for a three-class randomForest model in R?

Ada*_*ice 2 plot r graph roc proc-r-package

I am using the R package randomForest to build a model that classifies observations into three groups.

 model = randomForest(formula = condition ~ ., data = train, ntree = 2000,      
                       mtry = bestm, importance = TRUE, proximity = TRUE) 

           Type of random forest: classification
                 Number of trees: 2000
                 No. of variables tried at each split: 3

           OOB estimate of  error rate: 5.71%

           Confusion matrix:
           lethal mock resistant class.error
 lethal        20    1         0  0.04761905
 mock           1   37         0  0.02631579
 resistant      2    0         9  0.18181818

I have tried several libraries. ROCR, for example, cannot handle three classes, only two. See:

pred=prediction(predictions,train$condition)

Error in prediction(predictions, train$condition) : 
  Number of classes is not equal to 2.
  ROCR currently supports only evaluation of binary classification 
  tasks.

The data in model$votes looks like this:

         lethal        mock   resistant
 3   0.04514364 0.952120383 0.002735978
 89  0.32394366 0.147887324 0.528169014
 16  0.02564103 0.973009447 0.001349528
 110 0.55614973 0.433155080 0.010695187
 59  0.06685633 0.903271693 0.029871977
 43  0.13424658 0.865753425 0.000000000
 41  0.82987552 0.033195021 0.136929461
 86  0.32705249 0.468371467 0.204576043
 87  0.37704918 0.341530055 0.281420765
 ........

With the pROC package I can get some very ugly ROC plots this way:

predictions <- as.numeric(predict(model, test, type = 'response'))
roc.multi <- multiclass.roc(test$condition, predictions, 
                            percent=TRUE)
rs <- roc.multi[['rocs']]
plot.roc(rs[[2]])
sapply(2:length(rs),function(i) lines.roc(rs[[i]],col=i))

The plots look like this: Figure 1: ugly ROC curves

But there is no way to smooth these lines, because they are not really curves: each one has only about 4 points.

I need a way to plot a nice, smooth ROC curve for this model, but I can't seem to find one. Does anyone know a good approach? Many thanks in advance!

Dam*_*ini 5

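I see two issues here. 1) ROC curves apply to binary classifiers, so you should recast the performance evaluation as a series of binary problems. I show how to do this below. 2) When you predict on the test set, you should obtain the probability that each observation belongs to each of your classes, not just the predicted class. This is what lets you plot nice ROC curves. Here is the code.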

#load libraries
library(randomForest)
library(pROC)

# generate some random data
set.seed(1111)
train <- data.frame(condition = sample(c("mock", "lethal", "resist"), replace = T, size = 1000))
train$feat01 <- sapply(train$condition, (function(i){ if (i == "mock") { rnorm(n = 1, mean = 0)} else if (i == "lethal") { rnorm(n = 1, mean = 1.5)} else { rnorm(n = 1, mean = -1.5)} }))
train$feat02 <- sapply(train$condition, (function(i){ if (i == "mock") { rnorm(n = 1, mean = 0)} else if (i == "lethal") { rnorm(n = 1, mean = 1.5)} else { rnorm(n = 1, mean = -1.5)} }))
train$feat03 <- sapply(train$condition, (function(i){ if (i == "mock") { rnorm(n = 1, mean = 0)} else if (i == "lethal") { rnorm(n = 1, mean = 1.5)} else { rnorm(n = 1, mean = -1.5)} }))
head(train)

test <- data.frame(condition = sample(c("mock", "lethal", "resist"), replace = T, size = 1000))
test$feat01 <- sapply(test$condition, (function(i){ if (i == "mock") { rnorm(n = 1, mean = 0)} else if (i == "lethal") { rnorm(n = 1, mean = 1.5)} else { rnorm(n = 1, mean = -1.5)} }))
test$feat02 <- sapply(test$condition, (function(i){ if (i == "mock") { rnorm(n = 1, mean = 0)} else if (i == "lethal") { rnorm(n = 1, mean = 1.5)} else { rnorm(n = 1, mean = -1.5)} }))
test$feat03 <- sapply(test$condition, (function(i){ if (i == "mock") { rnorm(n = 1, mean = 0)} else if (i == "lethal") { rnorm(n = 1, mean = 1.5)} else { rnorm(n = 1, mean = -1.5)} }))
head(test)

Now that we have some data, let's train a random forest model as you did.

# model
model <- randomForest(formula = condition ~ ., data = train, ntree = 10, maxnodes= 100, norm.votes = F) 

Next, the model is used to predict the test data. Here, however, you should ask for type = "prob".

# predict test set, get probs instead of response
predictions <- as.data.frame(predict(model, test, type = "prob"))
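To illustrate why probabilities matter for the shape of the curve, here is a minimal base-R sketch. An empirical ROC curve has at most one point per distinct score value, plus the origin. The helper `roc_points()` and the simulated scores are hypothetical and exist only for this illustration; they are not part of pROC or randomForest.

```r
# Hypothetical helper: empirical ROC points for a binary outcome and a numeric score.
roc_points <- function(is_positive, score) {
  # One threshold per distinct score, plus Inf so the curve starts at (0, 0)
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) mean(score[is_positive] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[!is_positive] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

set.seed(1)
is_positive <- rep(c(TRUE, FALSE), each = 50)
# Made-up continuous scores standing in for class probabilities
prob <- ifelse(is_positive, rbeta(100, 4, 2), rbeta(100, 2, 4))
# Three-valued score, mimicking as.numeric() applied to a predicted class
label_code <- as.numeric(cut(prob, breaks = c(0, 1/3, 2/3, 1)))

n_label <- nrow(roc_points(is_positive, label_code))  # at most 4 points
n_prob  <- nrow(roc_points(is_positive, prob))        # one point per distinct score, plus the origin
c(labels = n_label, probabilities = n_prob)
```

The same reasoning suggests that the number of trees matters too: with ntree = 10, vote fractions are multiples of 0.1, so each curve can have at most 12 points no matter how large the test set is.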

Since you have the probabilities, use them to get the most probable class.

# predict class and then attach test class
predictions$predict <- names(predictions)[1:3][apply(predictions[,1:3], 1, which.max)]
predictions$observed <- test$condition
head(predictions)
  lethal mock resist predict observed
1    0.0  0.0    1.0  resist   resist
2    0.0  0.6    0.4    mock     mock
3    1.0  0.0    0.0  lethal     mock
4    0.0  0.0    1.0  resist   resist
5    0.0  1.0    0.0    mock     mock
6    0.7  0.3    0.0  lethal     mock
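As a small self-contained check of this step, the sketch below rebuilds the six rows printed above as a matrix and picks the most probable class with base R's `max.col()`, an alternative to the `apply(..., which.max)` call. The object names `votes`, `predicted`, and `observed` are chosen here for illustration only.

```r
# The six rows shown in the head(predictions) output above
votes <- matrix(c(0.0, 0.0, 1.0,
                  0.0, 0.6, 0.4,
                  1.0, 0.0, 0.0,
                  0.0, 0.0, 1.0,
                  0.0, 1.0, 0.0,
                  0.7, 0.3, 0.0),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("lethal", "mock", "resist")))
observed <- c("resist", "mock", "mock", "resist", "mock", "mock")

# Column index of the largest probability in each row; ties go to the first column
predicted <- colnames(votes)[max.col(votes, ties.method = "first")]

table(observed, predicted)      # confusion matrix for these six rows
mean(predicted == observed)     # share of rows classified correctly
```

Note that this hard assignment is only useful for a confusion matrix or accuracy. The ROC curves should still be built from the probability columns themselves.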

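Now let's see how to plot the ROC curves. For each class, convert the multi-class problem into a binary one. Then call the roc() function with two arguments: i) the observed classes and ii) the class probability (rather than the predicted class).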

# 1 ROC curve, mock vs non mock
roc.mock <- roc(ifelse(predictions$observed=="mock", "mock", "non-mock"), as.numeric(predictions$mock))
plot(roc.mock, col = "gray60")

# others: each class is scored with its own probability column
roc.lethal <- roc(ifelse(predictions$observed=="lethal", "lethal", "non-lethal"), as.numeric(predictions$lethal))
roc.resist <- roc(ifelse(predictions$observed=="resist", "resist", "non-resist"), as.numeric(predictions$resist))
lines(roc.lethal, col = "blue")
lines(roc.resist, col = "red")

Done. Here is the result. Of course, the more observations in the test set, the smoother the curves will be.

[Figure: ROC curves for mock, lethal, and resist]
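The one-vs-rest steps above can also be written as a loop over the classes. The following is a sketch under stated assumptions, not a definitive implementation: `observed` and `probs` are simulated stand-ins for `predictions$observed` and the three probability columns, and `levels` and `direction` are set explicitly so that pROC always treats the class of interest as the case with higher scores, instead of guessing from the data.

```r
library(pROC)

set.seed(1111)
classes <- c("lethal", "mock", "resist")
observed <- sample(classes, size = 300, replace = TRUE)

# Made-up class probabilities: random weights, boosted for the true class,
# then normalised so that each row sums to 1
raw <- matrix(runif(300 * 3), ncol = 3, dimnames = list(NULL, classes))
true_idx <- cbind(seq_along(observed), match(observed, classes))
raw[true_idx] <- raw[true_idx] + 1
probs <- raw / rowSums(raw)

# One binary ROC curve per class: this class ("case") against all others ("rest")
rocs <- lapply(classes, function(cl) {
  roc(response = ifelse(observed == cl, "case", "rest"),
      predictor = probs[, cl],
      levels = c("rest", "case"),  # controls first, cases second
      direction = "<")             # cases are expected to score higher
})
names(rocs) <- classes

cols <- c("blue", "gray60", "red")
plot(rocs[[1]], col = cols[1])
for (i in 2:3) lines(rocs[[i]], col = cols[i])
legend("bottomright", legend = classes, col = cols, lwd = 2)

# Area under each one-vs-rest curve
aucs <- sapply(rocs, function(r) as.numeric(auc(r)))
aucs
```

With real data, replacing the simulated `observed` and `probs` by the observed classes and the output of predict(model, test, type = "prob") should give the same three curves as the step-by-step code.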