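I am trying to maximize sensitivity during model selection in caret, using rpart. To do this, I tried to replicate the method given here (scroll down to the example with the user-defined function FourStat) on caret's GitHub page.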
# create own function so we can use "sensitivity" as our metric to maximise:
Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
  out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
  c(out, Sensitivity = out["Sens"])
}
rpart_caret_fit <- train(outcome~pred1+pred2+pred3+pred4,
                         na.action = na.pass,
                         method = "rpart",
                         control=rpart.control(maxdepth = 6),
                         tuneLength = 20,
                         # maximise sensitivity
                         metric = "Sensitivity",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  summaryFunction = Sensitivity.fc))
But when I print the summary of rpart_caret_fit, it shows that the ROC criterion was still used to select the final model:
rpart_caret_fit
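How can I override the ROC selection method?
You are overcomplicating things.
twoClassSummary already includes sensitivity in its output, under the column name "Sens". It is enough to specify:
metric = "Sens" to train, and
summaryFunction = twoClassSummary to trainControl.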
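As a side note on why the custom function in the question did not take effect, here is a minimal base-R sketch. The numbers in `out` are made up and only stand in for the named vector that twoClassSummary returns. Combining a named element under a new name gives a compound name, so no element called "Sensitivity" exists; as far as I know, train then falls back to the first metric in the summary (ROC here) and issues a warning.

```r
# Stand-in for the named vector returned by twoClassSummary (values are made up)
out <- c(ROC = 0.71, Sens = 0.81, Spec = 0.61)

# Same construction as in the question's Sensitivity.fc:
# out["Sens"] keeps its own name, so the two names get pasted together
combined <- c(out, Sensitivity = out["Sens"])
names(combined)

# Extracting with [[ ]] drops the inner name, so the new element
# is named exactly "Sensitivity"
fixed <- c(out, Sensitivity = out[["Sens"]])
names(fixed)
```

With the `[[ ]]` form, metric = "Sensitivity" would match a column in the results, although using "Sens" directly is simpler.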
Full example:
library(caret)
library(mlbench)
data(Sonar)
rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = twoClassSummary))
rpart_caret_fit
CART
208 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 167, 166, 166, 166, 167
Resampling results across tuning parameters:
cp ROC Sens Spec
0.0000000 0.7088298 0.7023715 0.7210526
0.0255019 0.7075400 0.7292490 0.6684211
0.0510038 0.7105388 0.7758893 0.6405263
0.0765057 0.6904202 0.7841897 0.6294737
0.1020076 0.7104681 0.8114625 0.6094737
0.1275095 0.7104681 0.8114625 0.6094737
0.1530114 0.7104681 0.8114625 0.6094737
0.1785133 0.7104681 0.8114625 0.6094737
0.2040152 0.7104681 0.8114625 0.6094737
0.2295171 0.7104681 0.8114625 0.6094737
0.2550190 0.7104681 0.8114625 0.6094737
0.2805209 0.7104681 0.8114625 0.6094737
0.3060228 0.7104681 0.8114625 0.6094737
0.3315247 0.7104681 0.8114625 0.6094737
0.3570266 0.7104681 0.8114625 0.6094737
0.3825285 0.7104681 0.8114625 0.6094737
0.4080304 0.7104681 0.8114625 0.6094737
0.4335323 0.7104681 0.8114625 0.6094737
0.4590342 0.6500135 0.8205534 0.4794737
0.4845361 0.6500135 0.8205534 0.4794737
Sens was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.4845361.
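Additionally, the idea that you cannot specify control = rpart.control(maxdepth = 6) in caret's train is not correct: caret passes any extra arguments forward through ..., so you can pass pretty much any argument.
If you do want to write your own summary function, here is an example for "Sens":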
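To illustrate the pass-through, here is a small sketch that assumes the caret, rpart and mlbench packages are installed. The value maxdepth = 2 and the name fit_depth are arbitrary choices for the illustration. It passes a control object to train and then inspects the control list stored in the fitted rpart object.

```r
library(caret)
library(rpart)
library(mlbench)
data(Sonar)

set.seed(1)
# control is not an argument of train itself; it travels through ...
# to the underlying rpart call
fit_depth <- train(Class ~ .,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 5,
                   metric = "Sens",
                   control = rpart.control(maxdepth = 2),
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

# The final rpart model keeps the control settings it was fitted with
fit_depth$finalModel$control$maxdepth
```

The cp value in the control object is still set by caret during tuning, so only the other settings (such as maxdepth) are expected to carry over unchanged.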
Sensitivity.fc <- function (data, lev = NULL, model = NULL) { # every summary function takes these three arguments
  obs <- data[, "obs"] # these are the real values - always in the column named "obs" in data
  cls <- levels(obs) # these are the levels - they are also passed in through the lev argument
  probs <- data[, cls[2]] # these are the probabilities for the 2nd class - available only if classProbs = TRUE
  class <- factor(ifelse(probs > 0.5, cls[2], cls[1]), levels = cls) # calculate the classes based on a probability threshold; fixing the levels keeps both classes even if only one is predicted
  Sensitivity <- caret::sensitivity(class, obs) # do the calculation - I was lazy so I used a built-in function to do it for me
  names(Sensitivity) <- "Sens" # the name of the output
  Sensitivity
}
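For reference, the quantity computed above can be sketched by hand in base R. The toy vectors below are made up for illustration. The sketch assumes, as caret::sensitivity does by default, that the first factor level is the positive class.

```r
# Toy observed and predicted classes (made-up data)
obs  <- factor(c("M", "M", "M", "R", "R"), levels = c("M", "R"))
pred <- factor(c("M", "M", "R", "R", "M"), levels = c("M", "R"))

# The first level is treated as the positive class
positive <- levels(obs)[1]

# Sensitivity = true positives / all observed positives
true_pos <- sum(pred == positive & obs == positive)
sens <- true_pos / sum(obs == positive)
sens
```

Here two of the three observed "M" cases are predicted as "M", so the sensitivity is 2/3.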
Now:
rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens", # because of this line: names(Sensitivity) <- "Sens"
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = Sensitivity.fc))
Let's check that both produce the same results:
set.seed(1)
fit_sens <- train(Class~.,
                  data = Sonar,
                  method = "rpart",
                  tuneLength = 20,
                  metric = "Sens",
                  maximize = TRUE,
                  trControl = trainControl(classProbs = TRUE,
                                           method = "cv",
                                           number = 5,
                                           summaryFunction = Sensitivity.fc))
set.seed(1)
fit_sens2 <- train(Class~.,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 20,
                   metric = "Sens",
                   maximize = TRUE,
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))
all.equal(fit_sens$results[c("cp", "Sens")],
          fit_sens2$results[c("cp", "Sens")])
TRUE
all.equal(fit_sens$bestTune,
          fit_sens2$bestTune)
TRUE