Caret package custom metric

Mar*_*tos 8 r r-caret

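I am using the caret function "train()" in one of my projects and I would like to add a custom metric, the F1 score. I looked at this URL (caret package), but I can't work out how to build this score from the available parameters.

The documentation gives the following example of a custom metric: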

## Example with a custom metric
library(caret)
library(mlbench)  # provides the BostonHousing data
data(BostonHousing)

madSummary <- function(data, lev = NULL, model = NULL) {
  # Median absolute deviation of the residuals
  out <- mad(data$obs - data$pred, na.rm = TRUE)
  names(out) <- "MAD"
  out
}

robustControl <- trainControl(summaryFunction = madSummary)
marsGrid <- expand.grid(degree = 1, nprune = (1:10) * 2)

earthFit <- train(medv ~ .,
                  data = BostonHousing,
                  method = "earth",
                  tuneGrid = marsGrid,
                  metric = "MAD",
                  maximize = FALSE,
                  trControl = robustControl)
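A summary function can also be tried outside of "train()", which may make its contract easier to see: caret passes it a data frame with the columns obs (observed values) and pred (predictions) and expects a named numeric vector back. The sketch below calls a copy of madSummary on a small data frame; the name toy and the numbers in it are invented purely for illustration.

```r
# Copy of the summary function from the example above
madSummary <- function(data, lev = NULL, model = NULL) {
  out <- mad(data$obs - data$pred, na.rm = TRUE)
  names(out) <- "MAD"
  out
}

# Hypothetical held-out observations and predictions (made-up numbers)
toy <- data.frame(obs  = c(10, 12, 15, 20),
                  pred = c(11, 12, 13, 24))

# The result is a named numeric vector; its name ("MAD") is what
# the metric argument of train() refers to
res <- madSummary(toy)
print(res)
```

The name of the returned element has to match the string given as metric in "train()", which is why the example uses metric = "MAD".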

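Update:

I tried your code, but the problem is that it does not work with multiple classes, as in the code below (an F1 score is displayed, but it looks strange). I am not sure, but I think the function F1_Score only works for binary classes.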

library(caret)
library(MLmetrics)

set.seed(346)
dat <- iris

## See http://topepo.github.io/caret/training.html#metrics
f1 <- function(data, lev = NULL, model = NULL) {

  print(data)  # debug: inspect the obs/pred data frame that caret passes in
  f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs)
  c(F1 = f1_val)
}

# Split the data: 70% for training, 30% held out
in_train <- createDataPartition(dat$Species, p = .70, list = FALSE)

trainClass <- dat[in_train,]
testClass <- dat[-in_train,]



set.seed(35)
mod <- train(Species ~ ., data = trainClass ,
             method = "rpart",
             metric = "F1",
             trControl = trainControl(summaryFunction = f1, 
                                  classProbs = TRUE))

print(mod)

I also wrote a manual F1 score function that takes a confusion matrix as input (I am not sure whether we can have a confusion matrix inside "summaryFunction"):

F1_score <- function(mat, algoName) {
  ##
  ## Compute the F1 score from a confusion matrix
  ##

  # Remark: left column (rows) = prediction // top (columns) = real values
  tp <- diag(mat)
  precision <- tp / rowSums(mat)  # correct / everything predicted as class i
  recall <- tp / colSums(mat)     # correct / everything that really is class i
  f1 <- 2 * (precision * recall) / (precision + recall)

  # Label the per-class scores with the class names and the algorithm name
  f1 <- matrix(f1, nrow = 1, dimnames = list(algoName, colnames(mat)))

  # Display the F1 score for each class
  print(f1)

  # Return the average F1 score over the classes
  mean(f1[1, ])
}
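On whether a confusion matrix can be used inside "summaryFunction": the function receives the observed and predicted classes as data$obs and data$pred, so one option is to build the table there with table(). Below is a minimal base R sketch of a macro-averaged F1 (the unweighted mean of the per-class F1 scores). The name macroF1, the toy data, and the choice to count an undefined per-class F1 as 0 are assumptions of this sketch, not something caret or MLmetrics prescribes.

```r
# Hypothetical summary function: macro-averaged F1 from a confusion matrix
macroF1 <- function(data, lev = NULL, model = NULL) {
  # Use the same level set for both factors so the table is square
  if (is.null(lev)) lev <- levels(factor(data$obs))
  cm <- table(factor(data$pred, levels = lev),
              factor(data$obs,  levels = lev))  # rows = predicted, cols = observed

  tp        <- diag(cm)
  precision <- tp / rowSums(cm)
  recall    <- tp / colSums(cm)
  f1 <- 2 * precision * recall / (precision + recall)
  f1[is.na(f1)] <- 0  # assumption: a class with undefined F1 counts as 0

  c(F1 = mean(f1))
}

# Made-up three-class predictions to try the function on its own
lev <- c("a", "b", "c")
toy <- data.frame(
  obs  = factor(c("a", "a", "b", "b", "c", "c"), levels = lev),
  pred = factor(c("a", "b", "b", "b", "c", "a"), levels = lev)
)
res <- macroF1(toy, lev = lev)
print(res)
```

It would be passed as trainControl(summaryFunction = macroF1) together with metric = "F1" in "train()", in the same way as the f1 function above. Whether macro averaging is the right way to combine the classes depends on the problem; a weighted average is another common choice.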

top*_*epo 19

You should look here for the details. A working example is:

library(caret)
library(MLmetrics)

set.seed(346)
dat <- twoClassSim(200)

## See http://topepo.github.io/caret/training.html#metrics
f1 <- function(data, lev = NULL, model = NULL) {
  f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs, positive = lev[1])
  c(F1 = f1_val)
}

set.seed(35)
mod <- train(Class ~ ., data = dat,
             method = "rpart",
             tuneLength = 5,
             metric = "F1",
             trControl = trainControl(summaryFunction = f1, 
                                      classProbs = TRUE))

Max