Mar*_*ark 6 r machine-learning random-forest r-caret
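I am using the caret package to analyze random forest models built with ranger. I can't figure out how to call the train function with the tuneGrid argument to tune the model parameters.
I think I am specifying the tuneGrid argument incorrectly, but I can't figure out why it is wrong. Any help would be appreciated.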
data(iris)
library(ranger)
model_ranger <- ranger(Species ~ ., data = iris, num.trees = 500, mtry = 4,
                       importance = 'impurity')
library(caret)
# my tuneGrid object:
tgrid <- expand.grid(
  num.trees = c(200, 500, 1000),
  mtry = 2:4
)
model_caret <- train(Species ~ ., data = iris,
                     method = "ranger",
                     trControl = trainControl(method="cv", number = 5, verboseIter = T, classProbs = T),
                     tuneGrid = tgrid,
                     importance = 'impurity'
)
mis*_*use 17
Here is the syntax for ranger in caret:
library(caret)
Add a . before each tuning parameter name:
tgrid <- expand.grid(
  .mtry = 2:4,
  .splitrule = "gini",
  .min.node.size = c(10, 20)
)
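caret supports only these three tuning parameters for ranger; the number of trees is not one of them. In train you can specify num.trees and importance directly: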
model_caret <- train(Species ~ ., data = iris,
                     method = "ranger",
                     trControl = trainControl(method="cv", number = 5, verboseIter = T, classProbs = T),
                     tuneGrid = tgrid,
                     num.trees = 100,
                     importance = "permutation")
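The statement that caret tunes only mtry, splitrule, and min.node.size for ranger can be checked with caret's modelLookup function, which lists the tuning parameters registered for a method. This is a small sketch, assuming a reasonably recent caret version; older versions may list fewer parameters.

```r
library(caret)

# List the tuning parameters caret exposes for method = "ranger".
# Anything not listed here (such as num.trees) cannot go in tuneGrid
# and has to be passed to train() directly instead.
ranger_params <- modelLookup("ranger")
ranger_params
```

Any column in tuneGrid whose name is not in the parameter column of this table causes train to stop with an error, which is what happens with num.trees in the question.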
To get the variable importance:
varImp(model_caret)
#output
             Overall
Petal.Length 100.0000
Petal.Width   84.4298
Sepal.Length   0.9855
Sepal.Width    0.0000
To check that this works, set the number of trees to 1000 or more; the fit will be much slower. After changing to importance = "impurity":
#output:
             Overall
Petal.Length  100.00
Petal.Width    81.67
Sepal.Length   16.19
Sepal.Width     0.00
If it does not work, I recommend installing the latest ranger from CRAN and caret from GitHub:
devtools::install_github('topepo/caret/pkg/caret')
To tune the number of trees, you can use lapply over candidate num.trees values with fixed folds created by createMultiFolds or createFolds.
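A minimal sketch of that last suggestion might look like the following. The candidate tree counts, the seed, and the variable names (folds, tree_counts, fits, best_acc) are illustrative assumptions, not part of the original answer. The folds are created once and passed through the index argument of trainControl, so every num.trees value is evaluated on the same splits.

```r
library(caret)   # the ranger package must also be installed

data(iris)
set.seed(42)     # arbitrary seed, only for reproducible folds

# Fixed training-set indices, reused for every candidate num.trees
folds <- createFolds(iris$Species, k = 5, returnTrain = TRUE)

tgrid <- expand.grid(
  .mtry = 2:4,
  .splitrule = "gini",
  .min.node.size = c(10, 20)
)

# Candidate tree counts (illustrative values)
tree_counts <- c(100, 200, 500)

# Fit one caret model per tree count, all on the same folds
fits <- lapply(tree_counts, function(nt) {
  set.seed(42)
  train(Species ~ ., data = iris,
        method = "ranger",
        trControl = trainControl(method = "cv", index = folds),
        tuneGrid = tgrid,
        num.trees = nt)
})
names(fits) <- paste0("trees_", tree_counts)

# Best cross-validated accuracy found for each tree count
best_acc <- sapply(fits, function(f) max(f$results$Accuracy))
best_acc
```

Comparing the entries of best_acc shows whether adding trees changes the cross-validated accuracy; on a small data set such as iris the differences are likely to be slight.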