Tag: r-caret

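Caret train rf model - inexplicably long execution

While trying to train a random forest model with the caret package, I noticed that the execution time is inexplicably long: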

> set.seed(1);
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = T), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
   user  system elapsed 
   0.99    0.00    0.98 
> print(system.time({rfmod <- train(x = x, y = y,
+                method = "rf",
+                metric = "Accuracy",
+                trControl = trainControl(classProbs = T)
+ );}));
   user  system elapsed 
  95.83    0.71   97.26 …
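One plausible reading of the gap, offered as a sketch rather than a confirmed diagnosis: with its defaults, `train()` evaluates 3 candidate `mtry` values (`tuneLength = 3`) on each of 25 bootstrap resamples (`trainControl(method = "boot", number = 25)`), then refits the chosen model once on all the data. The arithmetic below reuses the 0.98 s single-fit time from the transcript; treating every fit as equally expensive is an assumption made here for illustration.

```r
# Back-of-the-envelope count of random forest fits under caret's defaults.
n_resamples  <- 25    # trainControl() default: method = "boot", number = 25
n_candidates <- 3     # train() default: tuneLength = 3 values of mtry
single_fit_s <- 0.98  # elapsed time of one randomForest() call, from the transcript

n_fits <- n_resamples * n_candidates + 1  # +1 for the final refit on all data
est_s  <- n_fits * single_fit_s           # assumes every fit costs about the same

cat("fits:", n_fits, " estimated seconds:", est_s, "\n")
```

Larger `mtry` values and the predictions made on the held-out rows cost more than the baseline fit, which could account for the rest of the observed time. Lowering `number` or fixing `mtry` through `tuneGrid` reduces the count.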

r package random-forest r-caret

4 votes · 1 answer · 4108 views

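Using elm in caret: class probabilities error

I want to compare a standard neural network approach with an extreme learning machine classifier (based on the ROC metric), using the methods "nnet" and "elm" in the R package caret. For nnet everything works, but with method = "elm" I get the following error: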

Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels,  : 
  train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
In addition: Warning messages:
1: In train.default(x, y, weights = w, ...) :
  At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted …

r r-caret

4 votes · 1 answer · 1449 views

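When to use the index and seeds arguments with train() in the caret package in R

Main question:

After reading the documentation and searching Google, I still struggle to work out when to predefine the resampling indices, for example: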

resamples <- createResample(classVector_training, times = 500, list=TRUE)

or to predefine the seeds, for example:

seeds <- vector(mode = "list", length = 501) #length is = (n_repeats*nresampling)+1
for(i in 1:501) seeds[[i]]<- sample.int(n=1000, 1) 

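My plan is to train a set of different, reproducible models using parallel processing via the doParallel package. Since the seeds are already set, is it unnecessary to predefine the resamples? Do I need to predefine the seeds as shown above, rather than setting seeds = NULL in the trainControl object, because I intend to use parallel processing? Is there any reason to predefine both the index and the seeds, as I have seen done at least once while searching Google? And what is the reason to use indexOut?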
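For reference, the `trainControl()` documentation describes `seeds` as a list of length B + 1 for B resamples: each of the first B elements is an integer vector with one seed per model evaluated in that resample, and the last element is a single integer for the final fit. A minimal sketch of that shape follows; `n_models = 1` is an assumption that matches a one-row tuning grid, and the outer `set.seed(42)` is an arbitrary choice.

```r
# Build a seeds list in the shape trainControl(seeds = ...) documents.
n_resamples <- 500  # matches times = 500 in createResample() above
n_models    <- 1    # assumption: the tuning grid has a single row

set.seed(42)        # arbitrary; makes the seeds list itself reproducible
seeds <- vector(mode = "list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) {
  seeds[[i]] <- sample.int(n = 1000, size = n_models)  # one seed per candidate model
}
seeds[[n_resamples + 1]] <- sample.int(n = 1000, size = 1)  # seed for the final model

length(seeds)
```

With a larger tuning grid, `n_models` would need to equal the number of grid rows so that every candidate fit gets its own seed.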

Questions:

So far I have managed to get train to run fine for RF:

rfControl <- trainControl(method="oob", number = 500, p = 0.7, returnData=TRUE,   returnResamp = "all", savePredictions=TRUE, classProbs = TRUE, summaryFunction = twoClassSummary, allowParallel=TRUE)
mtryGrid <- expand.grid(mtry = 9480^0.5) #set mtry parameter to the square root of the number of variables
rfTrain <- train(x = training, y = classVector_training, method …

parallel-processing r machine-learning data-mining r-caret

4 votes · 1 answer · 2298 views

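Getting this error in Caret

I am getting the following error and I don't know what could be wrong. I am using RStudio with R version 3.1.3 on Windows 8.1, and the caret package for data mining.

I have the following training data: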

str(training)
'data.frame':   212300 obs. of  21 variables:
 $ FL_DATE_MDD_MMDD     : int  101 101 101 101 101 101 101 101 101 101 ...
 $ FL_DATE              : int  1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 ...
 $ UNIQUE_CARRIER       : Factor w/ 13 levels "9E","AA","AS",..: 11 10 2 5 8 9 11 10 10 10 ...
 $ DEST                 : Factor w/ 150 levels "ABE","ABQ","ALB",..: 111 70 82 8 8 31 110 44 53 80 …

r r-caret

4 votes · 2 answers · 7337 views

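Specifying cross-validation folds with caret

Hello, and thanks in advance. I am using caret to cross-validate a neural network from the nnet package. In the method argument of the trainControl function I can specify the cross-validation type, but all of the types pick observations at random for cross-validation. Is there any way to have caret cross-validate on specific observations in my data, chosen by ID or by a hard-coded argument? For example, here is my current code: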

library(nnet) 
library(caret) 
library(datasets) 

data(iris) 

train.control <- trainControl( 
    method = "repeatedcv" 
    , number = 4 
    , repeats = 10 
    , verboseIter = T 
    , returnData = T 
    , savePredictions = T 
    ) 

tune.grid <- expand.grid( 
    size = c(2,4,6,8)
    ,decay = 2^(-3:1) 
    ) 

nnet.train <- train( 
    x = iris[,1:4] 
    , y = iris[,5] 
    , method = "nnet" 
    , preProcess = c("center","scale")  
    , metric = "Accuracy" 
    , trControl = train.control 
    , tuneGrid = tune.grid …
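One way to hand-pick the folds is the `index` argument of `trainControl()`, which takes a list with one element per resample, each holding the row numbers used for training. The sketch below is illustrative only: `fold_id` is a hypothetical ID-to-fold assignment invented here, and the tuning grid is cut to a single row to keep the run short.

```r
library(caret)
data(iris)

# Hypothetical grouping: pretend each row carries an ID that decides its fold.
fold_id <- rep(1:4, length.out = nrow(iris))

# `index` expects, for each resample, the row numbers used for TRAINING;
# rows missing from an element are held out for that resample.
train_rows <- lapply(1:4, function(k) which(fold_id != k))
names(train_rows) <- paste0("Fold", 1:4)

train.control <- trainControl(method = "cv", index = train_rows,
                              savePredictions = TRUE)

set.seed(1)
nnet.train <- train(x = iris[, 1:4], y = iris[, 5], method = "nnet",
                    preProcess = c("center", "scale"),
                    tuneGrid = expand.grid(size = 2, decay = 0.1),
                    trControl = train.control, trace = FALSE)
```

If whole groups of rows must stay together, caret's `groupKFold()` builds a comparable list from a grouping vector.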

r neural-network cross-validation nnet r-caret

4 votes · 1 answer · 2083 views

Something is wrong; all the ROC metric values are missing:

I am training a model in R using the caret package:

ctrl <- trainControl(method = "repeatedcv", repeats = 3,  summaryFunction = twoClassSummary)

logitBoostFit <- train(LoanStatus~., credit, method = "LogitBoost", family=binomial, preProcess=c("center", "scale", "pca"), 
    trControl = ctrl)

I get the following warning:

Warning message:
In train.default(x, y, weights = w, ...): The metric "Accuracy" was not in the result set. ROC will be used instead.
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
Something is wrong; all the ROC metric …
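A common trigger for this message is a `trainControl()` that uses `twoClassSummary` without `classProbs = TRUE`, since the ROC curve is computed from class probabilities. The sketch below shows that setting on simulated data; `twoClassSim()` stands in for the unavailable `credit` data, and `glm` is swapped in for LogitBoost so that no extra model package is needed. Whether this is the cause in the question above cannot be confirmed from the excerpt.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(200)  # simulated two-class data standing in for `credit`

ctrl <- trainControl(method = "repeatedcv", repeats = 3,
                     classProbs = TRUE,               # twoClassSummary needs probabilities
                     summaryFunction = twoClassSummary)

fit <- train(Class ~ ., data = dat, method = "glm",   # glm swapped in for LogitBoost
             metric = "ROC", trControl = ctrl)

fit$results[, c("ROC", "Sens", "Spec")]
```

Setting `metric = "ROC"` explicitly also avoids the first warning about "Accuracy" not being in the result set.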

r r-caret

4 votes · 2 answers · 3688 views

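How to tune multiple parameters using the Caret package?

I am building a CART model and trying to tune two rpart parameters, cp and maxdepth. The Caret package works fine with one parameter at a time, but it keeps throwing an error when both are used, and I can't figure out why.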

library(caret)
data(iris)
tc <- trainControl("cv",10)
rpart.grid <- expand.grid(cp=seq(0,0.1,0.01), minsplit=c(10,20)) 
train(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length, data=iris, method="rpart", 
      trControl=tc,  tuneGrid=rpart.grid)

I get the following error:

Error in train.default(x, y, weights = w, ...) : 
  The tuning parameter grid should have columns cp
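The error message reflects that caret's `"rpart"` method exposes only `cp` as a tuning parameter, so any extra grid column is rejected. A sketch of two workarounds, under the assumption that fixing `minsplit` (rather than tuning it) is acceptable: pass it to `rpart()` through `train()`'s `...` via `rpart.control()`, or switch to `method = "rpart2"`, which tunes `maxdepth` instead of `cp`.

```r
library(caret)
library(rpart)
data(iris)

tc <- trainControl("cv", 10)

# method = "rpart" tunes cp only, so the grid has a single cp column;
# minsplit is held fixed and handed to rpart() through train()'s `...`.
set.seed(1)
fit_cp <- train(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length,
                data = iris, method = "rpart", trControl = tc,
                tuneGrid = expand.grid(cp = seq(0, 0.1, 0.01)),
                control = rpart.control(minsplit = 10))

# method = "rpart2" tunes maxdepth instead of cp.
set.seed(1)
fit_depth <- train(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length,
                   data = iris, method = "rpart2", trControl = tc,
                   tuneGrid = expand.grid(maxdepth = 1:5))
```

Comparing several `minsplit` values would then mean repeating the first call once per value and comparing the resampled results by hand.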

performance r r-caret

4 votes · 2 answers · 6716 views

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs) and similar errors

I get an error when using glmnet in caret.

The example below loads the libraries

library(dplyr)
library(caret)
library(C50)

Load the churn dataset from the C50 library

data(churn)

Create the x and y variables

churn_x <- subset(churnTest, select= -churn)   
churn_y <- churnTest[[20]]

Use createFolds() to create 5 CV folds on churn_y (the target variable)

 myFolds <- createFolds(churn_y, k = 5)

Create the trainControl object: myControl

myControl <- trainControl(
 summaryFunction = twoClassSummary,
 classProbs = TRUE, # IMPORTANT!
 verboseIter = TRUE,
 savePredictions = TRUE,
 index = myFolds
)

Fit the glmnet model: model_glmnet

model_glmnet <- train(
  x = churn_x, y = churn_y,
  metric = "ROC",
  method = "glmnet",
  trControl = myControl
)

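I get the following error

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
  NAs introduced by coercion

I have checked, and there are no missing values in the churn_x variables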

sum(is.na(churn_x))

Does anyone know the answer?
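A possible cause, assuming churn_x contains factor columns: with the x/y interface, `train()` does not create dummy variables, so non-numeric columns reach glmnet and turn into NA during coercion even though the data frame itself has no missing values. The sketch below shows the effect and one remedy on a toy data frame; `toy_x` and its columns are hypothetical stand-ins, not the churn data.

```r
# Toy stand-in for churn_x: one numeric and one factor predictor (hypothetical data).
toy_x <- data.frame(
  minutes = c(10.5, 20.1, 30.7, 40.2),
  plan    = factor(c("yes", "no", "no", "yes"))
)

# No NAs in the data frame, yet the factor column is not numeric.
sum(is.na(toy_x))

# model.matrix() expands factors into 0/1 dummy columns; [, -1] drops the intercept.
toy_mat <- model.matrix(~ ., data = toy_x)[, -1]
is.numeric(toy_mat)
colnames(toy_mat)
```

Separately, `createFolds()` returns the held-out rows by default, so passing its result as `index` trains each model on one fifth of the data; `createFolds(churn_y, k = 5, returnTrain = TRUE)` returns the training rows instead.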

r glmnet r-caret

4 votes · 1 answer · 5890 views

Additional metrics in caret - PPV, sensitivity, specificity

I am using caret in R for logistic regression:

  ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10, 
                       savePredictions = TRUE)

  mod_fit <- train(Y ~ .,  data=df, method="glm", family="binomial",
                   trControl = ctrl)

  print(mod_fit)

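The default metrics printed are accuracy and Cohen's kappa. I would like to extract matching metrics such as sensitivity, specificity, and positive predictive value, but I cannot find an easy way to do it. The final model is provided, but it is trained on all the data (as far as I can tell from the documentation), so I cannot use it to predict anew.

The confusion matrix computes all the required values, but passing it as a summary function does not work: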

  ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10, 
                       savePredictions = TRUE, summaryFunction = confusionMatrix)

  mod_fit <- train(Y ~ .,  data=df, method="glm", family="binomial",
                   trControl = ctrl)

Error: `data` and `reference` should be factors with the same levels. 
13.
stop("`data` and `reference` should be factors with the same levels.", …
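Because `savePredictions = TRUE` is already set, one option is to call `confusionMatrix()` on the stored held-out predictions after training rather than inside the summary function. The sketch below assumes simulated data from `twoClassSim()` in place of the question's `df`, so the outcome column is `Class` rather than `Y`.

```r
library(caret)

set.seed(1)
df <- twoClassSim(200)  # simulated data; the outcome column is `Class`

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     savePredictions = TRUE)

mod_fit <- train(Class ~ ., data = df, method = "glm", family = "binomial",
                 trControl = ctrl)

# mod_fit$pred holds the held-out prediction (pred) and the truth (obs) per resample.
cm <- confusionMatrix(data = mod_fit$pred$pred, reference = mod_fit$pred$obs)
cm$byClass[c("Sensitivity", "Specificity", "Pos Pred Value")]
```

These figures pool the predictions from all folds and repeats. For per-resample values, `summaryFunction = twoClassSummary` together with `classProbs = TRUE` reports ROC, sensitivity, and specificity directly.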

r r-caret

4 votes · 1 answer · 592 views

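Naive Bayes in quanteda vs caret: wildly different results

I am trying to use the packages quanteda and caret together to classify text based on a trained sample. As a test run, I wanted to compare the built-in naive Bayes classifier of quanteda with the ones in caret. However, I can't seem to get caret to work properly.

Here is some code for reproduction. First, the quanteda side: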

library(quanteda)
library(quanteda.corpora)
library(caret)
corp <- data_corpus_movies
set.seed(300)
id_train <- sample(docnames(corp), size = 1500, replace = FALSE)

# get training set
training_dfm <- corpus_subset(corp, docnames(corp) %in% id_train) %>%
  dfm(stem = TRUE)

# get test set (documents not in id_train, make features equal)
test_dfm <- corpus_subset(corp, !docnames(corp) %in% id_train) %>%
  dfm(stem = TRUE) %>% 
  dfm_select(pattern = training_dfm, 
             selection = "keep")

# train model on sentiment
nb_quanteda <- textmodel_nb(training_dfm, …

r supervised-learning text-classification r-caret quanteda

4 votes · 1 answer · 289 views