标签: r-caret

在不调整 R 的情况下在 Caret 中训练模型

看起来，当caret你训练模型时，你几乎被迫进行参数调整。我知道这通常是一个好主意，但是如果我想在训练时明确说明模型参数怎么办？

svm.nf <- train(y ~ .,
                data = nf,
                method = "svmRadial",
                C = 4, sigma = 0.25, tuneLength = 0)

Run Code Online (Sandbox Code Playgroud)

出了问题；所有 RMSE 指标值均缺失：

    RMSE        Rsquared  
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :2     NA's   :2

Run Code Online (Sandbox Code Playgroud)

train.default（x，y，weights = w，...）中的错误：停止另外：警告消息：在nominalTrainWorkflow（x = x，y = y，wts =weights，info = trainInfo，：存在缺失值在重新抽样的绩效指标中。

r data-mining r-caret

Nit*_*tro

2016 09-24

4
推荐指数

1
解决办法

678
查看次数

通过 caret 包在 KNN 中使用 F1 分数度量

我尝试使用 F1 分数来确定哪个 k 值可以最大化模型以达到其给定的目的。模型是通过包train中的函数制作的caret。

示例数据集： https: //www.kaggle.com/lachster/churndata

我当前的代码包括以下内容（作为 f1 分数的函数）：

f1 <- function(data, lev = NULL, model = NULL) {
    precision <- posPredValue(data$pred, data$obs, positive = "pass")
    recall <- sensitivity(data$pred, data$obs, positive = "pass")
    f1_val <- (2*precision*recall) / (precision + recall)
    names(f1_val) <- c("F1")
    f1_val
}

Run Code Online (Sandbox Code Playgroud)

以下为列车控制：

train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3, 
summaryFunction = f1, search = "grid")

Run Code Online (Sandbox Code Playgroud)

以下是我最终执行的命令train：

x <- train(CHURN ~. , 
  data = experiment, …

Run Code Online (Sandbox Code Playgroud)

r machine-learning r-caret

Nal*_*cal

2022 07-26

4
推荐指数

1
解决办法

2428
查看次数

如何使用 R 中的包“caret”将因子包含在回归模型中？

我正在尝试使用 R 包构建不同的回归模型caret。对于数据来说，它既包括数值，也包括因子。

问题 1：在回归模型中同时包含数值和因子的正确方法是什么caret？

问题2：回归模型通常需要进行数据预处理（中心和尺度），那么因子的预处理如何进行？

library(caret)
data("mtcars")
mydata = mtcars[, -c(8,9)]
set.seed(100)
mydata$dir = sample(x=c("N", "E", "S", "W"), size = 32, replace = T)
mydata$dir = as.factor(mydata$dir)
class(mydata$dir)    # Factor with four levels

MyControl = trainControl(
  method = "repeatedcv", 
  number = 5, 
  repeats = 2, 
  verboseIter = TRUE, 
  savePredictions = "final"
)

model_glm <- train(
  hp ~ ., 
  data = mydata, 
  method = "glm", 
  metric = "RMSE", 
  preProcess = c('center', 'scale'), 
  trControl …

Run Code Online (Sandbox Code Playgroud)

regression r machine-learning r-caret dummy-variable

Yan*_*ang

2022 10-21

4
推荐指数

1
解决办法

456
查看次数

使用caret包运行带有controls = cforest_unbiased()的cforest

我想使用插入包运行一个没有偏见的cforest.这可能吗？

tc <- trainControl(method="cv",
               number=f,
               index=indexList,
               savePredictions=T,
               classProbs = TRUE,
               summaryFunction = twoClassSummary)
createCfGrid <- function(len, data) {
    g = createGrid("cforest", len, data)
    g = expand.grid(.controls = cforest_unbiased(mtry = 5, ntree = 1000))
    return(g)
}
set.seed(1)
(cfMatFit <- train(as.factor(f1win) ~ .,
                   data=df,
                   method="cforest",
                   metric="ROC",
                   trControl=tc,
                   tuneGrid = createCfGrid))

Run Code Online (Sandbox Code Playgroud)

错误是 Error in as.character.default(<S4 object of class "ForestControl">) : no method for coercing this S4 class to a vector

这是因为cforest_control()无法强制转换为数据框.如果我使用,该功能可以工作:

...
g = expand.grid(.mtry = 5)
...

Run Code Online (Sandbox Code Playgroud)

但是,如果我想更改ntree,这没有任何效果:

...
g = expand.grid(.mtry …

Run Code Online (Sandbox Code Playgroud)

r random-forest r-caret

Jam*_*ers

2015 01-11

3
推荐指数

1
解决办法

3550
查看次数

train.default(x,y,weights = w,...)出错:无法确定最终调整参数

我是机器学习的新手,正在尝试Kaggle的森林覆盖预测比赛,但我很早就被挂了.运行下面的代码时出现以下错误.

Error in train.default(x, y, weights = w, ...) : 
final tuning parameters could not be determined
In addition: There were 50 or more warnings (use warnings() to see the first 50)

# Load the libraries
library(ggplot2); library(caret); library(AppliedPredictiveModeling)
library(pROC)
library(Amelia)

set.seed(1234)

# Load the forest cover dataset from the csv file
rawdata <- read.csv("train.csv",stringsAsFactors = F)
#this data won't be used in model evaluation. It will only be used for the submission.
test <- read.csv("test.csv",stringsAsFactors …

Run Code Online (Sandbox Code Playgroud)

r random-forest r-caret kaggle

Dav*_*son

2017 03-03

3
推荐指数

1
解决办法

9099
查看次数

R：脱字符号包preProcess（）

我对数据建模和R相当陌生，我想知道是否有人可以给我一些建议。

我正在使用R复制在SPSS Modeller中构建的模型，然后尝试对其进行改进。目前，我正在使用插入符号包构建基本的线性模型。

我使用preProcess（）缩放和居中数字字段，包括模型正在预测的数字变量。

preProcValues <- preProcess(Data_Numeric, method = c("center", "scale"))
Data_PreProc <- predict(preProcValues, Data_Numeric)

Run Code Online (Sandbox Code Playgroud)

但是，当我生成模型时，我发现这种预处理可以生成更准确的模型，但是我不确定如何获取缩放结果和居中结果并获得“结果”。该模型用作定价工具，因此如果需要，我需要对它进行缩放和居中？

r r-caret

Mat*_*son

lucky-day

3
推荐指数

1
解决办法

3293
查看次数

lm使用插入火车的结果

我使用插入符R包作为一个非常方便的建模包装.虽然这是一个奇怪的用途,但在使用模型类型="lm"和交叉验证"无"时,我在从模型中提取结果时遇到一些麻烦.见下面的例子:

library(caret)
## Make data
> Xs       <- matrix(rnorm(300*20), nrow = 300, ncol = 20)
> Yvec     <- rnorm(300)
## Make traincontrol, cv of "none"
> tcontrol <- trainControl(method = "none")
## Fit lm model using train
> fit      <- train(x= Xs, y = Yvec, method = "lm", metric = "RMSE", trControl = tcontrol)
> fit$results
[1] RMSE      Rsquared  parameter
<0 rows> (or 0-length row.names)

Run Code Online (Sandbox Code Playgroud)

任何想法为什么适合$结果是空的？对于所有其他型号和CV类型,这似乎有效.例如使用2倍CV:

> tcontrol2 <- trainControl(method = "cv", number = 2)
> fit2      <- train(x= Xs, …

Run Code Online (Sandbox Code Playgroud)

r lm r-caret

Ach*_*oud

2015 10-24

3
推荐指数

1
解决办法

5077
查看次数

R中将正类设为1

我目前正在使用'randomForest'软件包开发预测模型.

适合我的模型如下

rf <- foreach(ntree=rep(10, 3), .combine= combine,  .packages='randomForest') %dopar% {
randomForest(bou~.,data=train, trees=50, importance=TRUE)}

Run Code Online (Sandbox Code Playgroud)

当使用'caret'包中的'confusionMatrix'时,我得到以下结果:

我想知道是否可以在模型中将正类设置为1.我在包描述中搜索但是找不到任何关于它的信息.

非常感谢你.

编辑:我找到了.它是'caret'包中'confusionMatrix'命令的一个选项.我在错误的地方跋涉.如果需要,这是一个例子.

confusionMatrix(predicted,true_values,positive='1')

Run Code Online (Sandbox Code Playgroud)

我应该留下我的帖子还是删除它？

r confusion-matrix random-forest r-caret

Lau*_*gon

2016 07-08

3
推荐指数

1
解决办法

2242
查看次数

R中的findCorrelation函数

我对findCorrelation()R的CARET软件包的功能有一些疑问。

当我使用此代码时：

  correlations <- cor(list)
  highCorr <- findCorrelation(correlations, cutoff = .6, names = FALSE)
  new_list <- list[, -highCorr]

Run Code Online (Sandbox Code Playgroud)

它会删除0.6以上和-0.6以下的所有功能吗？
可以说我有两个相关的特征，即男性和女性（由于缺少值而并非全部相同），如果函数相互关联，该函数如何选择删除哪一个？

r correlation r-caret

作者

2018 11-29

3
推荐指数

1
解决办法

4079
查看次数

插入包未能安装

我试图安装插入包时遇到此错误:

ERROR: compilation failed for package ‘ddalpha’
* removing ‘/home/rspark/R/x86_64-redhat-linux-gnu-library/3.3/ddalpha’
Warning in install.packages :
  installation of package ‘ddalpha’ had non-zero exit status
ERROR: dependency ‘ddalpha’ is not available for package ‘recipes’
* removing ‘/home/rspark/R/x86_64-redhat-linux-gnu-library/3.3/recipes’
Warning in install.packages :
  installation of package ‘recipes’ had non-zero exit status
ERROR: dependency ‘recipes’ is not available for package ‘caret’
* removing ‘/home/rspark/R/x86_64-redhat-linux-gnu-library/3.3/caret’
Warning in install.packages :
  installation of package ‘caret’ had non-zero exit status

Run Code Online (Sandbox Code Playgroud)

有任何想法吗？

install.packages("ddalpha")

Run Code Online (Sandbox Code Playgroud)

它给出了同样的错误:

/usr/lib64/R/library/BH/include/boost/exception/exception.hpp:137: error: expected declaration before end of …

Run Code Online (Sandbox Code Playgroud)

r r-caret

Mig*_*tos

2018 03-31

3
推荐指数

1
解决办法

7656
查看次数

标签统计

r ×10

r-caret ×10

random-forest ×3

machine-learning ×2

confusion-matrix ×1

correlation ×1

data-mining ×1

dummy-variable ×1

kaggle ×1

lm ×1

regression ×1

标签 统计

标签统计