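Tag: r-caret

R caret: how do we pass the k parameter for kNN?

I am using caret for kNN. I initially ran the procedure with tuneLength = 10 and found that k = 21 was used for the model.

I would like to run it with a specific set of k values, but I get an error when passing the values in tuneGrid or when passing the k values directly to the train function.

Data: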

library(mlbench)
data(PimaIndiansDiabetes)


Code:

grid = expand.grid(k = c(5,7,9,15,19,21))

compute_learncurve5 <- function(df=adultFile,control=control,ratio=30,fold=10,N=3,metric="Accuracy",
                                seed=1234,scaled=FALSE,DEBUG=FALSE) {
  result_df = c()
  size <- round(size=(ratio/100 * nrow(df)))
  split <-  gsub(" ","",paste(as.character(100-ratio),"/",as.character(ratio)))
  iter <-  N
  trainSize <-  nrow(df)-size
  testSize <-  size

  if (DEBUG){
    print(paste("Dimension of InputDataSet : ", dim(df)))
    print(paste("Test/Train Perct : ",ratio,"|",100-ratio,
                " : Train/Test size = ", trainSize,"|",testSize))
  }

  #Set-up data
  trainpct  <- (100-ratio)/100

  # Set-up Train and Test - Change target variable …


r knn r-caret

E B*_*E B

lucky-day

2
votes

1
answers

4342
views

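Error: argument "x" is missing, with no default?

Since I am very new to XGBoost, I am trying to tune its parameters using the mlr library. After setting up the learner with setHyperPars(), calling train() throws an error (specifically when I run the xgmodel line): Error in colnames(x) : argument "x" is missing, with no default. I cannot work out what this error means. The code is below: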

library(mlr)     
library(dplyr)
library(caret) 
library(xgboost)

set.seed(12345)
n=dim(mydata)[1]
id=sample(1:n, floor(n*0.6)) 
train=mydata[id,]
test=mydata[-id,]

traintask = makeClassifTask (data = train,target = "label")
testtask = makeClassifTask (data = test,target = "label")

#create learner
lrn = makeLearner("classif.xgboost",
                   predict.type = "response")

lrn$par.vals = list( objective="multi:softprob",
                      eval_metric="merror")

#set parameter space
params = makeParamSet( makeIntegerParam("max_depth",lower = 3L,upper = 10L),
                       makeIntegerParam("nrounds",lower = 20L,upper = 100L),
                       makeNumericParam("eta",lower = 0.1, upper = 0.3),
                       makeNumericParam("min_child_weight",lower = 1L,upper = 10L), …


r machine-learning r-caret mlr xgboost

Jud*_*e83

2020 03-24

2
votes

1
answers

7463
views

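How to get the percentage of variance explained by the principal components in R? Comparing prcomp() and preProcess()

I know that PCA can be performed with the prcomp() function in base R, or with the preProcess() function from the caret package, among others.

First, am I right in saying that if we just use the defaults, i.e. prcomp(<SOME_MATRIX>) or preProcess(<SOME_MATRIX>, method = "pca"), then the only difference in our results is that prcomp() does not center and scale the data before the PCA, whereas preProcess() does? So would prcomp(scale(<SOME_MATRIX>)) and preProcess(<SOME_MATRIX>, method = "pca") output the same thing?

Second, and more importantly, how do we get the percentage of variance explained by each PC from the output of prcomp() or preProcess()? In both outputs I can see information such as means, standard deviations, or the rotation, but I think these refer only to the "old" variables. Where is the information about the "new" PCs, and how much variance does each of them explain?

This would be useful if, for example, I am using preProcess(<SOME_MATRIX>, method = "pca", thresh = 0.8) and 6 PCs are returned, but I find that the first 5 PCs together explain 79.5% of the variance. I might then be inclined not to include all 6 PCs.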
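For the prcomp() half of the question, a minimal sketch in base R: the `sdev` component holds the standard deviations of the principal components, so squaring and normalising them gives the share of variance per PC. The built-in `USArrests` data and the variable names are stand-ins chosen for illustration, not taken from the question. Note that prcomp() centers by default (`center = TRUE`) but does not scale (`scale. = FALSE`), so scaling is requested explicitly here.

```r
# Stand-in data: the built-in USArrests data set (4 numeric columns).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The variance of each PC is the square of its standard deviation.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of PCs whose cumulative share reaches an 80% threshold.
n_keep <- which(cum_explained >= 0.8)[1]

print(round(100 * var_explained, 1))
print(n_keep)
```

`summary(pca)` reports the same proportions in its "Proportion of Variance" and "Cumulative Proportion" rows.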

r pca r-caret

kma*_*nka

2020 06-17

2
votes

1
answers

6057
views

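Why do R gbm model predictions not match the model fit?

I am fitting a gbm model using caret. When I call trainedGBM$finalModel$fit, I get output that looks correct.

But when I call predict(trainedGBM$finalModel, origData, type="response"), I get very different results, and predict(trainedGBM$finalModel, type="response") gives yet different results, even with origData attached. The way I see it, these calls should all produce the same output. Can someone help me identify the problem?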

library(caret)
library(gbm)

attach(origData)
gbmGrid <- expand.grid(.n.trees = c(2000), 
                       .interaction.depth = c(14:20), 
                       .shrinkage = c(0.005))
trainedGBM <- train(y ~ ., method = "gbm", distribution = "gaussian", 
                    data = origData, tuneGrid = gbmGrid, 
                    trControl = trainControl(method = "repeatedcv", number = 10, 
                                             repeats = 3, verboseIter = FALSE, 
                                             returnResamp = "all"))
ntrees <- gbm.perf(trainedGBM$finalModel, method="OOB")
data.frame(y, 
           finalModelFit = trainedGBM$finalModel$fit, 
           predictDataSpec = predict(trainedGBM$finalModel, origData, type="response", n.trees=ntrees), 
           predictNoDataSpec = predict(trainedGBM$finalModel, type="response", n.trees=ntrees))

The code above produces the following partial results:

   y …


r machine-learning predict r-caret

author

2015 01-11

1
votes

1
answers

5974
views

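Error: attempt to apply non-function when using the caret package

I am trying to learn more about the caret package and have hit a roadblock that I am not sure how to resolve.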

#loading up libraries
library(MASS)
library(caret)
library(randomForest)
data(survey)
data<-survey

#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train<-data[split,]
test<-data[-split,]


#creating training parameters
control <- trainControl(method = "cv",
                        number = 10, 
                        p =.8, 
                        savePredictions = TRUE, 
                        classProbs = TRUE, 
                        summaryFunction = "twoClassSummary")

#fitting and tuning model
tuningGrid <- data.frame(.mtry = floor(seq(1 , ncol(train) , length = 6)))
rf_tune <- train(W.Hnd ~ . , 
            data=train, 
            method = "rf" ,
            metric = "ROC",
            trControl = control)

I keep getting the error:

Error in evalSummaryFunction(y, …


r machine-learning r-caret

Min*_*Mai

2015 08-17

1
votes

1
answers

1061
views

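R caret time slices: window and horizon unclear

Using the time slicing in caret and its parameters, how do I split data with xyz rows into slices that are each 12 rows long?

Ideally, it should also respect a 60-20-20 train-test-validation ratio.

Should I set it up like this:

initialWindow = 12, horizon = 12, fixedWindow = TRUE?

I have read the documentation, but it is still unclear to me.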
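A rough sketch of how the three parameters interact, written in base R so that it runs without caret. It mirrors my understanding of caret's `createTimeSlices()` with its default `skip = 0`; the helper `make_slices` and the series length of 36 are hypothetical. Each slice trains on `initialWindow` consecutive rows and tests on the next `horizon` rows, and with `fixedWindow = TRUE` the training window slides forward instead of growing.

```r
# Hypothetical helper mimicking rolling-origin slicing over n rows.
make_slices <- function(n, initial_window, horizon, fixed_window = TRUE) {
  # Last training index of each slice.
  stops <- seq(initial_window, n - horizon)
  # A fixed window slides; otherwise training always starts at row 1.
  starts <- if (fixed_window) stops - initial_window + 1 else rep(1, length(stops))
  list(
    train = Map(function(s, e) seq(s, e), starts, stops),
    test  = Map(function(e) seq(e + 1, e + horizon), stops)
  )
}

slices <- make_slices(n = 36, initial_window = 12, horizon = 12)
print(length(slices$train))
print(slices$train[[1]])
print(slices$test[[1]])
```

Under these assumptions, consecutive slices overlap heavily, because the origin advances by one row at a time; this sketch does not address the 60-20-20 split.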

r time-series r-caret

sve*_*ven

2015 12-08

1
votes

1
answers

1392
views

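Why is "xgbTree" so slow in caret when using trainControl?

I am trying to fit an xgboost model on a multiclass prediction problem and want to use caret for the hyperparameter search. To test the package, I used the following code, which takes 20 seconds when I do not supply train with a trainControl object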

# just use one parameter combination
xgb_grid_1 <- expand.grid(
  nrounds = 1,
  eta = 0.3,
  max_depth = 5,
  gamma = 0,
  colsample_bytree=1, 
  min_child_weight=1
)
# train
xgb_train_1 = train(
  x = as.matrix(sparse_train),
  y = conversion_tbl$y_train_c ,
  trControl = trainControl(method="none", classProbs = TRUE, summaryFunction = multiClassSummary),
  metric="logLoss",
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)

However, when I do supply train with a trainControl object, the code never finishes, or takes a very long time (it had not finished after at least 15 minutes).

xgb_trcontrol_1 <- trainControl(
  method = "cv",
  number = 2,
  verboseIter = TRUE, 
  returnData = FALSE,
  returnResamp = "none", …


r machine-learning r-caret

Alb*_*lby

2016 03-21

1
votes

1
answers

3684
views

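ROCR error: Format of predictions is invalid

After getting my predictions from glmnet, I am trying to use the prediction function in the ROCR package to get tpr, fpr, etc., but I get this error: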

pred <- prediction(pred_glmnet_s5_3class, y)
Error in prediction(pred_glmnet_s5_3class, y) : 
Format of predictions is invalid.

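I printed both the glmnet predictions and the labels, and they appear to be in similar formats, so I do not understand what is invalid here.

The code is below, and the input can be found here. It is a small dataset and should not take long to run.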

library("ROCR")
library("caret")
sensor6data_s5_3class <- read.csv("/home/sensei/clustering /sensor6data_f21_s5_with3Labels.csv")
sensor6data_s5_3class <- within(sensor6data_s5_3class, Class <- as.factor(Class))
sensor6data_s5_3class$Class2 <- relevel(sensor6data_s5_3class$Class,ref="1")

set.seed("4321")
inTrain_s5_3class <- createDataPartition(y = sensor6data_s5_3class$Class, p = .75, list = FALSE)
training_s5_3class <- sensor6data_s5_3class[inTrain_s5_3class,]
testing_s5_3class <- sensor6data_s5_3class[-inTrain_s5_3class,] 
y <- testing_s5_3class[,22]

ctrl_s5_3class <- trainControl(method = "repeatedcv", number = 10, repeats = 10 , savePredictions = TRUE)
model_train_glmnet_s5_3class <- train(Class2 ~ ZCR + Energy + SpectralC + …


r r-caret proc-r-package

tac*_*qy2

lucky-day

1
votes

1
answers

10k
views

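Is it possible to retrieve the false positives and false negatives from a confusion matrix in R?

I generated a confusion matrix in R, shown below.

Is it possible to retrieve the false negative value 61 from this matrix and assign it to a variable in R? $byClass does not seem to work for this. Thanks.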

Confusion Matrix and Statistics

              Reference
    Prediction   no  yes
           no  9889   61
           yes    6   44

               Accuracy : 0.9933          
                 95% CI : (0.9915, 0.9948)
    No Information Rate : 0.9895          
    P-Value [Acc > NIR] : 4.444e-05       

                  Kappa : 0.5648          
 Mcnemar's Test P-Value : 4.191e-11       

            Sensitivity : 0.9994          
            Specificity : 0.4190          
         Pos Pred Value : 0.9939          
         Neg Pred Value : 0.8800          
             Prevalence : 0.9895          
         Detection Rate : 0.9889          
   Detection Prevalence : 0.9950          
      Balanced Accuracy : 0.7092          

       'Positive' Class : no
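
A minimal sketch of pulling single cells out of such a table. As far as I know, a caret `confusionMatrix` object stores the counts in its `$table` element, which can be indexed by row (prediction) and column (reference) name. Here the table is rebuilt by hand from the printed output so the snippet runs without caret, and `cm_table` is a hypothetical name. Because the positive class is `no`, the 61 cases predicted `no` with reference `yes` would count as false positives under that convention.

```r
# Rebuild the printed table by hand (rows = Prediction, columns = Reference).
cm_table <- as.table(matrix(
  c(9889, 6, 61, 44),
  nrow = 2,
  dimnames = list(Prediction = c("no", "yes"), Reference = c("no", "yes"))
))

# Index by name: predicted "no" while the reference is "yes".
pred_no_ref_yes <- cm_table["no", "yes"]
# Predicted "yes" while the reference is "no".
pred_yes_ref_no <- cm_table["yes", "no"]

print(pred_no_ref_yes)
print(pred_yes_ref_no)
```

With a real caret object `cm`, the equivalent lookup would be `cm$table["no", "yes"]`.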


r confusion-matrix r-caret

author

2017 03-05

1
votes

1
answers

1787
views

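Optimising for sensitivity in caret still seems to optimise for ROC

I am trying to maximise sensitivity during model selection in caret using rpart. To do this, I tried to replicate the approach given here (scroll down to the example with the user-defined function FourStat) on caret's GitHub page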

# create own function so we can use "sensitivity" as our metric to maximise:
Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
    out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
    c(out, Sensitivity = out["Sens"])
}

rpart_caret_fit <- train(outcome~pred1+pred2+pred3+pred4,
    na.action = na.pass,
    method = "rpart", 
    control=rpart.control(maxdepth = 6),
    tuneLength = 20, 
    # maximise sensitivity
    metric = "Sensitivity", 
    maximize = TRUE,
    trControl = trainControl(classProbs = TRUE,
    summaryFunction = Sensitivity.fc))

But when I get the summary of rpart_caret_fit

it shows that it still used the ROC criterion to select the final model: