Creating folds for k-fold CV in R using caret

Question

gco*_*cci 10 r cross-validation r-caret

I am trying to set up k-fold CV for several classification methods/hyperparameters, using the data available at

http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data.

The set consists of 208 rows, each with 60 attributes. I am reading it into a data.frame using the read.table function.

The next step is to split my data into k folds, say k = 5. My first attempt was to use

test <- createFolds(t, k = 5)

I have two issues with this. The first is that the lengths of the folds are not close to one another:

      Length Class  Mode
Fold1 29     -none- numeric
Fold2 14     -none- numeric
Fold3  7     -none- numeric
Fold4  5     -none- numeric
Fold5  5     -none- numeric

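The other is that this is apparently splitting my data according to the attribute indexes, but I want to split the data itself. I thought I could fix that by transposing my data.frame, using: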

test <- t(myDataNumericValues)

But when I call the createFolds function, it gives me something like this:

      Length Class  Mode
Fold1 2496   -none- numeric
Fold2 2496   -none- numeric
Fold3 2495   -none- numeric
Fold4 2496   -none- numeric
Fold5 2497   -none- numeric

The length issue is solved, but it is still not splitting my 208 rows of data accordingly.

Any ideas about what I can do? Do you think the caret package is not the most appropriate one?

Thanks in advance

Answer 1

top*_*epo 27

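Please read ?createFolds to understand what the function does. It creates the indices that define which data are held out in the separate folds (see the option to return the converse):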

  > library(caret)
  > library(mlbench)
  > data(Sonar)
  > 
  > folds <- createFolds(Sonar$Class)
  > str(folds)
  List of 10
   $ Fold01: int [1:21] 25 39 58 63 69 73 80 85 90 95 ...
   $ Fold02: int [1:21] 19 21 42 48 52 66 72 81 88 89 ...
   $ Fold03: int [1:21] 4 5 17 34 35 47 54 68 86 100 ...
   $ Fold04: int [1:21] 2 6 22 29 32 40 60 65 67 92 ...
   $ Fold05: int [1:20] 3 14 36 41 45 75 78 84 94 104 ...
   $ Fold06: int [1:21] 10 11 24 33 43 46 50 55 56 97 ...
   $ Fold07: int [1:21] 1 7 8 20 23 28 31 44 71 76 ...
   $ Fold08: int [1:20] 16 18 26 27 38 57 77 79 91 99 ...
   $ Fold09: int [1:21] 13 15 30 37 49 53 74 83 93 96 ...
   $ Fold10: int [1:21] 9 12 51 59 61 62 64 70 82 87 ...
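
For the five folds asked about in the question, a minimal sketch might look like the following. It assumes the caret and mlbench packages are installed; the seed and the names test_idx and train_idx are illustrative and not from the original post. Note that createFolds is given the outcome vector (Sonar$Class), so the indices it returns are row numbers of the data set.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed, only to make the folds reproducible

# Held-out row indices for each of 5 folds (the default behaviour)
test_idx <- createFolds(Sonar$Class, k = 5)

# The converse: for each fold, the rows that are NOT held out.
# createFolds(..., returnTrain = TRUE) returns training rows directly;
# deriving them with setdiff keeps them tied to the same partition.
all_rows  <- seq_len(nrow(Sonar))
train_idx <- lapply(test_idx, function(ind) setdiff(all_rows, ind))

# Fold sizes: the held-out folds together cover all 208 rows
sapply(test_idx, length)
sapply(train_idx, length)
```

Every row appears in exactly one held-out fold, so the held-out sizes sum to nrow(Sonar).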

To use them to split the data:

   > split_up <- lapply(folds, function(ind, dat) dat[ind,], dat = Sonar)
   > dim(Sonar)
   [1] 208  61
   > unlist(lapply(split_up, nrow))
   Fold01 Fold02 Fold03 Fold04 Fold05 Fold06 Fold07 Fold08 Fold09 Fold10 
       21     21     21     21     20     21     21     20     21     21
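
For cross-validation done by hand, each fold serves once as the held-out set while the remaining rows form the training set. The sketch below shows one way to build both parts per fold. The helper name split_fold and the seed are hypothetical and not part of caret or the original answer.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed for reproducibility
folds <- createFolds(Sonar$Class, k = 5)

# Hypothetical helper: split a data frame into training and held-out parts
split_fold <- function(ind, dat) {
  list(train = dat[-ind, ], test = dat[ind, ])
}

splits <- lapply(folds, split_fold, dat = Sonar)

# Training and held-out sizes per fold; each column sums to nrow(Sonar)
sapply(splits, function(s) c(train = nrow(s$train), test = nrow(s$test)))

# Class counts in each held-out fold. For a factor outcome, createFolds
# samples within each class, so the M/R mix should be similar across folds.
sapply(splits, function(s) table(s$test$Class))
```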

The function train in this package is used to do the actual modeling (you usually don't need to do the splitting yourself; see this page).
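
As a rough illustration of letting train handle the resampling, the sketch below runs 5-fold cross-validation on the Sonar data. The choice of k-nearest neighbours (method = "knn") and the seed are assumptions made for the example, not something the answer prescribes.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed for reproducibility

# Ask train to do 5-fold cross-validation internally
ctrl <- trainControl(method = "cv", number = 5)

# k-nearest neighbours is an arbitrary classifier chosen for illustration
fit <- train(Class ~ ., data = Sonar, method = "knn", trControl = ctrl)

# Resampled performance for each candidate tuning value
fit$results
```

To reuse one fixed set of folds across several models, trainControl also accepts an index argument holding the training rows for each resample, which is what createFolds(Sonar$Class, k = 5, returnTrain = TRUE) returns.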

Max

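This response is useful, but what it says about ?createFolds is not accurate. Nowhere does ?createFolds say "it creates the indices that define which data are held out in the separate folds". (2 upvotes)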

Archived:	12 years ago
Views:	39128
Last activity:	9 years, 6 months ago