Error when tuning an SVM in R

Gio*_*ato 6 r machine-learning svm

I am tuning an SVM in R and I get the following error:

#Error in if (any(co)) { : missing value where TRUE/FALSE needed

I am using the caret package:

svmRTune <- train(x=dataTrain[,predModelContinuous],y=dataTrain[,outcome],method = "svmRadial", tuneLength = 14, trControl = trCtrl)

The structure of the training set is:

str(dataTrain)
'data.frame':   40001 obs. of  42 variables:
 $ PolNum     : num  2e+08 2e+08 2e+08 2e+08 2e+08 ...
 $ sex        : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 2 1 1 1 2 ...
 $ type       : Factor w/ 6 levels "A","B","C","D",..: 3 1 1 2 2 4 3 3 3 2 ...
 $ catgry     : Ord.factor w/ 3 levels "Large"<"Medium"<..: 2 2 2 3 3 3 3 2 2 2 ...
 $ occup      : Factor w/ 5 levels "Employed","Housewife",..: 2 1 1 1 5 4 1 1 4 2 ...
 $ age        : num  48 23 23 39 24 39 28 43 45 38 ...
 $ group      : Factor w/ 20 levels "1","2","3","4",..: 15 16 12 16 14 8 16 9 12 8 ...
 $ bonus      : Ord.factor w/ 21 levels "-50"<"-40"<"-30"<..: 14 8 4 3 5 2 5 5 1 15 ...
 $ poldur     : num  7 1 1 14 2 4 11 2 8 5 ...
 $ value      : num  1120 21755 18430 11930 24850 ...
 $ adind      : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 2 2 2 1 1 ...
 $ Pcode      : chr  "SC22" "CT109" "MA1" "SA12" ...
 $ Area       : Factor w/ 10 levels "CT","JU","MA",..: 7 1 3 6 6 6 6 4 1 2 ...
 $ Density    : num  270.5 57.3 43.2 167.9 169.8 ...
 $ Prem       : num  1159 532 527 197 908 ...
 $ Premad     : num  53.1 413.7 410.7 61.6 824.6 ...
 $ numclm     : num  0 1 0 1 0 0 0 1 0 0 ...
 $ Invite     : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Renewaltp  : num  1302 928 632 291 960 ...
 $ Renewalad  : num  58.4 599 440.4 71.3 682 ...
 $ Markettp   : num  1110 884 565 253 833 ...
 $ Marketad   : num  53.4 611.4 431.6 55.5 587 ...
 $ Premtot    : num  1212 532 527 259 908 ...
 $ Renewaltot : num  1361 928 632 362 960 ...
 $ Markettot  : num  1163 884 565 309 833 ...
 $ Renew      : Ord.factor w/ 2 levels "No"<"Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Premchng   : num  1.12 1.74 1.2 1.4 1.06 ...
 $ Compmeas   : num  1.17 1.05 1.12 1.17 1.15 ...
 $ numclmRec  : Ord.factor w/ 3 levels "None"<"One"<"Two or more": 1 2 1 2 1 1 1 2 1 1 ...
 $ PremChngRec: Factor w/ 20 levels "[0.546,0.758)",..: 16 20 18 19 14 3 7 19 17 11 ...
 $ ageRec     : Factor w/ 20 levels "[19,22)","[22,25)",..: 14 2 2 9 2 9 4 11 12 9 ...
 $ valueRec   : Factor w/ 20 levels "[ 1005, 3290)",..: 1 15 13 9 17 5 12 12 19 1 ...
 $ densityRec : Factor w/ 20 levels "[ 14.4, 25.0)",..: 19 6 5 15 15 13 15 1 5 11 ...
 $ CompmeasRec: Factor w/ 20 levels "[0.716,0.869)",..: 12 6 10 13 12 18 11 16 18 14 ...
 $ poldurRec  : Ord.factor w/ 16 levels "1"<"2"<"3"<"4"<..: 7 1 1 14 2 4 11 2 8 5 ...
 $ ageST      : num  0.407 -1.34 -1.34 -0.222 -1.27 ...
 $ numclmST   : num  -0.433 1.627 -0.433 1.627 -0.433 ...
 $ PremchngST : num  0.591 3.709 0.98 1.985 0.265 ...
 $ valueST    : num  -1.462 0.499 0.183 -0.434 0.793 ...
 $ DensityST  : num  1.918 -0.748 -0.924 0.636 0.659 ...
 $ CompmeasST : num  0.224 -0.539 -0.098 0.248 0.113 ...
 $ poldurST   : num  0.097 -1.2 -1.2 1.61 -0.984 ...

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:

[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252  

attached base packages:

 [1] parallel  splines   grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base  

other attached packages:

 [1] C50_0.1.0-16       kernlab_0.9-19     nnet_7.3-8         plyr_1.8.1        
 [5] gbm_2.1            randomForest_4.6-7 rpart_4.1-8        klaR_0.6-10       
 [9] MASS_7.3-31        doParallel_1.0.8   iterators_1.0.6    foreach_1.4.1     
[13] pROC_1.7.1         mda_0.4-4          class_7.3-10       earth_3.2-7       
[17] plotrix_3.5-5      plotmo_1.3-3       Formula_1.1-1      survival_2.37-7   
[21] caret_6.0-24       ggplot2_0.9.3.1    lattice_0.20-29    rj_1.1.3-1        

loaded via a namespace (and not attached):

 [1] car_2.0-19          cluster_1.15.2      codetools_0.2-8    
 [4] colorspace_1.2-4    combinat_0.0-8      compiler_3.0.2     
 [7] dichromat_2.0-0     digest_0.6.4        gtable_0.1.2       
[10] Hmisc_3.14-3        labeling_0.2        latticeExtra_0.6-26
[13] munsell_0.4.2       proto_0.3-10        RColorBrewer_1.0-5 
[16] Rcpp_0.11.1         reshape2_1.2.2      rj.gd_1.1.3-1      
[19] scales_0.2.3        stringr_0.6.2       tools_3.0.2    

小智 3

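Just posting in case anyone else runs into this problem. It appears to be caused by including a factor or character variable in the training dataset.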
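One quick way to check whether this applies is to list the predictor columns that are not numeric. The sketch below uses a small made-up data frame (`toy`, with hypothetical values) in place of `dataTrain[, predModelContinuous]`; whether such columns are really the cause in a given case is an assumption to verify.

```r
# Toy stand-in for the predictor columns passed to train() (made-up values).
toy <- data.frame(
  age   = c(48, 23, 39, 24),
  sex   = factor(c("Male", "Male", "Female", "Male")),
  Pcode = c("SC22", "CT109", "MA1", "SA12"),
  stringsAsFactors = FALSE
)

# TRUE for every column that is not numeric (factor, ordered factor, character, ...).
non_numeric <- !vapply(toy, is.numeric, logical(1))

# Names of the columns worth inspecting before fitting.
names(toy)[non_numeric]
```

On the real data, the same two lines can be run on `dataTrain[, predModelContinuous]` instead of `toy`.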

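I don't know why the SVM can't accept factor variables. I replaced my factors with hand-coded dummy variables and it worked fine, but that approach is too inelegant to document here.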
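For anyone who would rather not hand-code the dummy variables, here is a minimal sketch using base R's `model.matrix()`. The data frame `toy` and its values are made up for illustration, and this is not necessarily how the dummies above were built. caret also provides `dummyVars()` for the same purpose.

```r
# Toy data with one numeric column and two unordered factors (made-up values).
toy <- data.frame(
  age  = c(48, 23, 39, 24),
  sex  = factor(c("Male", "Male", "Female", "Male"), levels = c("Male", "Female")),
  type = factor(c("C", "A", "A", "B"), levels = c("A", "B", "C"))
)

# model.matrix() expands each factor into 0/1 indicator columns
# (treatment coding: the first level of each factor is the baseline).
# The first column is the intercept, which is dropped here.
# Note: ordered factors get polynomial contrasts by default, not 0/1 dummies.
x_num <- model.matrix(~ ., data = toy)[, -1]

colnames(x_num)
```

The resulting numeric matrix can then be passed as `x` to `train()` in place of the original data frame columns.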