Line search fails when training a model with caret

Question

The*_*oat 7 r machine-learning svm r-caret

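I am using the train function from caret to train an SVM with the svmRadial kernel for a binary classification task.

When I run the train function on my data, I keep getting messages like these as it progresses: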

line search fails -2.13865 -0.1759025 1.01927e-05 3.812143e-06 -5.240749e-08 -1.810113e-08 -6.03178e-13
line search fails -0.7148131 0.1612894 2.32937e-05 3.518543e-06 -1.821269e-08 -1.37704e-08 -4.726926e-13

After the code has finished running, I also get the following warnings:

    > warnings()
Warning messages:
1: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
2: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
3: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
4: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
5: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
6: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
7: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
8: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
9: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
10: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
11: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
12: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
13: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
14: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
15: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
16: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
17: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
18: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
19: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
20: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
21: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
22: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
23: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
24: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
25: In method$predict(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class prediction calculations failed; returning NAs
26: In method$prob(modelFit = modelFit, newdata = newdata,  ... :
  kernlab class probability calculations failed; returning NAs
27: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
28: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.


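As you can see from the warnings above, some of the class prediction and class probability calculations return NAs. Why do these calculations fail?

At @HFBrowning's request, here is a sample of the data I am using. I am trying to build a classifier that predicts whether a telecom cell is overshooting or not.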

> head(imbal_training,10)
   Total.Tx.Height Antenna.Tilt Antenna.Gain Ant.Vert.Beamwidth       RTWP Voice.Drops Range Max.Distance Rural Suburban Urban
2            31.25            0         15.9               10.0 -103.55396          12  5.14         6.24     1        0     0
5            31.25            0         18.2                4.4 -104.76192           1  3.88         4.98     1        0     0
7            25.14            4         15.9                9.6 -102.93839           1  6.58         9.17     1        0     0
9            25.14            2         18.8                4.3 -104.23198           4  5.08         7.67     1        0     0
11           10.66            4         16.2               10.0  -98.23691          17 23.33        24.69     0        1     0
12           10.66            6         16.2               10.0 -103.78522           5 18.24        19.60     0        1     0
13           10.66            5         16.2               10.0  -94.59940           5 20.20        21.56     0        1     0
14           10.66            3         18.7                4.4 -103.17622           3 23.86        25.22     0        1     0
15           10.66            5         18.7                4.4 -104.97827           0 23.86        25.22     0        1     0
16           10.66            4         18.8                4.4 -105.78948           1 23.86        25.22     0        1     0
              Class HSUPA.Throughput Max.HSDPA.Users HS.DSCH.throughput Max.HSUPA.Users Avg.CQI
2  Not.Overshooting           222.62              16            2345.54              25   17.99
5      Overshooting           263.83               8            3894.07              13   21.82
7      Overshooting           392.66              14            5134.80              15   23.00
9      Overshooting           478.58               8            7203.39               8   24.70
11     Overshooting           173.21              11            2429.06              15   23.51
12     Overshooting           210.61              16            2694.93              20   19.76
13     Overshooting           205.81              11            3278.06              13   22.10
14     Overshooting           394.10              10            3881.88              13   25.01
15     Overshooting           371.71              10            3765.10              13   23.33
16     Overshooting           321.32               6            4422.15               8   24.85


Here is the code for my trainControl:

#run the algorithms using 10 fold cross validation
set.seed(123)
train_Control <- trainControl(method = "repeatedCV", 
                              number = 10, 
                              repeats = 3,
                              savePredictions = T,
                              classProbs = T, #required for the ROC curve calcs
                              summaryFunction = twoClassSummary) #uses AUC to pick the best model


Here is my train call:

 #uses the rose_training dataset with a kernel model
set.seed(123)
fit.rose.Kernel <- train(Class ~ Total.Tx.Height +
                         Antenna.Tilt +
                         Antenna.Gain +
                         Ant.Vert.Beamwidth +
                         RTWP +
                         Voice.Drops +
                         Range +
                         Max.Distance +
                         Rural +
                         Suburban +
                         Urban +
                         HSUPA.Throughput +
                         Max.HSDPA.Users +
                         HS.DSCH.throughput + 
                         Max.HSUPA.Users +
                         Avg.CQI, 
                       data = rose_train,
                       method = 'svmRadial',
                       preProcess = c('center','scale'),
                       trControl=train_Control,
                       tuneLength=15,
                       metric = "ROC")


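To better understand which part of the code causes the problem, I cleared all existing warnings and ran each model one at a time to see where the warnings are raised.

Initially I had identified lines 444 to 469 as the problematic section, but today that section runs without any warnings. Now the next few lines produce the same warnings as the day before, even though nothing has changed except that I cleared the warnings.

In summary, I have two types of models that I am trying to compare: a linear SVM using svmLinear and a kernel model using svmRadial.

For both models I use several configurations of the training data, because my original dataset is heavily imbalanced toward "Overshooting" (~80/20). I use the original imbalanced data, then downsampling, upsampling, and synthetic data generated with SMOTE and ROSE, and I train both the linear and the kernel model on each type of training set.

Does anyone know what these "line search fails" messages and the warnings refer to?

To provide a reproducible example, here is a link to a copy of my code, and here is an exported version of the dataset I am using. The section of code that causes these messages and warnings starts at line 444.

I would be very grateful if anyone could offer some help.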

Answer 1

小智 0

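I don't have access to your data, but here are a few suggestions:

Check whether your data contains NAs. If it does, you can use na.omit() to remove the rows that contain them.
Use createDataPartition() to split the original imbalanced data into training and test sets; it samples within each class, so both sets keep roughly the original class proportions.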
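The two suggestions above can be sketched as follows. This is only an illustration: the `toy` data frame, its values, and the 80/20 split are assumptions made up for the example and are not the asker's data. Only `anyNA()`, `is.na()`, `na.omit()` (base R) and `createDataPartition()` (caret) are real functions.

```r
library(caret)  # provides createDataPartition()

# Hypothetical stand-in for the real data set (values are invented).
set.seed(123)
toy <- data.frame(
  Class   = factor(rep(c("Overshooting", "Not.Overshooting"), times = c(80, 20))),
  RTWP    = rnorm(100, mean = -103, sd = 3),
  Avg.CQI = rnorm(100, mean = 22, sd = 2)
)
toy$RTWP[c(5, 50)] <- NA  # inject two missing values for the demonstration

# Step 1: look for missing values, overall and per column.
anyNA(toy)
colSums(is.na(toy))

# Drop every row that contains at least one NA.
toy_clean <- na.omit(toy)

# Step 2: split on the outcome so both sets keep roughly the 80/20 class ratio.
idx       <- createDataPartition(toy_clean$Class, p = 0.8, list = FALSE)
train_set <- toy_clean[idx, ]
test_set  <- toy_clean[-idx, ]

# Compare class proportions in the two sets.
prop.table(table(train_set$Class))
prop.table(table(test_set$Class))
```

Running `colSums(is.na(...))` on the real training set before calling `train()` shows quickly whether missing predictors could be contributing to the NA warnings.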

Note: to avoid human error, your train call can be cleaned up like this:

fit.rose.Kernel <- train(Class ~ ., 
                       data = rose_train,
                       method = 'svmRadial',
                       preProcess = c('center','scale'),
                       trControl=train_Control,
                       tuneLength=15,
                       metric = "ROC")


This may also help resolve the problem.
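One caveat with the `Class ~ .` shorthand: the dot expands to every column in `data` other than the response, so it only matches the explicit formula if the data frame holds no extra columns. A minimal base R sketch of how to check the expansion, using a made-up data frame (`toy` and `Extra.Column` are hypothetical names for the example):

```r
# Hypothetical data frame with one column that is not meant to be a predictor.
toy <- data.frame(
  Class        = factor(c("Overshooting", "Not.Overshooting")),
  Antenna.Tilt = c(0, 4),
  RTWP         = c(-103.5, -98.2),
  Extra.Column = c(1, 2)
)

# terms() expands the dot against the columns of `data`.
predictors <- attr(terms(Class ~ ., data = toy), "term.labels")
print(predictors)

# The same set, computed directly from the column names.
expected <- setdiff(names(toy), "Class")
print(identical(predictors, expected))
```

If the real `rose_train` has columns that should not be used as predictors, they would need to be dropped from the data frame first, or the explicit formula kept.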
