How do I produce a confusion matrix from cross-validation?

Question

DN1*_*DN1 1 r machine-learning lda cross-validation

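I am new to R and machine learning, and I am working with data that has 2 classes. I am trying to do cross-validation, but when I try to build a confusion matrix for the model I get an error saying that all arguments must have the same length. I cannot see why the inputs I am passing differ in length. Any help in the right direction would be appreciated.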

library(MASS)
xCV = x[sample(nrow(x)),]

folds <- cut(seq(1,nrow(xCV)),breaks=10,labels=FALSE)

for(i in 1:10){

  testIndexes = which(folds==i,arr.ind=TRUE)
  testData = xCV[testIndexes, ]
  trainData = xCV[-testIndexes, ]

}
ldamodel = lda(class ~ ., trainData)
lda.predCV = predict(model)

conf.LDA.CV=table(trainData$class, lda.predCV$class)
print(conf.LDA.CV)

Answer 1

mis*_*use 5

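The problem with your code is that the model fitting and prediction are not done inside the loop. The loop only leaves you with the testIndexes for i == 10, because each iteration overwrites the results of the previous one.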
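A minimal sketch of the asker's loop with the fit and prediction moved inside it. The asker's data frame `x` is not shown, so a two-species subset of iris with the label column renamed to `class` stands in for it (an assumption for illustration only). After the original loop, trainData holds only the rows outside fold 10, so a prediction vector computed on any other set of rows cannot match trainData$class in length; tabulating testData against its own predictions avoids that.

```r
library(MASS)

# Stand-in for the asker's `x` (assumption): two iris species, label column `class`.
x <- droplevels(iris[iris$Species != "setosa", ])
names(x)[names(x) == "Species"] <- "class"

set.seed(1)
xCV <- x[sample(nrow(x)), ]                                    # shuffle the rows
folds <- cut(seq_len(nrow(xCV)), breaks = 10, labels = FALSE)  # 10 contiguous blocks

# Start from an all-zero table that already has both class levels on each margin.
conf.LDA.CV <- table(real = xCV$class[0], predicted = xCV$class[0])

for (i in 1:10) {
  testIndexes <- which(folds == i)
  testData  <- xCV[testIndexes, ]
  trainData <- xCV[-testIndexes, ]

  # Fit on the nine training folds, predict the held-out fold.
  ldamodel <- lda(class ~ ., trainData)
  pred <- predict(ldamodel, testData)$class

  # Both vectors come from testData, so their lengths always agree.
  conf.LDA.CV <- conf.LDA.CV + table(real = testData$class, predicted = pred)
}

print(conf.LDA.CV)
```

The per-fold tables are summed, so the final table counts every row of `x` exactly once.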

The following code does this on the iris data:

library(MASS)
data(iris)

Generate the folds:

set.seed(1)
folds <- sample(1:10, size = nrow(iris), replace = TRUE) # 10-fold CV, fold sizes vary
table(folds)
#output
folds
 1  2  3  4  5  6  7  8  9 10 
10 12 17 16 21 13 17 20 12 12

Or, if you want folds of equal size:

set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(iris)), size = nrow(iris), replace = FALSE)
table(folds)
#output
folds
 1  2  3  4  5  6  7  8  9 10 
15 15 15 15 15 15 15 15 15 15
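For comparison, the asker's way of building folds (shuffle the rows, then cut the row sequence into 10 contiguous blocks) also yields equal-size folds. The sketch below puts the two constructions side by side on iris; the names `folds_answer`, `folds_asker`, and `irisCV` are chosen here for illustration.

```r
set.seed(1)
n <- nrow(iris)

# Answer's approach: permute a balanced vector of fold labels, leave the data as is.
folds_answer <- sample(rep(1:10, length.out = n))

# Asker's approach: shuffle the rows, then label contiguous blocks of the shuffled data.
irisCV <- iris[sample(n), ]
folds_asker <- cut(seq_len(n), breaks = 10, labels = FALSE)

table(folds_answer)
table(folds_asker)
```

The one thing to keep straight is which data frame the labels index: `folds_answer` refers to rows of `iris`, while `folds_asker` refers to rows of the shuffled `irisCV`.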

Run the model by fitting it on 9 folds and predicting on the held-out fold:

CV_lda <- lapply(1:10, function(x){
  # fit on the nine folds other than x
  model <- lda(Species ~ ., iris[folds != x, ])
  # predict the held-out fold (predict.lda has no `type` argument, so it is dropped)
  preds <- predict(model, iris[folds == x, ])$class
  data.frame(preds, real = iris$Species[folds == x])
})

This produces a list of held-out predictions, one element per fold. Combine them into a single data frame:

CV_lda <- do.call(rbind, CV_lda)

Produce the confusion matrix:

library(caret)

confusionMatrix(CV_lda$preds, CV_lda$real)
#output
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         1
  virginica       0          2        49

Overall Statistics

               Accuracy : 0.98            
                 95% CI : (0.9427, 0.9959)
    No Information Rate : 0.3333          
    P-Value [Acc > NIR] : < 2.2e-16       

                  Kappa : 0.97            
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9600           0.9800
Specificity                 1.0000            0.9900           0.9800
Pos Pred Value              1.0000            0.9796           0.9608
Neg Pred Value              1.0000            0.9802           0.9899
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3200           0.3267
Detection Prevalence        0.3333            0.3267           0.3400
Balanced Accuracy           1.0000            0.9750           0.9800
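If caret is not available, the pooled predictions can also be tabulated with base R's `table()`, which is what the question was attempting. The sketch below rebuilds the held-out predictions so that it runs on its own; `conf` and `accuracy` are names chosen here, and the exact counts depend on the seed and R version, so no output is shown.

```r
library(MASS)

set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(iris)))  # equal-size folds

# Held-out predictions for every fold, combined into one data frame.
CV_lda <- do.call(rbind, lapply(1:10, function(x) {
  model <- lda(Species ~ ., iris[folds != x, ])
  preds <- predict(model, iris[folds == x, ])$class
  data.frame(preds, real = iris$Species[folds == x])
}))

# Predictions and true labels come from the same rows, so the lengths match.
stopifnot(length(CV_lda$preds) == length(CV_lda$real))

# Rows are predictions, columns are true classes, as in caret's layout.
conf <- table(Prediction = CV_lda$preds, Reference = CV_lda$real)
print(conf)

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy <- sum(diag(conf)) / sum(conf)
print(accuracy)
```

Because every row of iris lands in exactly one held-out fold, the table entries sum to `nrow(iris)`.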
