Unexpected output while using 'neuralnet' in R

Str*_*keR · 2 · tags: r, neural-network

I am using the neuralnet package in R to predict handwritten digits. The MNIST database is used for training and testing the algorithm. Here is the R code I used:

# Importing the data into R
path <- "path_to_data_folder/MNIST_database_of_handwritten_digits/"  # Data can be downloaded from: http://yann.lecun.com/exdb/mnist/
to.read = file(paste0(path, "train-images-idx3-ubyte"), "rb")
to.read_Label = file(paste0(path, "train-labels-idx1-ubyte"), "rb")
magicNumber <- readBin(to.read, integer(), n=1, endian="big")
magicNumber_Label <- readBin(to.read_Label, integer(), n=1, endian="big")
numberOfImages <- readBin(to.read, integer(), n=1, endian="big")
numberOfImages_Label <- readBin(to.read_Label, integer(), n=1, endian="big")
rowPixels <- readBin(to.read, integer(), n=1, endian="big")
columnPixels <- readBin(to.read, integer(), n=1, endian="big")

# image(1:rowPixels, 1:columnPixels, matrix(readBin(to.read, integer(), n=(rowPixels*columnPixels), size=1, endian="big"), rowPixels, columnPixels)[,columnPixels:1], col=gray((0:255)/255))

trainDigits <- NULL
trainDigits <- vector(mode="list", length=numberOfImages)
for(i in 1:numberOfImages)
  trainDigits[[i]] <- as.vector(matrix(readBin(to.read, integer(), n=(rowPixels*columnPixels), size=1, endian="big"), rowPixels, columnPixels)[,columnPixels:1])

trainDigits <- t(data.frame(trainDigits))  # Takes a minute
trainDigits <- data.frame(trainDigits, row.names=NULL)

# i <- 1  # Specify the image number to visualize the image
# image(1:rowPixels, 1:columnPixels, matrix(as.numeric(trainDigits[i,]), rowPixels, columnPixels), col=gray((0:255)/255))

trainDigits_Label <- NULL
for(i in 1:numberOfImages_Label)
  trainDigits_Label <- c(trainDigits_Label, readBin(to.read_Label, integer(), n=1, size=1, endian="big"))

# appending the labels to the training data
trainDigits <- cbind(trainDigits, trainDigits_Label)

#################### Modelling ####################

library(neuralnet)
# Considering only 500 rows for training due to time and memory constraints
myNnet <- neuralnet(formula = as.formula(paste0("trainDigits_Label ~ ", paste0("X",1:(ncol(trainDigits)-1), collapse="+"))),
                                data = trainDigits[1:500,], hidden = 10, algorithm='rprop+', learningrate=0.01)

#################### Test Data ####################

to.read_test = file(paste0(path, "t10k-images-idx3-ubyte"), "rb")
to.read_Label_test = file(paste0(path, "t10k-labels-idx1-ubyte"), "rb")
magicNumber <- readBin(to.read_test, integer(), n=1, endian="big")
magicNumber_Label <- readBin(to.read_Label_test, integer(), n=1, endian="big")
numberOfImages_test <- readBin(to.read_test, integer(), n=1, endian="big")
numberOfImages_Label_test <- readBin(to.read_Label_test, integer(), n=1, endian="big")
rowPixels <- readBin(to.read_test, integer(), n=1, endian="big")
columnPixels <- readBin(to.read_test, integer(), n=1, endian="big")

testDigits <- NULL
testDigits <- vector(mode="list", length=numberOfImages_test)
for(i in 1:numberOfImages_test)
  testDigits[[i]] <- as.vector(matrix(readBin(to.read_test, integer(), n=(rowPixels*columnPixels), size=1, endian="big"), rowPixels, columnPixels)[,columnPixels:1])

testDigits <- t(data.frame(testDigits))  # Takes a minute
testDigits <- data.frame(testDigits, row.names=NULL)

testDigits_Label <- NULL
for(i in 1:numberOfImages_Label_test)
  testDigits_Label <- c(testDigits_Label, readBin(to.read_Label_test, integer(), n=1, size=1, endian="big"))

#################### 'neuralnet' Predictions ####################

predictOut <- compute(myNnet, testDigits)
table(round(predictOut$net.result), testDigits_Label)

#################### Random Forest ####################
# Cross-validating NN results with Random Forest

library(randomForest)
myRF <- randomForest(x=trainDigits[,-ncol(trainDigits)], y=as.factor(trainDigits_Label), ntree=100)

predRF <- predict(myRF, newdata=testDigits)
table(predRF, testDigits_Label)  # Confusion Matrix
sum(diag(table(predRF, testDigits_Label)))/sum(table(predRF, testDigits_Label))  # % of correct predictions

There are 60,000 training images (28×28-pixel images), with the digits 0 to 9 distributed (almost) uniformly across the dataset. Unlike the "Modelling" section above, which uses only 500 images, I used the entire training dataset to train the myNnet model (28×28 = 784 inputs and 10 outputs), and then predicted the outputs for the 10,000 images in the test dataset. (I used only 10 neurons in the hidden layer because of memory constraints.)
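A quick sanity check on the objects built above (a minimal sketch; the expected values assume the standard MNIST files):

dim(trainDigits)          # expect 60000 x 785 (784 pixel columns plus the label column)
table(trainDigits_Label)  # counts of the ten digits, roughly uniform as stated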

The results I get from the prediction are weird: the output is a sort of Gaussian distribution in which 4 is predicted most of the time, and the predictions fall off (sort of exponentially) from 4 toward 0 or 9. You can see the confusion matrix below (I rounded the outputs, since they were not integers):

> table(round(predictOut$net.result), testDigits_Label)
    testDigits_Label
       0   1   2   3   4   5   6   7   8   9
  -2   1   1   4   1   1   3   0   4   1   2
  -1   8  17  12   9   7   8   8  12   7  10
  0   38  50  44  45  35  28  36  40  30  39
  1   77 105  86  80  71  69  68  75  67  77
  2  116 163 126 129 101  97 111 101  99 117
  3  159 205 196 174 142 140 153 159 168 130
  4  216 223 212 183 178 170 177 169 181 196
  5  159 188 150 183 183 157 174 176 172 155
  6  119 111 129 125 143 124 144 147 129 149
  7   59  53  52  60  74  52  51  91  76  77
  8   22  14  18  14  32  36  28  38  35  41
  9    6   5   3   7  15   8   8  16   9  16

I thought there must be something wrong with my approach, so I tried predicting with the randomForest package in R. randomForest, however, works fine, with more than 95% accuracy. Here is the confusion matrix for the randomForest predictions:

> table(predRF, testDigits_Label)
      testDigits_Label
predRF    0    1    2    3    4    5    6    7    8    9
     0  967    0    6    1    1    7   11    2    5    5
     1    0 1123    0    0    0    1    3    7    0    5
     2    1    2  974    9    3    1    3   25    4    2
     3    0    3    5  963    0   21    0    0    9   10
     4    0    0   12    0  940    1    4    2    7   15
     5    4    0    2   16    0  832    6    0   11    4
     6    6    5    5    0    7   11  929    0    3    2
     7    1    1   14    7    2    2    0  979    4    6
     8    1    1   12    7    5   11    2    1  917   10
     9    0    0    2    7   24    5    0   12   14  950
  • Question 1: So, can anyone explain why neuralnet behaves in this strange way with this dataset? (By the way, neuralnet works fine with the iris dataset when I checked.)

    • Edit: I think I understand the reason for the Gaussian-like distribution in the output of neuralnet. There is only one output node (or is it neuron?) instead of one node per output class (10 classes here). So, while calculating the deltas for back-propagation, the algorithm computes the difference between the "expected output" and the "calculated output", and aggregated over all instances this difference is smallest for instances whose output is 4 or 5. The weights are therefore adjusted during back-propagation so that this aggregate output error is minimized, which could be why neuralnet gives a Gaussian-like output. (A small check of this idea is sketched right after this list.)
  • Question 2: I would also like to know how to correct this behavior of neuralnet and obtain predictions on par with the randomForest results.
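A minimal check of the edit's hypothesis (a sketch using only the objects defined in the code above): if the single-output network is effectively regressing toward the mean of the training labels, its predictions should cluster around that mean.

mean(trainDigits_Label)             # about 4.45: the mean digit label
mean(predictOut$net.result)         # the single-output net's predictions are pulled toward this value
hist(round(predictOut$net.result))  # the rounded outputs pile up around 4, matching the table above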

nog*_*pes 10

Some preliminary advice: you can load your data much more efficiently:

# Read in data.
trainDigits <- replicate(numberOfImages, c(matrix(readBin(to.read, integer(), n=(rowPixels*columnPixels), size=1, endian="big"), rowPixels, columnPixels)[, columnPixels:1]))
trainDigits <- data.frame(t(trainDigits), row.names=NULL)
trainDigits_Label <- replicate(numberOfImages, readBin(to.read_Label, integer(), n=1, size=1, endian="big"))

Your first problem is that you haven't specified a multi-class prediction to neuralnet. What you are doing is predicting a single real number between 0 and 9. That's why there is only one output instead of 10 predictions.
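You can verify this on the model from the question (a quick check, assuming the myNnet and predictOut objects defined above):

ncol(predictOut$net.result)  # 1: one continuous prediction per image, not ten class scores
head(predictOut$net.result)  # real-valued outputs, which is why round() was needed above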

If you look at ?neuralnet, there is an example of multi-class prediction; you have to put each class in a separate variable and put those variables on the left-hand side of the formula. Other packages, such as nnet, detect a factor automatically and do this for you. You can use the class.ind function (from nnet) to split a factor into multiple indicator variables:

# Split the labels into one indicator column per class and prepend them to the training data
library(nnet)  # provides class.ind()
output <- class.ind(trainDigits_Label)
colnames(output) <- paste0('out.', colnames(output))
output.names <- colnames(output)
input.names <- colnames(trainDigits)
trainDigits <- cbind(output, trainDigits)

Now you can paste together a formula:

# Considering only 500 rows
trainsize=500
# neuralnet:::varify.variables (sic) does not pass "data" when calling "terms".
# If it did, you wouldn't have to construct the formula like this.
library(neuralnet)
myNnet <- neuralnet(formula = paste(paste(output.names,collapse='+'),'~',
                              paste(input.names,collapse='+')),
                    data = trainDigits[1:trainsize,],
                    hidden = 10, 
                    algorithm='rprop+', 
                    learningrate=0.01,
                    rep=1)

This correction still doesn't make the neural network perform well. To see how bad it is, look at how it does on the training data; it should be quite good, since it has seen all of that data before:

# Accuracy on training data
res <- compute(myNnet, trainDigits[1:trainsize, input.names])
picks <- (0:9)[apply(res$net.result, 1, which.max)]
prop.table(table(trainDigits_Label[1:trainsize] == picks))
# FALSE  TRUE 
# 0.376 0.624 

That's 62% accuracy on the training data. And, as you'd expect, it does little better than chance on the rest of the data:

# Accuracy on test data
res <- compute(myNnet, trainDigits[(trainsize+1):60000, input.names])
picks <- (0:9)[apply(res$net.result, 1, which.max)]
prop.table(table(trainDigits_Label[(trainsize+1):60000] == picks))
# FALSE         TRUE 
# 0.8612268908 0.1387731092 
# 14% accuracy

The random forest does remarkably well with exactly the same data. There's a good reason it has become so popular recently.

trainsize=500
library(randomForest)
myRF <- randomForest(trainDigits_Label~.,
                     data=data.frame(trainDigits_Label=as.factor(trainDigits_Label),
                                     trainDigits[input.names])[1:trainsize,],
                     ntree=100)

# Train
p <- as.numeric(as.character(predict(myRF)))
prop.table(table(trainDigits_Label[1:trainsize]==p))
# Accuracy: 79%    

# Test
p <- as.numeric(as.character(predict(myRF,trainDigits[(trainsize+1):60000,])))
prop.table(table(trainDigits_Label[(trainsize+1):60000]==p))
# Accuracy: 76%

So, for your second question, my counter-question is: why would you expect a neural network to perform like a random forest? Beyond some vague structural similarities, the fitting procedures are very different. I suppose you could dig into the nodes of the neural network and compare them with the most important variables in the random forest model (a rough sketch of that follows below). But at this point it's more a statistics question than a programming question.
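If you wanted to attempt that comparison, a rough sketch might look like the following (hypothetical: scoring each input by the absolute size of its first-layer weights is only a crude stand-in for variable importance, not an established measure):

# Random forest variable importance (mean decrease in Gini for classification)
imp <- randomForest::importance(myRF)
head(sort(imp[, 1], decreasing = TRUE))

# Crude analogue for the neural net: sum of absolute first-layer weights per input
# (the first row of the weight matrix is the bias term, so drop it)
w1 <- myNnet$weights[[1]][[1]]
nnImp <- rowSums(abs(w1[-1, ]))
names(nnImp) <- input.names
head(sort(nnImp, decreasing = TRUE))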

  • I agree that neural networks are *flexible*, which may allow them to recognize patterns that other learners can't see. But that doesn't make them more *powerful*; the flexibility makes them harder to fit, more prone to getting stuck in local minima, and more likely to overfit the training data (as in this example). (2 upvotes)