Why does a neural network fail in a simple classification case?

Kar*_*epu · 7 · r · neural-network

I have the following code, which generates a simple rule-based classification dataset:

# # Data preparation
data = data.frame(A = round(runif(100)), B = round(runif(100)), C = round(runif(100)))
# Y - is the classification output column
data$Y = ifelse((data$A == 1 & data$B == 1 & data$C == 0), 1,
         ifelse((data$A == 0 & data$B == 1 & data$C == 1), 1,
         ifelse((data$A == 0 & data$B == 0 & data$C == 0), 1, 0)))
# Shuffling the data set
data = data[sample(rownames(data)), ]

I split the dataset into training and test sets so that I can validate my results on the test set:

# # Divide into train and test
library(caret)
trainIndex = createDataPartition(data[, "Y"], p = .7, list = FALSE, times = 1) # for balanced sampling
train = data[trainIndex, ]
test = data[-trainIndex, ]
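
As a quick sanity check (an illustrative addition, not part of the original post), the class proportions in the two splits can be compared to confirm that the stratified partition is roughly balanced:

# Compare the share of Y = 1 in the train and test splits (illustrative check)
prop.table(table(train$Y))
prop.table(table(test$Y))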

I tried to build a simple neural network in which the number of neurons in the hidden layer is chosen by a loop (as mentioned here):

# # Build a neural net
library(neuralnet)
for(alpha in 2:10)
{
    nHidden = round(nrow(train)/(alpha*(3+1)))
    nn = neuralnet(Y ~ A + B + C, train, linear.output = F, likelihood = T, err.fct = "ce", hidden = nHidden)

    # Calculate Mean Squared Error for Train and Test
    trainMSE = mean((round(nn$net.result[[1]]) - train$Y)^2)
    testPred = round(compute(nn,test[-length(ncol(test))])$net.result)
    testMSE = mean((testPred - test$Y)^2)

    print(paste("Train Error: " , round(trainMSE, 4), ", Test Error: ", round(testMSE, 4), ", #. Hidden = ", nHidden, sep = ""))
}
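
For context, the nHidden formula above is the common rule of thumb N_hidden = N_train / (alpha * (N_inputs + N_outputs)). A minimal sketch of what it yields here, assuming the partition put about 70 of the 100 rows into train:

# Rule-of-thumb hidden layer sizes (illustrative; nTrain = 70 is an assumption)
nTrain = 70; nIn = 3; nOut = 1
sapply(2:10, function(alpha) round(nTrain / (alpha * (nIn + nOut))))
# With nTrain = 70 this gives 9 6 4 4 3 2 2 2 2, matching the sizes printed below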

[1]"训练错误:0,测试错误:0.6,#.隐藏= 9"

[1]"训练错误:0,测试错误:0.6,#.隐藏= 6"

[1]"训练错误:0,测试错误:0.6,#.隐藏= 4"

[1]"训练错误:0,测试错误:0.6,#.隐藏= 4"

[1]"火车错误:0.1429,测试错误:0.8333,#.隐藏= 3"

[1]"训练错误:0.1429,测试错误:0.8333,#.隐藏= 2"

[1]"列车错误:0.0857,测试错误:0.6,#.隐藏= 2"

[1]"训练错误:0.1429,测试错误:0.8333,#.隐藏= 2"

[1]"列车错误:0.0857,测试错误:0.6,#.隐藏= 2"

This gives poor, over-fitted results. However, when I build a simple random forest on the same dataset, I get a train and test error of 0:

# # Build a Random Forest
trainRF = train
trainRF$Y = as.factor(trainRF$Y)
testRF = test

library(randomForest)
rf = randomForest(Y ~ ., data = trainRF, mtry = 2)

# Calculate Mean Squared Error for Train and Test
trainMSE = mean((round(rf$votes[,2]) - as.numeric(as.character(trainRF$Y)))^2)
testMSE = mean((round(predict(rf, testRF, type = "prob")[,2]) - as.numeric(as.character(testRF$Y)))^2)

print(paste("Train Error: " , round(trainMSE, 4), ", Test Error: ", round(testMSE, 4), sep = ""))

[1]"训练错误:0,测试错误:0"

Please help me understand why the neural network fails in such a simple case, while the random forest works with 100% accuracy.

Note: I used only a single hidden layer (assuming one hidden layer is sufficient for such a simple classification) and iterated over the number of neurons in that hidden layer.

Also, please correct me if my understanding of the neural network parameters is wrong.

The complete code can be found here.

seb*_*nmm · 1

A similar problem has been bugging me for a while, so I tried to understand your data and problem and compare them with mine. In the end, it comes down to a small bug in this line:

testPred = round(compute(nn,test[-length(ncol(test))])$net.result)

You are selecting B, C, and Y as predictors instead of A, B, and C, because length(ncol(something)) always returns 1, so test[-length(ncol(test))] merely drops the first column. What you want is test[-ncol(test)]. The summary shows the mis-selected columns:

> summary(test[-length(ncol(test))])

          B              C             Y            
 Min.   :0.00   Min.   :0.0   Min.   :0.0000000  
 1st Qu.:0.00   1st Qu.:0.0   1st Qu.:0.0000000  
 Median :0.00   Median :0.5   Median :0.0000000  
 Mean   :0.48   Mean   :0.5   Mean   :0.3766667  
 3rd Qu.:1.00   3rd Qu.:1.0   3rd Qu.:1.0000000  
 Max.   :1.00   Max.   :1.0   Max.   :1.0000000  
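
A minimal corrected version of the prediction step would look like this (a sketch; the rest of the loop stays unchanged):

# Drop only the last column (Y), so A, B and C are used as predictors
testPred = round(compute(nn, test[-ncol(test)])$net.result)
testMSE = mean((testPred - test$Y)^2)

Equivalently, the predictor columns can be selected by name with test[, c("A", "B", "C")].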