Neural network: sigmoid activation function for a continuous output variable

Asked by use*_*372 · Tags: matlab, machine-learning, neural-network

OK, so I am in the middle of Andrew Ng's Machine Learning course on Coursera and would like to adapt the neural network that was completed as part of assignment 4.

In particular, the neural network that I correctly completed as part of the assignment was as follows:

  • Sigmoid activation function: g(z) = 1/(1+e^(-z))
  • 10 output units, each of which could take a value of 0 or 1
  • 1 hidden layer
  • Back-propagation used to minimize the cost function
  • Cost function:

J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)}\log\left(h_\Theta(x^{(i)})_k\right) + \left(1-y_k^{(i)}\right)\log\left(1-h_\Theta(x^{(i)})_k\right) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2

where L = number of layers, s_l = number of units in layer l, m = number of training examples, and K = number of output units.
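For concreteness, here is a minimal MATLAB sketch of that regularized cross-entropy cost (my own illustration, not the assignment's code), assuming a3 is the m-by-K matrix of outputs from forward propagation, Y is the m-by-K matrix of one-hot labels, and Theta1/Theta2 are the weight matrices with bias weights in their first columns:

% Regularized cross-entropy cost for K sigmoid output units
% (assumed names: a3 = m x K outputs, Y = m x K one-hot labels)
J = -1/m * sum(sum( Y .* log(a3) + (1 - Y) .* log(1 - a3) ));
% The regularization term skips the bias column of each weight matrix
J = J + lambda/(2*m) * ( sum(sum(Theta1(:,2:end).^2)) ...
                       + sum(sum(Theta2(:,2:end).^2)) );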

Now I want to adjust the exercise so that there is one continuous output unit that takes any value in [0,1], and I am trying to work out what needs to change. So far I have:

  • Replaced the data with my own, i.e. data whose output is a continuous variable between 0 and 1
  • Updated the references to the number of output units
  • Updated the cost function in the back-propagation algorithm to:

    J = \frac{1}{2m}\sum_{i=1}^{m}\left(a_3^{(i)} - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2

    where a_3 is the value of the output unit determined from forward propagation.

I am sure that something else must change, because the gradient-checking method shows that the gradient determined by back-propagation and the numerical approximation no longer match. I did not change the sigmoid gradient; it is left at f(z)*(1-f(z)), where f(z) is the sigmoid function 1/(1+e^(-z)). Nor did I update the numerical approximation of the derivative, which is simply (J(theta+e) - J(theta-e))/(2e).
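For reference, the central-difference check described above can be sketched in MATLAB as follows (costFunc is an assumed handle returning J for an unrolled parameter vector theta; this is illustrative, not the course's exact computeNumericalGradient):

% Numerical gradient via central differences: (J(theta+e) - J(theta-e)) / (2e)
e = 1e-4;                          % perturbation size
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
for p = 1:numel(theta)
    perturb(p) = e;
    numgrad(p) = (costFunc(theta + perturb) - costFunc(theta - perturb)) / (2*e);
    perturb(p) = 0;
end
% numgrad can then be compared element-wise with the back-propagation gradient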

Can anyone suggest what other steps are required?

Coded in Matlab as follows:

% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);

% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;

% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );

% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

I realize that this question is similar to the one asked by @Mikhail Erofeev on StackOverflow; however, in this case I want the continuous variable to lie between 0 and 1, hence the use of a sigmoid function.

Answered by len*_*310

First, your cost function should be:

J = 1/m * sum( (a3-y).^2 );

I think your Theta2_grad = (delta3'*a2)/m; is then expected to match the numerical approximation once you change delta3 to delta3 = 1/2 * (a3 - y);.
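Applied to the code in the question, those two changes together would look like this (a sketch of this answer's suggestion, keeping the asker's variable names):

% Cost without the 1/2 factor ...
J = 1/m * sum( (a3 - y).^2 );
% ... and the 1/2 factor moved into the output-layer error instead
delta3 = 1/2 * (a3 - y);
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;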

See this slide for more details.

Edit: In case there is some minor discrepancy between our code, I have pasted my code below for your reference. The code has been compared with the numerical-approximation function checkNNGradients(lambda); the relative difference was less than 1e-4 (although it does not satisfy the 1e-11 requirement of Dr. Andrew Ng).

function [J grad] = nnCostFunctionRegression(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

% Reshape the unrolled parameter vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

m = size(X, 1);   
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


% FORWARD PROPAGATION
X = [ones(m, 1) X];          % add bias column to the input layer
z1 = sigmoid(X * Theta1');   % hidden-layer activations (despite the name, this is a2)
zs = z1;
z1 = [ones(m, 1) z1];        % add bias column to the hidden layer
z2 = z1 * Theta2';           % output-layer pre-activation
ht = sigmoid(z2);            % network output h_theta(x)


% Recode integer labels into one-hot rows (kept from the classification
% exercise; assumes y holds integer class indices, as checkNNGradients uses)
y_recode = zeros(length(y),num_labels);
for i=1:length(y)
    y_recode(i,y(i))=1;
end
y = y_recode;


% COST FUNCTION: squared error (without the 1/2 factor) plus regularization
regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;

% BACKWARD PROPAGATION: the 1/2 factor appears in the output-layer error instead
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');

delta_cap2 = delta_3' * z1;
delta_cap1 = delta_2' * X;

% Gradients with regularization applied to every column ...
Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));

% ... then removed again from the bias column
Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));

% Unroll the gradients into a single vector
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
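For completeness, a hypothetical training call (fmincg is the optimizer shipped with the course materials; initial_nn_params and the layer-size variables are assumed to be set up as in the original exercise):

% Train the regression network with the course's fmincg optimizer
options = optimset('MaxIter', 200);
costFunc = @(p) nnCostFunctionRegression(p, input_layer_size, ...
                                         hidden_layer_size, num_labels, ...
                                         X, y, lambda);
[nn_params, cost] = fmincg(costFunc, initial_nn_params, options);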