Loss functions and deep learning

Question


blu*_*sky 2 · Tags: machine-learning, neural-network, deep-learning, loss-function

From deeplearning.ai:

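The general methodology to build a neural network is to:

1. Define the neural network structure (number of input units, number of hidden units, etc.).
2. Initialize the model's parameters.
3. Loop:
    - Implement forward propagation
    - Compute the loss
    - Implement backward propagation to get the gradients
    - Update the parameters (gradient descent)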
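
The four steps above can be sketched in NumPy roughly as follows. This is a minimal illustration under stated assumptions, not the course's code: the toy XOR-style data, the layer sizes, the learning rate, the sigmoid hidden layer, and the helper names `init_params`, `forward`, `cross_entropy` and `backward` are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_x, n_h, n_y, rng):
    # Step 2 (hypothetical helper): small random weights, zero biases
    return {
        "W1": rng.normal(scale=0.5, size=(n_h, n_x)), "b1": np.zeros((n_h, 1)),
        "W2": rng.normal(scale=0.5, size=(n_y, n_h)), "b2": np.zeros((n_y, 1)),
    }

def forward(X, p):
    # forward propagation: sigmoid hidden layer, sigmoid output layer
    A1 = sigmoid(p["W1"] @ X + p["b1"])
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])
    return A1, A2

def cross_entropy(A2, Y):
    # binary cross-entropy averaged over the m examples (columns)
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def backward(X, Y, A1, A2, p):
    # backward propagation; the loss enters here via dZ2
    m = X.shape[1]
    dZ2 = (A2 - Y) / m                       # d(cross-entropy)/dZ2 for a sigmoid output
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    dZ1 = (p["W2"].T @ dZ2) * A1 * (1 - A1)  # sigmoid'(Z1) = A1 * (1 - A1)
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

# Step 1: structure -- 2 inputs, 4 hidden units, 1 output (toy XOR-like data)
rng = np.random.default_rng(0)
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]], dtype=float)  # shape (n_x, m)
Y = np.array([[0, 1, 1, 0]], dtype=float)                # shape (1, m)
params = init_params(2, 4, 1, rng)
learning_rate = 1.0

losses = []
for i in range(5000):                         # Step 3: loop
    A1, A2 = forward(X, params)               # forward propagation
    losses.append(cross_entropy(A2, Y))       # compute loss (monitoring only)
    grads = backward(X, Y, A1, A2, params)    # backward propagation
    for k in params:                          # gradient-descent update
        params[k] -= learning_rate * grads[k]
```

Note that the recorded `losses` are never read inside the loop; the loss function affects the updates only through the `dZ2` line in `backward`.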

How does the loss function affect the way the network learns?
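
For reference, in plain gradient descent with learning rate α the loss ℒ appears in the parameter update of each layer l only through its gradient:

$$
W^{[l]} \leftarrow W^{[l]} - \alpha \, \frac{\partial \mathcal{L}}{\partial W^{[l]}}, \qquad
b^{[l]} \leftarrow b^{[l]} - \alpha \, \frac{\partial \mathcal{L}}{\partial b^{[l]}}
$$

Backpropagation obtains these derivatives by the chain rule, starting from the derivative of the loss with respect to the output activation (a^{[2]}, i.e. `a_2`, in the question's two-layer code), so the loss function is built into the gradients the backward pass computes whether or not its value is ever evaluated.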


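For example, here is my implementation of forward and backward propagation. I believe it is correct, because I can use the following code to train the model to acceptable results: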


    for i in range(number_iterations):

        # forward propagation
        Z1 = np.dot(weight_layer_1, xtrain) + bias_1
        a_1 = sigmoid(Z1)

        Z2 = np.dot(weight_layer_2, a_1) + bias_2
        a_2 = sigmoid(Z2)

        # costs, computed for monitoring only; neither value is used below
        # (cost_all_examples is not defined in the snippet shown)
        mse_cost = np.sum(cost_all_examples)
        cost_cross_entropy = -(1.0/len(X_train) * (np.dot(np.log(a_2), Y_train.T) + np.dot(np.log(1-a_2), (1-Y_train).T)))

        # back propagation and gradient descent
        d_Z2 = np.multiply((a_2 - xtrain), d_sigmoid(a_2))
        d_weight_2 = np.dot(d_Z2, a_1.T)
        # sum over examples (columns): same as np.sum(d_Z2, axis=1, keepdims=True)
        d_bias_2 = np.asarray(list(map(lambda x : [sum(x)] , d_Z2)))
        # perform a parameter update in the negative gradient direction to decrease the loss
        weight_layer_2 = weight_layer_2 + np.multiply(- learning_rate , d_weight_2)
        bias_2 = bias_2 + np.multiply(- learning_rate , d_bias_2)

        # note: weight_layer_2 has already been updated above; standard backprop
        # would use its pre-update value here
        d_a_1 = np.dot(weight_layer_2.T, d_Z2)
        d_Z1 = np.multiply(d_a_1, d_sigmoid(a_1))
        d_weight_1 = np.dot(d_Z1, xtrain.T)
        d_bias_1 = np.asarray(list(map(lambda x : [sum(x)] , d_Z1)))
        weight_layer_1 = weight_layer_1 + np.multiply(- learning_rate , d_weight_1)
        bias_1 = bias_1 + np.multiply(- learning_rate , d_bias_1)
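
One place where a specific loss does show up in this loop is the backward pass. Assuming `d_sigmoid(a)` returns `a * (1 - a)` (its definition is not shown in the question), the line `d_Z2 = np.multiply((a_2 - xtrain), d_sigmoid(a_2))` is exactly the gradient of the squared-error loss 0.5 * sum((a_2 - target)**2) with respect to `Z2`, with `xtrain` as the target. The sketch below checks this numerically with central finite differences; the toy shapes, random values, and the helper `squared_error` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(a):
    # assumed definition: sigmoid derivative written in terms of its output a
    return a * (1.0 - a)

def squared_error(z2, target):
    # 0.5 * sum of squared errors, the loss implied by (a_2 - target) * d_sigmoid(a_2)
    return 0.5 * np.sum((sigmoid(z2) - target) ** 2)

rng = np.random.default_rng(1)
Z2 = rng.normal(size=(3, 5))          # toy pre-activations: 3 outputs, 5 examples
target = rng.uniform(size=(3, 5))     # toy targets in [0, 1]

a_2 = sigmoid(Z2)
analytic = (a_2 - target) * d_sigmoid(a_2)   # same formula as d_Z2 in the question

# central finite differences, perturbing one entry of Z2 at a time
eps = 1e-6
numeric = np.zeros_like(Z2)
for idx in np.ndindex(Z2.shape):
    Zp, Zm = Z2.copy(), Z2.copy()
    Zp[idx] += eps
    Zm[idx] -= eps
    numeric[idx] = (squared_error(Zp, target) - squared_error(Zm, target)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # should be tiny, well below 1e-6
```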


Note these lines:


    mse_cost = np.sum(cost_all_examples)
    cost_cross_entropy = -(1.0/len(X_train) * (np.dot(np.log(a_2), Y_train.T) + np.dot(np.log(1-a_2), (1-Y_train).T)))


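I can use either the MSE loss or the cross-entropy loss to monitor how well the system is learning, but that is purely informational: the choice of cost function does not affect how the network learns. I suspect I am missing something basic, because the deep learning literature often says that choosing the loss function is an important step. Yet, as the code above shows, I can pick cross-entropy or MSE and it makes no difference to how the network learns. Are the cross-entropy and MSE losses only there for reference?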
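
To make the comparison concrete: for a single sigmoid output unit with pre-activation `z`, output `a = sigmoid(z)` and label `y`, the gradient with respect to `z` is `(a - y) * a * (1 - a)` for the squared error 0.5 * (a - y)**2, and it simplifies to `a - y` for binary cross-entropy. The sketch below (hypothetical helper names, toy values) evaluates both for increasingly confident wrong predictions. A backward pass built on one formula therefore makes different updates than one built on the other, whereas computing a different cost value on its own changes nothing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_z_mse(z, y):
    # d/dz of 0.5 * (sigmoid(z) - y)**2
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def grad_z_cross_entropy(z, y):
    # d/dz of -(y*log(a) + (1-y)*log(1-a)) with a = sigmoid(z)
    a = sigmoid(z)
    return a - y

y = 1.0                        # true label
for z in [0.0, -2.0, -8.0]:    # increasingly confident wrong predictions
    print(f"z={z:5.1f}  a={sigmoid(z):.4f}  "
          f"MSE grad={grad_z_mse(z, y):+.6f}  CE grad={grad_z_cross_entropy(z, y):+.6f}")
```

With a saturated wrong output the squared-error gradient is scaled down by the small factor `a * (1 - a)`, while the cross-entropy gradient stays close to `a - y`.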


Update:


For example, here is a code snippet from deeplearning.ai that computes the cost:


    # GRADED FUNCTION: compute_cost

    def compute_cost(A2, Y, parameters):
        """
        Computes the cross-entropy cost given in equation (13)

        Arguments:
        A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
        Y -- "true" labels vector of shape (1, number of examples)
        parameters -- python dictionary containing your parameters W1, b1, W2 and b2

        Returns:
        cost -- cross-entropy cost given equation (13)
        """

        m = Y.shape[1] # number of example

        # Retrieve W1 and W2 from parameters
        ### START CODE HERE ### (≈ 2 lines of code)
        W1 = parameters['W1']
        W2 = parameters['W2']
        ### END CODE HERE ###

        # Compute the cross-entropy cost
        ### START CODE HERE ### (≈ 2 lines of code)
        logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
        cost = - np.sum(logprobs) / m
        ### END CODE HERE ###

        cost = np.squeeze(cost)     # makes sure cost is the dimension we expect.
                                    # E.g., turns [[17]] into 17
        assert(isinstance(cost, float))

        return cost


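The code runs as expected and achieves high accuracy / low cost. In this implementation the cost value is not used for anything except telling the machine learning engineer how well the network is learning. This makes me question: how does the choice of cost function affect the way a neural network learns?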
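
A small experiment isolates the distinction being asked about. In the sketch below (the toy data, learning rate, iteration count and the helper `train` are hypothetical), the same one-unit sigmoid model is trained twice from identical zero-initialized weights. The only difference between the two runs is the gradient formula: `A - Y`, derived from binary cross-entropy, versus `(A - Y) * A * (1 - A)`, derived from squared error. Neither run ever computes a cost value, yet the learned weights differ, because the loss acts on training through the gradient used in the backward pass rather than through the reported number.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, dZ_fn, lr=0.5, iters=200):
    # one sigmoid unit A = sigmoid(w @ X + b); only dZ_fn depends on the loss
    m = X.shape[1]
    w = np.zeros((1, X.shape[0]))
    b = 0.0
    for _ in range(iters):
        A = sigmoid(w @ X + b)
        dZ = dZ_fn(A, Y) / m       # average over the m examples
        w -= lr * (dZ @ X.T)
        b -= lr * np.sum(dZ)
    return w, b

def dZ_cross_entropy(A, Y):
    return A - Y                   # d(binary cross-entropy)/dZ for a sigmoid output

def dZ_squared_error(A, Y):
    return (A - Y) * A * (1 - A)   # d(0.5 * squared error)/dZ for a sigmoid output

rng = np.random.default_rng(3)
X = rng.normal(size=(2, 100))                 # toy inputs, shape (2, m)
Y = (X[0:1] + X[1:2] > 0).astype(float)       # toy labels, shape (1, m)

w_ce, b_ce = train(X, Y, dZ_cross_entropy)
w_se, b_se = train(X, Y, dZ_squared_error)
print("cross-entropy weights:", w_ce.ravel(), b_ce)
print("squared-error weights:", w_se.ravel(), b_se)
```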
