Backpropagation - gradient error [Python]

kyt*_*car 5 python gradient machine-learning deep-learning

I am taking Andrew Ng's new deep-learning Coursera course, week 2.

We are supposed to implement the logistic-regression algorithm.
I am stuck on the gradient code ( dw ) - it gives me a syntax error.

The algorithm is as follows:

import numpy as np

def propagate(w, b, X, Y):
    m = X.shape[1]

    A = sigmoid(np.dot(w.T,X) + b )  # compute activation
    cost = -(1/m)*(np.sum(np.multiply(Y,np.log(A)) + np.multiply((1-Y),np.log(1-A)), axis=1)    

    dw =(1/m)*np.dot(X,(A-Y).T)
    db = (1/m)*(np.sum(A-Y))
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

Any idea why I keep getting this syntax error?

File "<ipython-input-1-d104f7763626>", line 32
    dw =(1/m)*np.dot(X,(A-Y).T)
     ^
SyntaxError: invalid syntax

use*_*197 5

Andrew Ng is inspiring, no doubt about that,
but a few steps can be taken towards a better code design:

Tip 1:
If you seriously maintain a larger code base, start using a better IDE, one with
(a) bracket matching with GUI highlighting, and
(b) a jump-to-matching-bracket keyboard shortcut

    cost = - ( 1 / m ) * ( np.sum(   np.multiply(       Y,   np.log(     A ) )
                                   + np.multiply( ( 1 - Y ), np.log( 1 - A ) ),
                                   axis = 1
                                   )
                           ) # missing-parenthesis
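Even without an IDE, a few lines of Python can point at the place where an expression was left unclosed - a minimal sketch, not from the original answer; the file name propagate.py is hypothetical, and the naive counting ignores any parentheses inside string literals or comments:

# quick-and-dirty parenthesis-balance locator
# (assumes no parentheses occur inside string literals or comments)
with open( "propagate.py" ) as f:               # hypothetical file name
    depth = 0
    for n, line in enumerate( f, start = 1 ):
        depth += line.count( "(" ) - line.count( ")" )
        if depth > 0:
            print( f"line {n:3d} still has {depth} unclosed '('" )

The first line reported is where the imbalance begins - here the cost = ... line, even though the interpreter only complains one line later.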

Tip 2:
Once all the course assignments have passed the auto-grading, try to improve the performance of your code - not every step was optimized, which is forgivable in small-scale learning tasks, but which at larger scales will kill your approach in O( N^k ) in both the [PTIME, PSPACE] dimensions.

What seems to work well enough for 1E+3 examples may fail to deliver any training on 1E+6 or 1E+9 examples, the less so if some ML-pipeline is to iterate over an [EXPTIME, EXPSPACE] search domain of the ML-model's HyperPARAMETERs. That hurts. Then one starts to craft code more carefully, so as not to pay excessive [PTIME] costs, plus [EXPTIME] costs in [PSPACE], once the problem size stops fitting the in-RAM computing infrastructure.

Where?

-- avoid duplicate computations of the same thing, the more so if arrays are involved
(across all iterative methods, the more if ML-pipelines + ML-model-HyperPARAMETERs actually make the SPACE-search VAST,
each wasted [ns] soon grows into cumulative [us], if not [ms],
each wasted [ms] soon grows into cumulative [s], if not tens of [min],
each wasted [min] soon grows into cumulative [hrs], if not [days] ... yes, one can spend days due to poor code design)

Example:

# here, A[] is .ALLOC'd + .SET -----------------------------[PTIME]
A = sigmoid( np.dot( w.T, X )
           + b
             )  # compute activations, .SET in A[]
# ----------------------------------------------------------[PTIME]-cost was paid
cost = -( 1 / m ) * ( np.sum(   np.multiply(      Y,   np.log(     A ) )
                              + np.multiply( (1 - Y ), np.log( 1 - A ) ),
                                axis = 1
                                )
                      )
# ----------------------------------------------------------[PTIME]-cost again?
dw =  ( 1 / m ) *   np.dot( X, ( A - Y ).T )    # consumes ( A - Y )
db =  ( 1 / m ) * ( np.sum(      A - Y )   )    # consumes ( A - Y ) again
# ----------------------------------------------# last but not least,
#                                               # A[] is not consumed
#                                               #     till EoFun/return
# a better approach is to use powerful + faster [PTIME] numpy in-place operations
# that also avoid additional dynamic allocation [PSPACE] -> saving more [PTIME]
DIV_byM = 1 / m                                 # re-use O(N^2) times
A      -= Y                                     # way faster in-place + re-used
# ----------------------------------------------# [PTIME]-cost avoided 2x
dw      = np.dot( X, A.T )                      #        +1st re-use
dw     *= DIV_byM                               # way faster in-place

assert( dw.shape == w.shape and "INF: a schoolbook assert()-ion test, "
                            and "of not much value in PRODUCTION-code"
                            )
return { 'dw': dw,                              
         'db': DIV_byM * np.sum( A )            #        +2nd re-use
          }                                     # MUCH better to design
#                                               # the whole as in-place mods
#                                               # of static .ALLOC'd np.view-s,
#                                               # instead of new dict()-s
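A quick way to check both claims - that the in-place variant returns the very same gradients and that it skips the repeated ( A - Y ) work - is a small self-test; a minimal sketch on assumed toy shapes, not part of the original answer:

import numpy as np
import timeit

rng = np.random.default_rng( 0 )
m   = 100_000                                   # assumed toy problem size
X   = rng.standard_normal( ( 3, m ) )
Y   = rng.integers( 0, 2, ( 1, m ) ).astype( float )
A   = rng.random( ( 1, m ) )                    # stand-in activations in ( 0, 1 )

def grads_naive( A, Y, X, m ):                  # ( A - Y ) evaluated twice
    dw = ( 1 / m ) * np.dot( X, ( A - Y ).T )
    db = ( 1 / m ) * np.sum( A - Y )
    return dw, db

def grads_inplace( A, Y, X, m ):                # ( A - Y ) evaluated once
    DIV_byM = 1 / m
    A       = A.copy()                          # copy only so the timing loop
    A      -= Y                                 #     does not corrupt caller's A
    dw      = np.dot( X, A.T )
    dw     *= DIV_byM                           # in-place scaling
    return dw, DIV_byM * np.sum( A )

dw1, db1 = grads_naive(   A, Y, X, m )
dw2, db2 = grads_inplace( A, Y, X, m )
assert np.allclose( dw1, dw2 ) and np.isclose( db1, db2 )

print( timeit.timeit( lambda: grads_naive(   A, Y, X, m ), number = 100 ) )
print( timeit.timeit( lambda: grads_inplace( A, Y, X, m ), number = 100 ) )

Note the A.copy() exists here only to keep the repeated timing runs honest; inside the real propagate() the in-place update needs no copy at all, which is exactly where the [PSPACE] saving comes from.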

[TEST-ME] is a good design practice, but
[PERF-ME] scaling matters even more for matured code - a good evaluation practice:

A good engineering practice is to benchmark one's own code under some realistic operating states / conditions.

Given the scales in use, one may assume a set of scaling ranges - one at ~ 20 M neurons, one at ~ 30 M neurons - to benchmark and self-document the code-execution times:

        """                                                            __doc__
        USAGE:      ...
        PARAMETERS: ...
        ...
        EXAMPLE:    nnFeedFORWARD( X_example, nnMAP, thetaVEC, stateOfZ, stateOfA )

        [TEST-ME]   ...
        [PERF-ME]   *DO NOT* numba.jit( nnFeedFORWARD, nogil = True ) as it performs worse than with plain numpy-OPs

                        ~ 500 .. 1200 [us / 1E6 theta-s .dot() ] on pre-prepared np.view()-s
                        ~ 500 .. 1200 [us / 1E6 theta-s *= 0.  ] on pre-prepared np.view()-s
            ############################################################
            #
            # as-is:    ~   9 [ms / 21M Theta-s .dot() ] on pre-prepared np.view()-s for MAT + INCL. np.random/rand( 1000 ) ~~ 40 [us]
                              [  /  10k NEURONs tanh() ] in  5 LAYERs
                        ~  14 [ms / 30M Theta-s .dot() ]
                              [  /  17k NEURONs tanh() ] in 10 LAYERs
                        >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> * ~ 1E6 iterations in { .minimize() | .fmin_l_bfgs_b() }
                        ~   4 [hrs / 1E6 iterations ] w/o backprop
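Collecting such numbers needs nothing beyond the standard library - a minimal sketch, not from the original answer; the 1_000 x 1_000 shapes are arbitrary assumptions standing in for the ~ 1E6 theta-s scale:

import numpy as np
import timeit

def bench_ms( label, fn, number = 10 ):
    # average wall-clock time per call, reported in [ms]
    t = timeit.timeit( fn, number = number )
    print( f"{label:>24s} ~ {1E3 * t / number:9.3f} [ms]" )

theta = np.random.rand( 1_000, 1_000 )          # ~ 1E6 theta-s, assumed shape
X     = np.random.rand( 1_000, 1_000 )

bench_ms( "1E6 theta-s .dot()", lambda: theta.dot( X ) )
bench_ms( "1E6 theta-s tanh()", lambda: np.tanh( theta ) )

The measured figures can then be pasted into the __doc__ block above, so the next reader knows what to expect before scaling up.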


jde*_*esa 4

In the line cost = ... you are missing one closing parenthesis at the end, or you can simply remove the opening parenthesis right after the * . (This is also why Python reports the SyntaxError one line later, on dw = ... : the parser is still waiting for the missing closing parenthesis when it reaches that line.)

# ...
cost = -(1/m)*np.sum(np.multiply(Y,np.log(A)) + np.multiply((1-Y),np.log(1-A)), axis=1)
# ...
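For completeness, here is a minimal runnable version of the whole function - a sketch that adds a sigmoid() helper (the question's snippet does not define one) and a tiny smoke test on assumed shapes:

import numpy as np

def sigmoid(z):
    # logistic function, added because the question's snippet does not define it
    return 1./(1. + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]

    A = sigmoid(np.dot(w.T, X) + b)  # compute activation
    cost = -(1/m)*np.sum(np.multiply(Y, np.log(A)) + np.multiply((1-Y), np.log(1-A)), axis=1)

    dw = (1/m)*np.dot(X, (A-Y).T)
    db = (1/m)*np.sum(A-Y)
    cost = np.squeeze(cost)

    assert dw.shape == w.shape
    assert cost.shape == ()

    grads = {"dw": dw, "db": db}
    return grads, cost

# tiny smoke test on assumed shapes
w, b = np.zeros((2, 1)), 0.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1., 0., 1.]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"], grads["db"], cost)   # cost ~ 0.693 for an all-zero w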