Gradient of the SVM loss function

Dav*_*vid 8 python svm computer-vision linear-regression gradient-descent

I'm working through this course on convolutional neural networks. I've been trying to implement the gradient of the loss function for an SVM, and (even though I have a copy of a solution) I can't understand why the solution is correct.

The cs231n course notes define the gradient of the loss function as follows, where $\mathbb{1}(\cdot)$ is the indicator function and $\Delta$ is the margin:

$$\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\big(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\big)\Big) x_i$$

$$\nabla_{w_j} L_i = \mathbb{1}\big(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\big) x_i \quad \text{for } j \neq y_i$$

In my code, my analytic gradient matches the numeric gradient when implemented as follows:

    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
      scores = X[i].dot(W)
      correct_class_score = scores[y[i]]
      for j in xrange(num_classes):
        if j == y[i]:
          continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
          dW[:, y[i]] += -X[i]
          dW[:, j] += X[i]  # gradient update for incorrect rows
          loss += margin
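(For context: "matches the numeric gradient" refers to a finite-difference check. A minimal sketch of such a checker, assuming f is any function that maps W to the scalar loss, similar to the grad-check utility the course provides:)

    import numpy as np

    def numeric_gradient(f, W, h=1e-5):
      # centered finite differences: grad[ix] ~ (f(W + h*e_ix) - f(W - h*e_ix)) / (2h)
      grad = np.zeros_like(W)
      it = np.nditer(W, flags=['multi_index'])
      while not it.finished:
        ix = it.multi_index
        old = W[ix]
        W[ix] = old + h
        f_plus = f(W)
        W[ix] = old - h
        f_minus = f(W)
        W[ix] = old  # restore the original entry
        grad[ix] = (f_plus - f_minus) / (2 * h)
        it.iternext()
      return grad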

From the notes, however, it seems to me that dW[:, y[i]] should be changed every time j == y[i], since we subtract the loss whenever j == y[i]. I'm confused why the code isn't:

    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
      scores = X[i].dot(W)
      correct_class_score = scores[y[i]]
      for j in xrange(num_classes):
        if j == y[i]:
          if margin > 0:
            dW[:, y[i]] += -X[i]
            continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
          dW[:, j] += X[i]  # gradient update for incorrect rows
          loss += margin

since the loss changes when j == y[i]. Why are both gradient updates computed when j != y[i]?

exp*_*rer 5

I don't have enough reputation to comment, so I'm answering here. Whenever you compute the loss for x[i], the i-th training example, and get some nonzero loss, it means you should move the weight vector of the incorrect class (j != y[i]) away from x[i], and at the same time move the weights, i.e. the hyperplane, of the correct class (j == y[i]) closer to x[i]. By the parallelogram law, w + x lies between w and x, so w[y[i]] gets closer to x[i] every time the loop finds loss > 0.
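A quick numeric illustration of this "closer/farther" effect (hypothetical values, not from the course):

    import numpy as np

    w = np.array([0.5, -1.0])
    x = np.array([1.0, 2.0])
    print(w.dot(x))        # -1.5: current score for this class
    print((w + x).dot(x))  #  3.5: moving w toward x raises the score
    print((w - x).dot(x))  # -6.5: moving w away from x lowers the score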

So dW[:,y[i]] += -X[i] and dW[:,j] += X[i] are both done inside the loop, but in the update step we move in the direction of decreasing gradient, so we are essentially adding X[i] to the correct class's weights and moving the weights that misclassify away from X[i].
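To tie this back to the question: the solution updates dW[:, y[i]] inside the j != y[i] branch because, by the first formula above, the correct class's gradient is -X[i] summed once per violated margin. An equivalent way to write the loop (a sketch using the same variables as the question's code) makes that count explicit:

    for i in range(num_train):
      scores = X[i].dot(W)
      correct_class_score = scores[y[i]]
      num_violations = 0  # incorrect classes with margin > 0
      for j in range(num_classes):
        if j == y[i]:
          continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
          num_violations += 1
          dW[:, j] += X[i]
          loss += margin
      # the correct class accumulates -X[i] once per violated margin,
      # matching the indicator sum in the notes' formula
      dW[:, y[i]] -= num_violations * X[i]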