I'm working through this course on convolutional neural networks. I've been trying to implement the gradient of the SVM loss function, and (even with a copy of a solution in hand) I can't understand why the solution is correct.

On this page, the gradient of the loss function is defined as follows:
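For reference, this is presumably the multiclass SVM gradient from the CS231n notes, where \mathbb{1} is the indicator function and \Delta the margin:

\nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}\left( w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0 \right) \right) x_i

\nabla_{w_j} L_i = \mathbb{1}\left( w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0 \right) x_i \qquad (j \neq y_i)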
In my code, my analytic gradient matches the numeric one when implemented as follows:
dW = np.zeros(W.shape)  # initialize the gradient as zero

# compute the loss and the gradient
num_classes = W.shape[1]
num_train = X.shape[0]
loss = 0.0
for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
        if j == y[i]:
            continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
            dW[:, y[i]] += -X[i]
            dW[:, j] += X[i]  # gradient update for incorrect rows
            loss += margin
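For context, a minimal sketch of the kind of numeric check being referred to (svm_loss and numeric_gradient are illustrative helpers written for this sketch, not the assignment's own code):

import numpy as np

def svm_loss(W, X, y, delta=1.0):
    # same hinge loss as the loop above, loss value only
    loss = 0.0
    for i in range(X.shape[0]):
        scores = X[i].dot(W)
        margins = np.maximum(0, scores - scores[y[i]] + delta)
        margins[y[i]] = 0  # skip the j == y[i] term
        loss += margins.sum()
    return loss

def numeric_gradient(f, W, h=1e-5):
    # centered finite differences, one weight at a time
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = W[ix]
        W[ix] = old + h
        f_plus = f(W)
        W[ix] = old - h
        f_minus = f(W)
        W[ix] = old
        grad[ix] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

# the analytic dW from the loop above should then satisfy:
# np.allclose(dW, numeric_gradient(lambda W: svm_loss(W, X, y), W), atol=1e-4)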
However, from the notes it seems that dW[:, y[i]] should change every time j == y[i], since we subtract the loss whenever j == y[i]. I'm very confused about why the code is not:
dW = np.zeros(W.shape)  # initialize the gradient as zero

# compute the loss and the gradient
num_classes = W.shape[1]
num_train = X.shape[0]
loss = 0.0
for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
        if j == y[i]:
            if margin > 0:  # margin here still holds its value from the previous j
                dW[:, y[i]] += -X[i]
            continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
            dW[:, j] += X[i]  # gradient update for incorrect rows
            loss += margin
with the loss changing when j == y[i]. Why are both updates computed when j != y[i]?
I don't have enough reputation to comment, so I'm answering here. Whenever you compute the loss for x[i], the i-th training example, and get some non-zero loss, it means you should move the weight vector of the incorrect class (j != y[i]) away from x[i], and at the same time move the weights, i.e. the hyperplane, of the correct class (j == y[i]) closer to x[i]. By the parallelogram law, w + x lies between w and x. So in this way w[y[i]] comes closer to x[i] each time loss > 0 is found.
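A quick numeric illustration of that geometric point (the vectors below are arbitrary, chosen only for the demonstration):

import numpy as np

w = np.array([1.0, 0.0])  # current weight vector for the correct class
x = np.array([0.6, 0.8])  # a unit-norm training example

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(w, x))      # 0.6
print(cosine(w + x, x))  # ~0.894: w + x points closer to x than w did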
Thus dW[:, y[i]] += -X[i] and dW[:, j] += X[i] are both done in the loop, but when we update the weights we step in the direction of decreasing gradient, so we are essentially adding X[i] to the correct class's weights and moving away from X[i] in the weights that misclassify.
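A minimal sketch of the update step this describes (learning_rate and the array shapes are assumptions for illustration, not from the answer):

import numpy as np

learning_rate = 1e-3            # assumed hyperparameter
W = np.random.randn(3073, 10)   # assumed CIFAR-10-style shapes
dW = np.zeros_like(W)           # gradient accumulated by the loop above

# Stepping opposite to the gradient flips the signs accumulated in dW:
# dW[:, y[i]] held -X[i], so W[:, y[i]] moves toward x[i];
# dW[:, j]    held +X[i], so W[:, j]    moves away from x[i].
W -= learning_rate * dW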