sac*_*ruk 6 python gradient-descent pytorch
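I want to accumulate gradients before doing a backward pass, and I am wondering what the right way to do it is. According to this article, it is: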
    model.zero_grad()                                   # Reset gradient tensors
    for i, (inputs, labels) in enumerate(training_set):
        predictions = model(inputs)                     # Forward pass
        loss = loss_function(predictions, labels)       # Compute loss function
        loss = loss / accumulation_steps                # Normalize our loss (if averaged)
        loss.backward()                                 # Backward pass
        if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
            optimizer.step()                            # Now we can do an optimizer step
            model.zero_grad()
whereas I expected it to be:
    model.zero_grad()                                   # Reset gradient tensors
    loss = 0
    for i, (inputs, labels) in enumerate(training_set):
        predictions = model(inputs)                     # Forward pass
        loss += loss_function(predictions, labels)      # Compute loss function
        if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
            loss = loss / accumulation_steps            # Normalize our loss (if averaged)
            loss.backward()                             # Backward pass
            optimizer.step()                            # Now we can do an optimizer step
            model.zero_grad()
            loss = 0
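Here I accumulate the loss and then divide it by the number of accumulation steps to average it.
Second question: if I am right, would you expect my method to be faster, given that I only do the backward pass once every accumulation_steps iterations?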
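As a sanity check on the two loops above, here is a minimal sketch that compares the gradients each variant leaves in `.grad`. The toy `torch.nn.Linear` model, the `MSELoss`, the random data, and the helper names `grads_per_step_backward` and `grads_summed_loss_backward` are all assumptions made for illustration; they are not taken from the article or from my real training code. The optimizer step is left out so that only the accumulated gradients are compared.

```python
import torch

torch.manual_seed(0)

accumulation_steps = 4
# Hypothetical toy data: 4 micro-batches of 8 samples with 3 features each.
batches = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(accumulation_steps)]
loss_function = torch.nn.MSELoss()


def grads_per_step_backward(model):
    """Variant 1: call backward() on every micro-batch; .grad accumulates the sum."""
    model.zero_grad()
    for inputs, labels in batches:
        loss = loss_function(model(inputs), labels) / accumulation_steps
        loss.backward()  # this micro-batch's graph is no longer needed afterwards
    return [p.grad.clone() for p in model.parameters()]


def grads_summed_loss_backward(model):
    """Variant 2: sum the losses and call backward() once at the end."""
    model.zero_grad()
    total = 0
    for inputs, labels in batches:
        # Every micro-batch's graph stays referenced by `total` until backward().
        total = total + loss_function(model(inputs), labels)
    (total / accumulation_steps).backward()
    return [p.grad.clone() for p in model.parameters()]


model = torch.nn.Linear(3, 1)  # assumed stand-in for the real model
g1 = grads_per_step_backward(model)
g2 = grads_summed_loss_backward(model)
same = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(g1, g2))
print("gradients match:", same)
```

In this sketch both variants should end up with the same gradients up to floating-point error, since differentiation is linear in the loss. What differs is that the second variant keeps the graph of every micro-batch alive until the single `backward()` call, so it would be expected to need more memory; the sketch does not measure speed.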