My model training program converges worse in PyTorch than the TensorFlow implementation. When I use SGD instead of Adam, the losses are identical. With Adam, the losses already differ from the first epoch. I believe I am using the same settings in both programs. Any ideas on how to debug this would be helpful!
Losses computed with SGD
PyTorch
0.1504615843296051
0.10858417302370071
0.08603279292583466
TensorFlow
0.15046157
0.108584
0.08603277
Losses with Adam
PyTorch
0.0031117501202970743
0.0020642257295548916
0.0019268309697508812
0.0016333406092599034
0.0017334128497168422
0.0014430736191570759
0.0010424457723274827
0.0012145100627094507
0.0011195113183930516
0.0009501167223788798
0.0009987876983359456
0.0007953296881169081
0.00075263757025823
0.0008374055614694953
0.000735406531020999
TensorFlow:
0.0036667113
0.0032563617
0.0021536187
0.0015266595
0.0013580231
0.0013878695
0.0011856346
0.0011136091
0.00091276
0.000890126
0.00088381825
0.0007283067
0.00081382995
0.0006670901
0.00046282331
Adam optimizer settings
TF 1.15.3:
adam_optimizer = tf.train.AdamOptimizer(learning_rate=5e-5)
# default parameters from the documentation at https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/python/training/adam.py#L32-L235:
# learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, use_locking=False, name="Adam")
PyTorch
torch.optim.Adam(params=model.parameters(), lr=5e-5, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0)
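Not from the original post, but one documented difference that could matter when only Adam diverges: TF 1.x's tf.train.AdamOptimizer uses the "epsilon-hat" formulation (epsilon is added to sqrt(v_t), with bias correction folded into the step size), while torch.optim.Adam adds eps to the bias-corrected sqrt(v_hat). With the same eps value the two rules therefore take slightly different steps. A minimal numpy sketch of the two update rules on a made-up scalar gradient:

import numpy as np

# Sketch only: compares the documented update rule of tf.train.AdamOptimizer
# (TF 1.x, "epsilon-hat" form) with that of torch.optim.Adam (bias-corrected
# form). All names and the gradient sequence below are illustrative.
lr, beta1, beta2, eps = 5e-5, 0.9, 0.999, 1e-8

def tf_adam_step(w, g, m, v, t):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    lr_t = lr * np.sqrt(1 - beta2**t) / (1 - beta1**t)
    return w - lr_t * m / (np.sqrt(v) + eps), m, v

def torch_adam_step(w, g, m, v, t):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w_tf = w_pt = 0.1
m1 = v1 = m2 = v2 = 0.0
for t in range(1, 6):
    g = 0.01 / t  # same fake gradient fed to both rules
    w_tf, m1, v1 = tf_adam_step(w_tf, g, m1, v1, t)
    w_pt, m2, v2 = torch_adam_step(w_pt, g, m2, v2, t)
    print(t, w_tf - w_pt)  # small but non-zero gap caused by the epsilon placement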
Training
For debugging
Saving and loading the PyTorch model
def train(...):
    ...
    checkpoint = torch.load(checkpoint_file, map_location=device)
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    ...
    counter = 0
    while run:
        counter += 1
        if counter > 1000:
            break

        # fixed debug batch; "in" is a reserved word in Python, so the input
        # array is called inp here
        inp = np.load("debug_data/in.npy")
        out1 = np.load("debug_data/out1.npy")
        out2 = np.load("debug_data/out2.npy")

        # adjust from TF
        inp = inp.squeeze(3)
        inp = np.expand_dims(inp, axis=0)
        # ... do the same for out1 and out2

        inp, out1, out2 = \
            torch.from_numpy(inp).to(device), \
            torch.from_numpy(out1).to(device), \
            torch.from_numpy(out2).to(device)

        optimizer.zero_grad()
        out1_hat, out2_hat = model(inp)
        train_loss = loss_fn(out1_hat, out1) + loss_fn(out2_hat, out2)
        train_loss.backward()
        optimizer.step()

        save_checkpoint({'state_dict': model.state_dict(),
                         'optimizer': optimizer.state_dict()},
                        latest_filename=latest_checkpoint_path)
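One debugging idea (a hypothetical helper, not part of the original code): log per-parameter gradient norms between train_loss.backward() and optimizer.step() for the fixed debug batch. If the gradients already differ from the TF side, the mismatch happens before the optimizer update and Adam itself is not the cause.

def log_grad_norms(model):
    # Print the L2 norm of each parameter's gradient; intended to be called
    # right after train_loss.backward() in the loop above, so the values can
    # be compared against the TF gradients for the same fixed batch.
    for name, p in model.named_parameters():
        if p.grad is not None:
            print(name, p.grad.norm().item())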
Saving and loading the TensorFlow model
sess.run(tf.global_variables_initializer())
writer = tf.summary.FileWriter(my_path, graph=sess.graph)
restorer = tf.train.Saver(tf.global_variables(), write_version=tf.train.SaverDef.V2)
restorer.restore(sess, load_path)
saver = tf.train.Saver(tf.global_variables(), write_version=tf.train.SaverDef.V2)

counter = 0
while run:
    counter += 1
    if counter > 1000:
        break

    # fixed debug batch (paths omitted); the arrays get an _np suffix to keep
    # them distinct from the placeholders (in_ph, out1_ph, out2_ph) they feed
    in_np = np.load("")
    out1_np = np.load("")
    out2_np = np.load("")

    out1_np = out1_np[0, :, :, :]
    out1_np = out1_np[:, :, :, np.newaxis]
    out2_np = out2_np[0, :, :, :]
    out2_np = out2_np[:, :, :, np.newaxis]
    in_np = in_np[0, :, :, :]
    in_np = in_np[:, :, :, np.newaxis]

    _, _loss = sess.run([optimizer, loss],
                        feed_dict={in_ph: in_np, out1_ph: out1_np, out2_ph: out2_np})

    save_path = saver.save(sess, my_save_path, global_step=int(_global_step))

sess.close()
tf.reset_default_graph()
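A possible TF 1.x counterpart (again hypothetical, reusing sess, loss, and the placeholders from the snippet above): evaluate the raw gradients for the same fixed batch and compare their norms with the PyTorch values.

# Hypothetical check: run the gradients of the loss w.r.t. all trainable
# variables for the fixed debug batch and print their norms.
grad_ops = tf.gradients(loss, tf.trainable_variables())
grad_vals = sess.run(grad_ops,
                     feed_dict={in_ph: in_np, out1_ph: out1_np, out2_ph: out2_np})
for var, g in zip(tf.trainable_variables(), grad_vals):
    print(var.name, np.linalg.norm(g))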