How to print gradients during training in TensorFlow?

Question


min*_*als 4 python variables machine-learning tensorflow tensorflow-gradient

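To debug a TensorFlow model, I need to see whether the gradients change and whether they contain NaNs. Simply printing a variable in TensorFlow does not work, because all you see is: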

 <tf.Variable 'Model/embedding:0' shape=(8182, 100) dtype=float32_ref>


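I tried to use tf.Print but could not get it to work, and I wonder whether it can actually be used this way. In my model I have a training loop that prints the loss value of each epoch: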

def run_epoch(session, model, eval_op=None, verbose=False):
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)
    fetches = {
            "cost": model.cost,
            "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op

    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h

        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]

        costs += cost
        iters += model.input.num_steps

    print("Loss:", costs)

    return costs


Inserting print(model.gradients[0][1]) into this function does not work, so I tried the following code right after the loss is printed:

grads = model.gradients[0][1]
x = tf.Print(grads, [grads])
session.run(x)


But I get the following error message:

ValueError: Fetch argument <tf.Tensor 'mul:0' shape=(8182, 100) dtype=float32> cannot be interpreted as a Tensor. (Tensor Tensor("mul:0", shape=(8182, 100), dtype=float32) is not an element of this graph.)


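That makes sense, because tf.Print is indeed not part of the graph. So I tried putting tf.Print into the actual graph, after the loss is computed, but that did not work either: I still got Tensor("Train/Model/mul:0", shape=(8182, 100), dtype=float32).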

How can I print the gradient values inside a TensorFlow training loop?

Answer 1

Max*_*xim 6

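In my experience, the best way to inspect gradient flow in TensorFlow is not tf.Print but TensorBoard. Here is sample code I used in another question, where the gradients were the key problem in learning: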

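import numpy as np
import tensorflow as tf  # TF 1.x API

# grads_and_vars: list of (gradient, variable) pairs,
# e.g. from optimizer.compute_gradients(loss)
for g, v in grads_and_vars:
  if g is None:  # variable does not influence the loss; no gradient to log
    continue
  tf.summary.histogram(v.name, v)
  tf.summary.histogram(v.name + '_grad', g)

merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('train_log_layer', tf.get_default_graph())

...

# inside the training loop: i is the step counter, I the input placeholder
_, summary = sess.run([train_op, merged], feed_dict={I: 2*np.random.rand(1, 1)-1})
if i % 10 == 0:
  writer.add_summary(summary, global_step=i)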


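This shows you how the distribution of the gradients evolves over time. By the way, TensorFlow has a dedicated function for checking for NaN: tf.is_nan. Usually, though, you don't need to check the gradients for NaN explicitly: when that happens, the variables blow up as well, and this is clearly visible in TensorBoard.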
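If you do want to look at the raw numbers in Python, note that `session.run` returns fetched tensors as NumPy arrays. Assuming the gradient tensors are added to the same `fetches` dict that `run_epoch` already passes to `session.run` (so they are evaluated in the graph that session actually runs), the returned arrays can be checked with a small helper such as the hypothetical `gradient_report` below. It is a NumPy-side counterpart to `tf.is_nan` that also flags whether each gradient changed since the previous step. It assumes dense gradients; an embedding gradient may come back as an `IndexedSlicesValue`, in which case its `.values` array is what you would pass in.

```python
import numpy as np


def gradient_report(grads, prev_grads=None):
    """Summarize a list of gradient arrays fetched with session.run.

    Assumes every entry is a dense NumPy array (or anything np.asarray
    accepts). `prev_grads`, if given, is the list fetched on the previous
    step and is used to flag gradients that did not change at all.
    """
    report = []
    for i, g in enumerate(grads):
        g = np.asarray(g, dtype=np.float64)
        finite = g[np.isfinite(g)]  # flattened copy of the non-NaN, non-inf entries
        entry = {
            "index": i,
            "nan_count": int(np.isnan(g).sum()),
            "inf_count": int(np.isinf(g).sum()),
            # L2 norm over the finite entries only, so it stays readable
            # even after some values have blown up.
            "l2_norm": float(np.linalg.norm(finite)),
        }
        if prev_grads is not None:
            prev = np.asarray(prev_grads[i], dtype=np.float64)
            # equal_nan=True so identical NaN patterns count as "unchanged"
            entry["changed"] = not np.array_equal(g, prev, equal_nan=True)
        report.append(entry)
    return report


if __name__ == "__main__":
    # Stand-in values; in the training loop these would be the arrays
    # returned by session.run for the gradient tensors.
    step1 = [np.array([[0.5, -0.5], [1.0, 0.0]]), np.array([np.nan, 2.0])]
    step2 = [np.array([[0.5, -0.5], [1.0, 0.0]]), np.array([np.nan, 3.0])]
    for entry in gradient_report(step2, prev_grads=step1):
        print(entry)
```

Comparing against the previous step's arrays answers the "did the gradients change" part of the question, and calling the helper every step shows which tensor is the first to produce NaN or inf values.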

How do you get all the gradients and variables in one iterable (i.e. grads_and_vars)? (2 upvotes)

Archived:	7 years, 11 months ago
Views:	2495
Last activity:	6 years, 3 months ago