Explanation of TensorFlow dense gradients?

Question

I recently implemented a model, and when I run it I get this warning:

UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

With some otherwise similar parameter settings (embedding dimensionalities), the model suddenly becomes extremely slow.

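What does this warning mean? It appears that something I've done has caused all of the gradients to become dense, so backprop is doing dense matrix computations.
If a problem in the model is causing this, how can I identify it and fix it?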

Answer 1

mrr*_*rry 57

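This warning is printed when a sparse tf.IndexedSlices object is implicitly converted to a dense tf.Tensor. This typically happens when one op (usually tf.gather()) backpropagates a sparse gradient, but the op that receives it does not have a specialized gradient function that can handle sparse gradients. As a result, TensorFlow automatically densifies the tf.IndexedSlices, which can have a devastating effect on performance if the tensor is large.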

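To fix this problem, you should try to ensure that the params input to tf.gather() (or the params inputs to tf.nn.embedding_lookup()) is a tf.Variable. Variables can receive sparse updates directly, so no conversion is needed. Although tf.gather() (and tf.nn.embedding_lookup()) accept arbitrary tensors as inputs, this may lead to a more complicated backpropagation graph, resulting in the implicit conversion.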
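
To make the cost of that conversion concrete, here is a plain-Python sketch (not TensorFlow code) of a gather gradient kept in sparse form and then densified. The helper names `gather_grad_sparse` and `densify`, the table size, and the indices are all hypothetical; rows are plain lists rather than tensors.

```python
def gather_grad_sparse(indices, upstream_rows):
    """Gradient of rows = params[indices] with respect to params, kept sparse.

    Mirrors the idea behind tf.IndexedSlices: only the rows that were
    looked up carry a gradient, so we store (row index, row gradient) pairs.
    """
    return list(zip(indices, upstream_rows))


def densify(sparse_grad, num_rows, row_width):
    """Expand the sparse gradient into a full num_rows x row_width matrix.

    Rows that were looked up more than once accumulate their gradients.
    """
    dense = [[0.0] * row_width for _ in range(num_rows)]
    for row, values in sparse_grad:
        for col, value in enumerate(values):
            dense[row][col] += value
    return dense


# Hypothetical embedding table: 10,000 rows of width 4, but only 3 lookups.
num_rows, row_width = 10_000, 4
indices = [7, 42, 7]
upstream = [[1.0] * row_width for _ in indices]

sparse_grad = gather_grad_sparse(indices, upstream)
dense_grad = densify(sparse_grad, num_rows, row_width)

sparse_floats = len(sparse_grad) * row_width   # values held in sparse form
dense_floats = num_rows * row_width            # values held after densifying
print(sparse_floats, dense_floats)
```

In this toy setup the sparse form holds 12 values while the dense form holds 40,000, which is the kind of blow-up the warning is about when the table is large and few rows are touched.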

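I have the same problem. My input to `tf.gather` is the output of a `reshape`. How can I convert it to a "Variable"? Thanks. (7 upvotes)
The easiest way is to look for the `tf.gather()` or `tf.nn.embedding_lookup()` calls in your code, find the tensor `t` that is the `params` (first) argument to either of those ops, and print `t.op`. In general you'll get the best performance if `t` is a `tf.Variable`, but some ops such as `tf.concat()` have specializations that make the gradients efficient. (5 upvotes)
It appears to be a `boolean_mask` fed by a `reshape`. This is used in the loss computation far down the graph, after multiple `reshape`s, `pack`s, `tile`s, `expand_dim`s, `squeeze`s, `batch_matmul`s, etc. Is there a way to identify which ops can't accept sparse gradients? (4 upvotes)
I'm also seeing this warning with a `boolean_mask`, but it is only being fed normal variables; nothing is being reshaped. (4 upvotes)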

Answer 2

Dan*_*ter 23

A dense Tensor can be thought of as a standard Python array. A sparse one can be thought of as a collection of indices and values, e.g.

# dense
array = ['a', None, None, 'c']

# sparse
array = [(0, 'a'), (3, 'c')]

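So, as you can see, if you have a lot of empty entries a sparse array will be much more efficient than a dense one. But if all entries are filled in, dense will be far more efficient. In your case, somewhere in the TensorFlow graph a sparse array is being converted to a dense one of indeterminate size. The warning is just saying that you may waste a lot of memory this way. But if the sparse array is not too big, or is already quite dense, it might not be a problem at all.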
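
The trade-off can be checked with a small plain-Python sketch that builds on the two representations above. The helpers `to_sparse`, `to_dense`, and `stored_items` are hypothetical names for illustration, and counting "stored items" is a rough stand-in for memory use, not a measurement.

```python
def to_sparse(dense):
    """Keep only the non-empty entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(dense) if v is not None]


def to_dense(sparse, length):
    """Rebuild the full array; the target length must be known up front."""
    dense = [None] * length
    for i, v in sparse:
        dense[i] = v
    return dense


def stored_items(sparse):
    """Each sparse entry stores two things: an index and a value."""
    return 2 * len(sparse)


mostly_empty = ['a', None, None, 'c'] + [None] * 96   # 100 slots, 2 filled
full = list(range(100))                                # 100 slots, all filled

# Sparse wins when most slots are empty: 4 stored items versus 100 slots.
print(stored_items(to_sparse(mostly_empty)), len(mostly_empty))
# Dense wins when every slot is filled: 200 stored items versus 100 slots.
print(stored_items(to_sparse(full)), len(full))
```

Note that `to_dense` needs the final length as an argument; a conversion to a dense array "of unknown shape" is the case where that size cannot be determined ahead of time.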

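If you want to diagnose it, I would advise naming your various tensor objects. The warning will then print exactly which ones are involved in this conversion, and you can work out what you might adjust to remove it.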

Answer 3

AI_*_*BOT 8

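I totally agree with mrry's answer.

I would like to post another solution to this problem.

You can use tf.dynamic_partition() instead of tf.gather() to eliminate the warning.

The example code is below: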

import tensorflow as tf

# `sequence`, `seqlen`, `partitions`, `weights` and `bias` are assumed to be defined elsewhere.

# Create the cell for the RNN network
lstm = tf.nn.rnn_cell.BasicLSTMCell(128)

# Get the output and state from dynamic_rnn
output, state = tf.nn.dynamic_rnn(lstm, sequence, dtype=tf.float32, sequence_length=seqlen)

# Convert output to a tensor and reshape it
# (tf.pack was renamed to tf.stack in TensorFlow 1.0)
outputs = tf.reshape(tf.pack(output), [-1, lstm.output_size])

# Set the number of partitions to 2
num_partitions = 2

# The partitions argument is a tensor which is already fed to a placeholder.
# It is a 1-D tensor with the length of batch_size * max_sequence_length.
# In this partitions tensor, you need to set the last output idx for each seq to 1 and
# leave the others at 0, so that the result is separated into two parts:
# one is the last outputs and the other is the non-last outputs.
res_out = tf.dynamic_partition(outputs, partitions, num_partitions)

# Prediction from the last outputs (partition 1)
preds = tf.matmul(res_out[1], weights) + bias
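
How the partitions vector described in the comments above could be built is easiest to see on a toy batch. The following is a plain-Python sketch, not TensorFlow code: `dynamic_partition` here is a stand-in that follows the documented behaviour of tf.dynamic_partition (row i goes to output partitions[i], order preserved), `last_step_partitions` is a hypothetical helper, and the batch is assumed to be flattened batch-major with every sequence length at least 1.

```python
def dynamic_partition(data, partitions, num_partitions):
    """Plain-Python stand-in: row i of data goes to output list partitions[i]."""
    outputs = [[] for _ in range(num_partitions)]
    for row, part in zip(data, partitions):
        outputs[part].append(row)
    return outputs


def last_step_partitions(seq_lengths, max_len):
    """Build the 0/1 vector: 1 marks the last valid step of each sequence.

    Assumes rows are laid out batch-major, i.e. row index is
    batch_idx * max_len + time_idx, and that every length is >= 1.
    """
    partitions = [0] * (len(seq_lengths) * max_len)
    for batch_idx, length in enumerate(seq_lengths):
        partitions[batch_idx * max_len + length - 1] = 1
    return partitions


# Hypothetical batch: 2 sequences padded to 3 steps, true lengths 2 and 3.
seq_lengths, max_len = [2, 3], 3
flat_outputs = ["s0t0", "s0t1", "s0pad", "s1t0", "s1t1", "s1t2"]

partitions = last_step_partitions(seq_lengths, max_len)
non_last, last = dynamic_partition(flat_outputs, partitions, 2)
print(partitions)  # [0, 1, 0, 0, 0, 1]
print(last)        # ['s0t1', 's1t2']
```

Partition 1 then holds exactly one row per sequence, which corresponds to `res_out[1]` in the answer's code.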

Hope this helps.

dynamic_partition can be used in place of tf.gather(); what can be used in place of tf.nn.embedding_lookup()? (2 upvotes)

Archived:	10 years, 3 months ago
Views:	22768
Last activity:	7 years, 7 months ago