TensorFlow: Dst tensor is not initialized

Question


The MNIST For ML Beginners tutorial gives me an error when I run print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})). Everything else works fine.

Error and traceback:

InternalError                             Traceback (most recent call last)
<ipython-input-16-219711f7d235> in <module>()
----> 1 print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    338     try:
    339       result = self._run(None, fetches, feed_dict, options_ptr,
--> 340                          run_metadata_ptr)
    341       if run_metadata:
    342         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
    562     try:
    563       results = self._do_run(handle, target_list, unique_fetches,
--> 564                              feed_dict_string, options, run_metadata)
    565     finally:
    566       # The movers are no longer used. Delete them.

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
    635     if handle is None:
    636       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637                            target_list, options, run_metadata)
    638     else:
    639       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
    657       # pylint: disable=protected-access
    658       raise errors._make_specific_exception(node_def, op, error_message,
--> 659                                             e.code)
    660       # pylint: enable=protected-access
    661 

InternalError: Dst tensor is not initialized.
     [[Node: _recv_Placeholder_3_0/_1007 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_312__recv_Placeholder_3_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
     [[Node: Mean_1/_1011 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_319_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]


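I just switched to a newer version of CUDA, so could this have something to do with that? The error seems to be about copying a tensor to the GPU.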

Stack: EC2 g2.8xlarge instance, Ubuntu 14.04

Update:

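print(sess.run(accuracy, feed_dict={x: batch_xs, y_: batch_ys})) runs fine. That makes me suspect the problem is that I'm trying to transfer a huge tensor to the GPU and it can't take it. Small tensors, such as the mini-batches, work fine.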

Update 2:

I've pinned down exactly how large a tensor causes the problem:

batch_size = 7509 #Works.
print(sess.run(accuracy, feed_dict={x: mnist.test.images[0:batch_size], y_: mnist.test.labels[0:batch_size]}))

batch_size = 7510 #Doesn't work. Gets the Dst error.
print(sess.run(accuracy, feed_dict={x: mnist.test.images[0:batch_size], y_: mnist.test.labels[0:batch_size]}))
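To put "huge" in numbers: in the tutorial each MNIST image is fed as 784 float32 values (28 × 28 pixels) and each label as 10 one-hot float32 values, so the feed grows by 3,176 bytes per example. A small sketch of that arithmetic, assuming those shapes and dtypes (feed_bytes is a hypothetical helper, not part of TensorFlow):

```python
def feed_bytes(batch_size: int, features: int = 28 * 28, classes: int = 10,
               bytes_per_value: int = 4) -> int:
    """Bytes fed per sess.run: float32 images (batch x 784) plus one-hot float32 labels (batch x 10)."""
    return batch_size * (features + classes) * bytes_per_value

# Compare the last working batch size with the first failing one.
for n in (7509, 7510):
    size = feed_bytes(n)
    print(f"batch {n}: {size} bytes ({size / 2**20:.2f} MiB)")
```

At 7510 examples the feed is only about 23.9 MB (roughly 22.7 MiB), so the extra example is not large in itself; this hints that what matters is how much GPU memory is already in use when the feed is copied over.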


Answer 1

小智 20

In short, this error message is generated when there isn't enough memory to handle the batch size.
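A minimal sketch of the usual workaround, assuming the metric you fetch is a per-batch mean (as accuracy is in the tutorial): evaluate the test set in slices small enough to fit, then weight each slice's result by its size so the last, shorter slice does not skew the total. The helper name batched_mean and the batch size of 1000 are hypothetical; the commented usage reuses sess, accuracy, x, y_ and mnist from the question.

```python
from typing import Callable, Sequence

def batched_mean(evaluate: Callable[[Sequence, Sequence], float],
                 inputs: Sequence, labels: Sequence, batch_size: int) -> float:
    """Combine per-batch means of a metric into the mean over the whole dataset."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n = len(inputs)
    if n != len(labels):
        raise ValueError("inputs and labels must have the same length")
    if n == 0:
        raise ValueError("cannot evaluate an empty dataset")
    total = 0.0
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Weight each batch mean by the number of examples it covers.
        total += evaluate(inputs[start:stop], labels[start:stop]) * (stop - start)
    return total / n

# Usage with the question's graph (names from the tutorial):
# acc = batched_mean(lambda xb, yb: sess.run(accuracy, feed_dict={x: xb, y_: yb}),
#                    mnist.test.images, mnist.test.labels, batch_size=1000)
```

Slicing works the same for Python lists and NumPy arrays, so the helper can take mnist.test.images directly.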

Expanding on Steven's link (I can't comment yet), here are a few tips for monitoring and controlling memory usage in TensorFlow:

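To monitor memory usage during a run, consider logging run metadata. You can then view the memory usage of each node in the graph in TensorBoard. See the TensorBoard info page for details and an example.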
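By default, TensorFlow tries to allocate as much GPU memory as it can. You can change this with the GPU options (GPUOptions, passed in through ConfigProto) so that TensorFlow only allocates memory as it needs it. See the relevant documentation. There you will also find an option that lets you allocate only a fraction of your GPU memory (I've found this to be broken at times).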
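A minimal sketch of both tips, assuming the TF1-style Session API used in the question. Under TensorFlow 2.x these classes live under tf.compat.v1; 1.x releases expose them directly as tf.GPUOptions, tf.ConfigProto, tf.RunOptions and tf.RunMetadata. The log directory is a placeholder, and the commented lines reuse sess, accuracy, x, y_ and mnist from the question.

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving most of it up front.
gpu_options = tf.compat.v1.GPUOptions(
    allow_growth=True,
    # per_process_gpu_memory_fraction=0.5,  # alternative: cap this process at ~50% of GPU memory
)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)

# Full tracing for one run, so per-node memory can be inspected in TensorBoard.
run_options = tf.compat.v1.RunOptions(trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
run_metadata = tf.compat.v1.RunMetadata()

# With the question's graph (under TF 2.x, build it after
# tf.compat.v1.disable_eager_execution()):
# sess = tf.compat.v1.Session(config=config)
# sess.run(accuracy, feed_dict={x: batch_xs, y_: batch_ys},
#          options=run_options, run_metadata=run_metadata)
# writer = tf.compat.v1.summary.FileWriter("/tmp/mnist_logs", sess.graph)
# writer.add_run_metadata(run_metadata, "eval")
```

Tracing adds overhead, so it is usually enabled only for the run you want to inspect.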

Answer 2

Ste*_*ven 7

Keep in mind that each GPU on an EC2 g2.8xlarge has only 4 GB of memory.
https://aws.amazon.com/ec2/instance-types/

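I don't have a good way to find out how much space the model takes up, other than running it with a batch size of 1 and then subtracting the space one image takes.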

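From there you can work out the maximum batch size. That should work, but I think TensorFlow allocates GPU memory dynamically, like Torch and unlike Caffe, which reserves the maximum GPU space it needs right from the start. So you may want to be conservative with the maximum batch size.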
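The recipe above as arithmetic, in a minimal sketch: take the memory used at batch size 1, subtract one example's share to get the model's footprint, and divide what is left of the GPU's memory by the per-example cost. The function name and the safety factor (standing in for "be conservative") are assumptions, and the memory figures have to be measured on your own setup; nothing here queries the GPU.

```python
def max_batch_size(gpu_mem_bytes: int, mem_at_batch1: int,
                   per_example_bytes: int, safety: float = 0.8) -> int:
    """Estimate the largest batch that fits in GPU memory.

    gpu_mem_bytes:     total memory of the GPU (e.g. 4 GB per GPU on a g2.8xlarge)
    mem_at_batch1:     memory used when running the model with batch size 1
    per_example_bytes: memory one additional example costs
    safety:            fraction of GPU memory to plan against
    """
    if per_example_bytes <= 0:
        raise ValueError("per_example_bytes must be positive")
    if not 0.0 < safety <= 1.0:
        raise ValueError("safety must be in (0, 1]")
    # Model footprint = usage at batch size 1 minus the one example it held.
    model_bytes = mem_at_batch1 - per_example_bytes
    # Memory left for examples after reserving the model and the safety margin.
    budget = int(gpu_mem_bytes * safety) - model_bytes
    # Floor division; a negative budget means not even one example fits.
    return max(budget // per_example_bytes, 0)
```

With purely illustrative numbers, max_batch_size(1000, 110, 10, safety=0.5) reserves 100 bytes for the model and returns 40.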

Answer 3

Jul*_*bal 7

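I think this link may help: https://github.com/aymericdamien/TensorFlow-Examples/issues/38#issuecomment-223793214. In my case, the GPU was busy (93% utilization) with a process running in a screen session. I had to kill that process, and was glad to see things work afterwards.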
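To check whether another process is holding the GPU, one option is to ask nvidia-smi which compute processes are using GPU memory. A minimal sketch, assuming nvidia-smi is on the PATH and supports the --query-compute-apps and --format=csv,noheader,nounits options; the helper names are hypothetical, and drivers that cannot report per-process memory print [N/A], which is mapped to None here.

```python
import subprocess
from typing import List, Optional, Tuple

QUERY = ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"]

def parse_compute_apps(csv_text: str) -> List[Tuple[int, str, Optional[int]]]:
    """Parse 'pid, process_name, used_memory' rows (memory in MiB) into tuples."""
    procs = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        pid, rest = line.split(",", 1)
        # Split from the right in case the process name itself contains a comma.
        name, mem = rest.rsplit(",", 1)
        mem = mem.strip()
        procs.append((int(pid), name.strip(), int(mem) if mem.isdigit() else None))
    return procs

def list_gpu_processes() -> List[Tuple[int, str, Optional[int]]]:
    """Run nvidia-smi and return the processes currently holding GPU memory."""
    out = subprocess.check_output(QUERY, text=True)
    return parse_compute_apps(out)
```

Once a stray PID shows up, stop it the usual way (for example kill <pid>, or by reattaching to its screen session) before rerunning the evaluation.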
