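I am trying to write a distributed variational autoencoder in TensorFlow, running in standalone mode.
My cluster consists of 3 machines, named m1, m2, and m3. I am trying to run 1 ps server on m1 and 2 worker servers on m2 and m3 (following the example trainer program in the Distributed TensorFlow documentation). On m3, I get the following error message: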
Traceback (most recent call last):
  File "/home/yama/mfs/ZhuSuan/examples/vae.py", line 241, in <module>
    save_model_secs=600)
  File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 334, in __init__
    self._verify_setup()
  File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 863, in _verify_setup
    "their device set: %s" % op)
ValueError: When using replicas, all Variables must have their device set: name: "Variable"
op: "Variable"
attr {
  key: "container"
  value {
    s: ""
  }
}
attr {
  key: "dtype"
  value {
    type: DT_INT32
  }
}
attr {
  key: "shape"
  value { …
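
To make the setup concrete, here is a minimal sketch of the cluster layout described above, written in plain Python with no TensorFlow dependency. The port 2222 and the helper names `device_for` and `variable_device` are assumptions made for illustration and do not come from vae.py. In TensorFlow 1.x, `tf.train.ClusterSpec` accepts a dictionary of this shape, and `tf.train.replica_device_setter` by default assigns variables to ps tasks in a similar round-robin fashion. The ValueError above is raised on a non-chief worker when a Variable op has no device string at all.

```python
# Cluster layout from the question: 1 ps on m1, 2 workers on m2 and m3.
# The port number 2222 is an assumption; the post does not state it.
CLUSTER = {
    "ps": ["m1:2222"],
    "worker": ["m2:2222", "m3:2222"],
}


def device_for(job_name, task_index, cluster=CLUSTER):
    """Return a TensorFlow-style device string for one task in the cluster."""
    tasks = cluster[job_name]
    if not 0 <= task_index < len(tasks):
        raise ValueError("no task %d in job %r" % (task_index, job_name))
    return "/job:%s/task:%d" % (job_name, task_index)


def variable_device(var_index, cluster=CLUSTER):
    """Pick a ps task for the var_index-th variable, round-robin (hypothetical helper)."""
    return device_for("ps", var_index % len(cluster["ps"]), cluster)


if __name__ == "__main__":
    # Every variable should end up with a non-empty device string like this one.
    print(variable_device(0))
    # The worker running on m3 is task 1 of the "worker" job.
    print(device_for("worker", 1))
```

With a single ps task, every variable maps to the same device, so a variable created outside the device scope (and therefore left with an empty device string) is the case that the check in `_verify_setup` rejects.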