Why does TensorFlow always use GPU 0?

Wes*_*ley 14 python machine-learning tensorflow

I'm having a problem running TensorFlow inference on a multi-GPU setup.

Environment: Python 3.6.4; TensorFlow 1.8.0; CentOS 7.3; 2× Nvidia Tesla P4

Here is the nvidia-smi output while the system is idle:

Tue Aug 28 10:47:42 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:0C.0 Off |                    0 |
| N/A   38C    P0    22W /  75W |      0MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 00000000:00:0D.0 Off |                    0 |
| N/A   39C    P0    23W /  75W |      0MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The main statements related to my problem:

import os
import tensorflow as tf

# Make both physical GPUs visible to TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

def get_sess_and_tensor(ckpt_path):
    assert os.path.exists(ckpt_path), "file {} does not exist.".format(ckpt_path)
    graph = tf.Graph()
    with graph.as_default():
        # Load the frozen inference graph.
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(ckpt_path, "rb") as fid1:
            od_graph_def.ParseFromString(fid1.read())
            tf.import_graph_def(od_graph_def, name="")
        sess = tf.Session(graph=graph)
    # Look up the input/output tensors, intending to pin them to GPU 1.
    with tf.device('/gpu:1'):
        tensor = graph.get_tensor_by_name("image_tensor:0")
        boxes = graph.get_tensor_by_name("detection_boxes:0")
        scores = graph.get_tensor_by_name("detection_scores:0")
        classes = graph.get_tensor_by_name("detection_classes:0")

    return sess, tensor, boxes, scores, classes

So the problem is: when I set the visible devices to '0,1', even though I set tf.device to GPU 1, nvidia-smi shows that only GPU 0 is used during inference (GPU 0's GPU-Util is high, almost 100%, while GPU 1's stays at 0%). Why isn't GPU 1 used?
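One way to see where each op actually landed is to create the session with device placement logging enabled. A minimal sketch, assuming the same TF 1.x API and the graph built in get_sess_and_tensor above:

# Create the session with placement logging: the assigned device of
# every op is printed (to stderr) when the graph is set up and run.
sess = tf.Session(
    graph=graph,
    config=tf.ConfigProto(log_device_placement=True))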

I want to use both GPUs in parallel, but even with the following code it still uses only GPU 0:

with tf.device('/gpu:0'):
    tensor = graph.get_tensor_by_name("image_tensor:0")
    boxes = graph.get_tensor_by_name("detection_boxes:0")
with tf.device('/gpu:1'):
    scores = graph.get_tensor_by_name("detection_scores:0")
    classes = graph.get_tensor_by_name('detection_classes:0')
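Note that graph.get_tensor_by_name only looks up tensors that already exist in the imported graph; a surrounding tf.device scope only affects ops at the moment they are created. So a sketch of the kind of change that would actually pin the imported ops to a device looks like this (an assumption on my part: this only works if the frozen graph's nodes carry no hard-coded device assignments, since tf.import_graph_def applies the enclosing scope only to nodes without one):

import tensorflow as tf

def load_graph_on_device(ckpt_path, device='/gpu:1'):
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(ckpt_path, "rb") as fid:
            od_graph_def.ParseFromString(fid.read())
        # The device scope must wrap op creation (the import itself),
        # not the later get_tensor_by_name lookups.
        with tf.device(device):
            tf.import_graph_def(od_graph_def, name="")
    return graph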

Any advice is greatly appreciated.

Thanks.

Wesley

Ami*_*ila 1

Device names can differ depending on your setup.

Run:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Then try using the second GPU's device name exactly as it appears in that list.
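For example, a minimal sketch (the exact strings, e.g. '/device:GPU:1', should come from your own list_local_devices() output):

import tensorflow as tf
from tensorflow.python.client import device_lib

# Collect the GPU device names exactly as TensorFlow reports them,
# e.g. ['/device:GPU:0', '/device:GPU:1'] on a two-GPU machine.
gpu_names = [d.name for d in device_lib.list_local_devices()
             if d.device_type == "GPU"]
print(gpu_names)

# Use the reported name verbatim; the scope has to wrap graph
# construction (e.g. the tf.import_graph_def call), not tensor lookups.
with tf.device(gpu_names[1]):
    pass  # build or import the graph here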