How to run classify_image on multiple GPUs?

Question

Dmi*_*sil 5 python gpu tensorflow tensorflow2.0

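I want to run image vectorization on multiple GPUs (right now my script uses only one GPU). I have a list of images, a graph, and a session. The output of the script is the saved vectors. My machine has 3 NVIDIA GPUs. Environment: Ubuntu, Python 3.7, TensorFlow 2.0 (with GPU support). Here is my code sample (session initialization):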

import os

import tensorflow as tf

def load_graph(frozen_graph_filename):
    # Load the protobuf file from disk and parse it to retrieve the
    # unserialized graph_def
    with tf.io.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(f.read())
    # Then import the graph_def into a new Graph and return it
    with tf.Graph().as_default() as graph:
        # The name argument would prefix every op/node in the graph;
        # since everything is loaded into a new graph, no prefix is needed
        tf.import_graph_def(graph_def, name="")
    return graph

GRAPH = load_graph(os.path.join(settings.IMAGENET_PATH['PATH'], 'classify_image_graph_def.pb'))
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
config.gpu_options.allow_growth = True
SESSION = tf.compat.v1.Session(graph=GRAPH, config=config)

After that, I run the vectorization like this:

sess = SESSION
for image_index, image in enumerate(image_list):
    with Image.open(image) as f:
        image_data = f.convert('RGB')
        feature_tensor = POOL_TENSOR
        feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data})
        feature_vector = np.squeeze(feature_set)
        outfile_name = os.path.basename(image) + ".vc"
        this_is_path = settings.VECTORS_DIR_PATH['PATH']
        out_path = os.path.join(this_is_path, outfile_name)
        np.savetxt(out_path, feature_vector, delimiter=',')

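This working example produces 100 vectors in 29 seconds on the first GPU. So I tried to run it on multiple GPUs using this distributed training approach from the TensorFlow docs: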

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    sess = SESSION
    # and here all the code from previous example after session:
    for image_index, image in enumerate(image_list):
        with Image.open(image) as f:
            image_data = f.convert('RGB')
            feature_tensor = POOL_TENSOR
            feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data})
            feature_vector = np.squeeze(feature_set)
            outfile_name = os.path.basename(image) + ".vc"
            this_is_path = settings.VECTORS_DIR_PATH['PATH']
            out_path = os.path.join(this_is_path, outfile_name)
            np.savetxt(out_path, feature_vector, delimiter=',')

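After checking the logs, I can conclude that TensorFlow has access to all three GPUs. However, this changed nothing: at run time TensorFlow still uses only the first GPU (100 vectors in 29 seconds). Another approach I tried was manually assigning each item to a specific GPU: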

sess = SESSION
for image_index, image in enumerate(image_list):
    if image_index % 2 == 0:
        device_name = '/gpu:1'
    elif image_index % 3 == 0:
        device_name = '/gpu:2'
    else:
        device_name = '/gpu:0'
    with tf.device(device_name):
        with Image.open(image) as f:
            image_data = f.convert('RGB')
            feature_tensor = POOL_TENSOR
            feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data})
            feature_vector = np.squeeze(feature_set)
            outfile_name = os.path.basename(image) + ".vc"
            this_is_path = settings.VECTORS_DIR_PATH['PATH']
            out_path = os.path.join(this_is_path, outfile_name)
            np.savetxt(out_path, feature_vector, delimiter=',')

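Monitoring this approach, I observed that every GPU gets used, but I see no speedup, because TensorFlow just switches from one GPU device to another. For the first item GPU:0 is used while GPU:1 and GPU:2 simply wait; for the second item GPU:1 works while GPU:0 and GPU:2 wait. I also tried another TensorFlow strategy from the tf docs, with no change. I also tried defining tf.Session() inside the for loop, without success. And I found this, but could not make it work with my code.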

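My questions are:

1) Is there a way to modify tf.distribute.MirroredStrategy() so that TensorFlow uses all three GPUs?

2) If the answer to (1) is no, how can I run the vectorization using the full capacity of all GPUs (perhaps there is an asynchronous way of doing this, or something else)?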

Answer 1

Arr*_*Cao 4

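The reason your MirroredStrategy attempt (the third snippet) does not use all GPUs is that your model input is fed manually (through the TF1-style feature_tensor tensor), so TensorFlow does not know how to split the data evenly across your GPUs automatically. You may want to look at the docs here.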
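Since the data is not split automatically here, one option is to split the image list yourself and give each GPU its own worker. Below is a minimal sketch of that idea, not a definitive implementation: `shard_round_robin`, `vectorize_shard`, and `run_on_all_gpus` are hypothetical names, and `vectorize_shard` is only a placeholder for the real per-device work (a session on a graph built for that device, plus the `sess.run` and `np.savetxt` calls from the question).

```python
from concurrent.futures import ThreadPoolExecutor


def shard_round_robin(items, num_shards):
    """Split items into num_shards lists; item i goes to shard i % num_shards."""
    return [items[i::num_shards] for i in range(num_shards)]


def vectorize_shard(device_name, image_paths):
    # Hypothetical placeholder: a real worker would own a session whose graph
    # was built for `device_name` and call sess.run once per image.
    return [(device_name, path) for path in image_paths]


def run_on_all_gpus(image_paths, num_gpus=3):
    """Process one shard per GPU, with one worker thread per shard."""
    shards = shard_round_robin(image_paths, num_gpus)
    devices = ["/gpu:%d" % i for i in range(num_gpus)]
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        # map() pairs devices[i] with shards[i] and keeps the result order.
        return list(pool.map(vectorize_shard, devices, shards))
```

Threads are used on the assumption that `sess.run` calls issued from separate threads can overlap; if that does not hold in your setup, the same sharding works with one process per GPU.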

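The fourth (last) snippet also fails because it is not used correctly. Try building the model graph first and then running that graph in a session, instead of mixing the two steps; for example, try moving feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data}) outside the for loop. The guide here may illustrate this better.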
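One way to read this suggestion: in TF1-style graph mode, `tf.device` applies to ops at the moment they are added to the graph, so wrapping `sess.run` in it does not relocate a graph that was already imported. The sketch below assumes the frozen graph is imported once per GPU under its own name scope, and that all copies are then run in a single `sess.run` call. The `tower_N` scope names, the helper names, and the output tensor name `pool_3:0` are assumptions for illustration (the question only shows `POOL_TENSOR`), and whether placement takes effect also depends on the frozen graph not pinning devices itself.

```python
def tower_name(index, tensor_name):
    """Name of a tensor inside the import scope of tower `index`."""
    return "tower_%d/%s" % (index, tensor_name)


def build_tower_graph(graph_def, num_gpus):
    """Import one copy of graph_def per GPU; placement is decided here."""
    import tensorflow as tf  # imported lazily so the pure helpers work without TF

    graph = tf.Graph()
    with graph.as_default():
        for i in range(num_gpus):
            with tf.device("/gpu:%d" % i):
                tf.import_graph_def(graph_def, name="tower_%d" % i)
    return graph


def make_feed_and_fetches(images, output_name="pool_3:0"):
    """Pair image i with tower i; len(images) must not exceed the tower count."""
    feed = {tower_name(i, "DecodeJpeg:0"): img for i, img in enumerate(images)}
    fetches = [tower_name(i, output_name) for i in range(len(images))]
    return feed, fetches


def run_batch(sess, images):
    """Run up to one image per tower in a single sess.run call."""
    feed, fetches = make_feed_and_fetches(images)
    return sess.run(fetches, feed_dict=feed)


# Usage (assumes graph_def was parsed as in load_graph from the question):
#   graph = build_tower_graph(graph_def, 3)
#   config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
#   sess = tf.compat.v1.Session(graph=graph, config=config)
#   vectors = run_batch(sess, [img0, img1, img2])
```

`allow_soft_placement=True` is set in the usage note because ops without a GPU kernel, such as JPEG decoding, need to fall back to the CPU. Fetching all towers in one call is what gives the runtime a chance to execute them in parallel; calling `sess.run` once per image keeps the work sequential.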

Asked:	6 years ago
Viewed:	274 times
Last active:	6 years ago