geo*_*hdd 13 python gpu keras tensorflow
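I am using this script to train a model and run predictions on a machine that has GPUs installed and enabled, and it appears to use only the CPU during the prediction stage.
The device placement log I see during the .predict() part is as follows: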
2020-09-01 06:08:19.085400: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.085617: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.089558: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op MapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.090003: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op PrefetchDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097064: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op FlatMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097647: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op TensorDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097802: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097957: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ZipDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101284: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ParallelMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101865: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0
This happens even though, when I run:
print(tf.config.experimental.list_physical_devices('GPU'))
I get:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')]
The code I am using can be found here. The full output log can be seen here.
More context:
Python: 3.7.7
Tensorflow: 2.1.0
GPU: Nvidia Tesla V100-PCIE-16GB
CPU: Intel Xeon Gold 5218 CPU @ 2.30GHz
RAM: 394851272 KB
OS: Linux
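The placement lines quoted above all name `*Dataset` ops, which belong to the tf.data input pipeline. As far as I know those are normally placed on the CPU, so on their own they may not show where the model's compute ops ran. A minimal sketch for grouping a saved placement log by device, assuming the log format shown above; `PLACEMENT_RE` and `summarize_placement` are hypothetical names, not part of TensorFlow:

```python
import re
from collections import defaultdict

# Matches lines of the form:
#   ... Executing op <OpName> in device /job:.../device:<TYPE>:<INDEX>
PLACEMENT_RE = re.compile(r"Executing op (\S+) in device \S*/device:(\w+):(\d+)")


def summarize_placement(log_text):
    """Group logged op names by the device they were placed on."""
    ops_by_device = defaultdict(list)
    for line in log_text.splitlines():
        match = PLACEMENT_RE.search(line)
        if match:
            op, dev_type, dev_index = match.groups()
            ops_by_device[f"{dev_type}:{dev_index}"].append(op)
    return dict(ops_by_device)


if __name__ == "__main__":
    # Two lines copied from the log in the question.
    sample = (
        "2020-09-01 06:08:19.085400: I tensorflow/core/common_runtime/eager/execute.cc:573] "
        "Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0\n"
        "2020-09-01 06:08:19.101865: I tensorflow/core/common_runtime/eager/execute.cc:573] "
        "Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0\n"
    )
    for device, ops in summarize_placement(sample).items():
        print(device, ops)
```

Running this over the full output log would show whether ops such as MatMul appear under a GPU device, or whether every logged op really is on CPU:0.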
It sounds like you need to use a Distributed Strategy, as per the documentation. Your code would then look something like this:
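import tensorflow as tf
from tensorflow import keras

# train_images, train_labels, test_images and test_labels are assumed to be
# loaded already, as in the script from the question.

# Log the device each op is placed on.
tf.debugging.set_log_device_placement(True)

# Mirror the model across all visible GPUs.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = keras.Sequential(
        [
            keras.layers.Flatten(input_shape=(28, 28)),
            keras.layers.Dense(128, activation='relu'),
            keras.layers.Dense(10)
        ]
    )
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )
    model.fit(train_images, train_labels, epochs=10)
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    # Wrap the trained model so that predict() returns probabilities.
    probability_model = tf.keras.Sequential(
        [model, tf.keras.layers.Softmax()]
    )
    probability_model.predict(test_images)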
According to the documentation, the best practice for using multiple GPUs is to use tf.distribute.Strategy.
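To check where a compute op actually runs, separately from the input pipeline, one option is to execute a small op eagerly and inspect the `.device` attribute of the result. A minimal sketch, assuming TensorFlow 2.1 or later with eager execution enabled; `matmul_device` is a hypothetical helper name:

```python
import tensorflow as tf


def matmul_device():
    """Run a small matmul eagerly and return the device string of its result."""
    a = tf.random.uniform((256, 256))
    b = tf.random.uniform((256, 256))
    return tf.matmul(a, b).device


if __name__ == "__main__":
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
    print("matmul ran on:", matmul_device())
```

If the reported device ends in GPU:0, TensorFlow can place compute ops on a GPU in this process, and the CPU entries in the prediction log are more likely to come from the dataset ops than from the model itself.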