Google Colab Pro GPU running extremely slowly

ojp*_*ojp 5 gpu machine-learning tensorflow google-colaboratory

I'm running a ConvNet on a Colab Pro GPU. I selected GPU in the runtime settings and can confirm a GPU is available. I'm running exactly the same network as yesterday evening, but now each epoch takes about 2 hours... last night each epoch took about 3 minutes... and literally nothing has changed. I have a feeling Colab may be restricting my GPU usage, but I can't work out how to tell whether that's the problem. Does GPU speed fluctuate much depending on the time of day and so on? Below are some diagnostics I printed; does anyone know how I can dig deeper into the root cause of this slow behaviour?


I also tried changing the accelerator in Colab to "None", and my network ran at the same speed as with "GPU" selected, which suggests that for some reason I'm no longer actually training on the GPU, or that resources are being heavily throttled. I'm using TensorFlow 2.1.
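One way to check whether TensorFlow itself can see and use the GPU (rather than the VM merely having one attached) is to list the visible devices and pin a small op to the GPU with device-placement logging turned on. A minimal sketch for TF 2.x:

import tensorflow as tf

# an empty list here means training is silently falling back to the CPU
print(tf.config.list_physical_devices('GPU'))

# log where ops execute and force a small matmul onto the GPU
tf.debugging.set_log_device_placement(True)
with tf.device('/GPU:0'):
    a = tf.random.normal((1000, 1000))
    b = tf.random.normal((1000, 1000))
    c = tf.matmul(a, b)
print(c.device)  # expect a device path ending in GPU:0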

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime → "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sun Mar 22 11:33:14 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    32W / 250W |   8747MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

import humanize
import psutil
import GPUtil  # not preinstalled on Colab: !pip install gputil

def mem_report():
  print("CPU RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available))

  GPUs = GPUtil.getGPUs()
  for i, gpu in enumerate(GPUs):
    print('GPU {:d} ... Mem Free: {:.0f}MB / {:.0f}MB | Utilization {:3.0f}%'.format(
        i, gpu.memoryFree, gpu.memoryTotal, gpu.memoryUtil*100))

mem_report()

CPU RAM Free: 24.5 GB
GPU 0 ... Mem Free: 7533MB / 16280MB | Utilization  54%

Still no luck speeding things up. Here is my code, in case I'm overlooking something... By the way, the images come from an old Kaggle competition; the data can be found at the link below, and the training images are stored on my Google Drive. https://www.kaggle.com/c/datasciencebowl

# imports used throughout the snippets below
import os
import zipfile
import pathlib
import numpy as np
import tensorflow as tf
from PIL import Image
from IPython import display

# loading images from the kaggle api

#os.environ['KAGGLE_USERNAME'] = ""
#os.environ['KAGGLE_KEY'] = ""

#!kaggle competitions download -c datasciencebowl

# unpacking zip files

#zipfile.ZipFile('./sampleSubmission.csv.zip', 'r').extractall('./')
#zipfile.ZipFile('./test.zip', 'r').extractall('./')
#zipfile.ZipFile('./train.zip', 'r').extractall('./')

data_dir = pathlib.Path('train')

image_count = len(list(data_dir.glob('*/*.jpg')))
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])

# preview a few images from one class
shrimp_zoea = list(data_dir.glob('shrimp_zoea/*'))
for image_path in shrimp_zoea[:5]:
    display.display(Image.open(str(image_path)))

# rescale pixel values; augmentation options are left commented out
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                                                  validation_split=0.2)
                                                                  #rotation_range=40,
                                                                  #width_shift_range=0.2,
                                                                  #height_shift_range=0.2,
                                                                  #shear_range=0.2,
                                                                  #zoom_range=0.2,
                                                                  #horizontal_flip=True,
                                                                  #fill_mode='nearest')

validation_split = 0.2
BATCH_SIZE = 32
BATCH_SIZE_VALID = 10
IMG_HEIGHT = 224
IMG_WIDTH = 224
STEPS_PER_EPOCH = np.ceil(image_count * (1 - validation_split) / BATCH_SIZE)
# note: validation steps must divide by the validation batch size,
# otherwise only part of the validation set is seen each epoch
VALIDATION_STEPS = np.ceil(image_count * validation_split / BATCH_SIZE_VALID)

train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                     subset='training',
                                                     batch_size=BATCH_SIZE,
                                                     class_mode='categorical',
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     classes=list(CLASS_NAMES))

validation_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                          subset='validation',
                                                          batch_size=BATCH_SIZE_VALID,
                                                          class_mode='categorical',
                                                          shuffle=True,
                                                          target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                          classes=list(CLASS_NAMES))

model_basic = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(121, activation='softmax')
])

model_basic.summary()

model_basic.compile(optimizer='adam',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

history = model_basic.fit(
    train_data_gen,
    epochs=10,
    verbose=1,
    validation_data=validation_data_gen,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_steps=VALIDATION_STEPS,
    initial_epoch=0
)
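A quick way to tell whether the input pipeline, rather than the GPU, is the limiting factor is to time the generator on its own, with no model involved. A rough sketch using the names defined above:

import time

# time the generator alone; if this dominates, the GPU is not the bottleneck
start = time.time()
for _ in range(10):
    batch_images, batch_labels = next(train_data_gen)
print('{:.2f} s per batch of {}'.format((time.time() - start) / 10, BATCH_SIZE))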

ojp*_*ojp 7

In the end, the bottleneck appears to have been loading the images from Google Drive to Colab on every batch. Loading the images onto the local disk reduced each epoch to about 30 seconds... This is the code I used to load them onto the local disk:

!mkdir train_local
!unzip train.zip -d train_local

after first uploading my train.zip file to Colab.
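If the zip already lives on Google Drive, a variant that avoids re-uploading is to mount Drive and copy the single archive across before unzipping; one bulk copy is far faster than thousands of per-image reads. A sketch, assuming a hypothetical Drive path (the folder may be "My Drive" or "MyDrive" depending on the Colab version):

from google.colab import drive
drive.mount('/content/drive')

# copy the archive in one sequential read (path is hypothetical), then unzip locally
!cp '/content/drive/My Drive/train.zip' /content/
!unzip -q /content/train.zip -d /content/train_local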


Bob*_*ith 3

Your nvidia-smi output clearly shows that a GPU is attached. Where are you storing your training data? If it isn't on the local disk, I'd recommend storing it there. The throughput for fetching training data remotely can vary depending on where your Colab backend happens to be located.
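One way to confirm that I/O is the problem is to time raw reads from each location and compare throughput. A rough sketch, where both paths are placeholders for wherever your copies of the data actually live:

import time
import pathlib

def time_reads(directory, n=200):
    # read the raw bytes of up to n images and report throughput
    paths = list(pathlib.Path(directory).glob('*/*.jpg'))[:n]
    start, total = time.time(), 0
    for p in paths:
        total += len(p.read_bytes())
    elapsed = time.time() - start
    print('{}: {} files, {:.1f} MB/s'.format(directory, len(paths), total / 1e6 / elapsed))

time_reads('/content/drive/My Drive/train')  # placeholder: Drive copy
time_reads('/content/train_local/train')     # placeholder: local copy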