How to clear GPU memory without restarting the runtime in Google Colaboratory (TensorFlow)

Question

8fa*_*ial 11 tensorflow google-colaboratory

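I want to run hyperparameter tuning for a neural style transfer algorithm. This gives me a for loop in which my model outputs an image generated with different hyperparameters on each iteration.

It runs in Google Colaboratory on a GPU runtime. While it is running, I sometimes get an error message saying that my GPU memory is almost full, and then the program stops.

So I was wondering whether there is a way to clear or reset the GPU memory after a certain number of iterations, so that the program can terminate normally (going through all iterations of the for loop, not just, for example, 1500 out of 3000 because the GPU memory is full).

I have already tried this piece of code, which I found online: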

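# Reset Keras Session
# Imports below are assumed (TensorFlow 1.x with standalone Keras); the original snippet omitted them
import gc
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session, clear_session, get_session

def reset_keras():
    global classifier  # the model lives in global space - change this name as you need
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier
    except NameError:
        pass  # no model with that name exists yet

    #print(gc.collect()) # if it's done something you should see a number being outputted

    # use the same config as you used to create the session
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tf.Session(config=config))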
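
The periodic cleanup described in the question could be wired into the loop roughly as follows. This is only a sketch, assuming TensorFlow 2.x, where `tf.keras.backend.clear_session()` replaces the session handling above; `clear_keras_state`, `run_trials`, `run_trial` and `cleanup_every` are hypothetical names, not part of the original code. Clearing the Keras state lets memory be reused by later iterations, but TensorFlow generally keeps its GPU allocation pool, so `nvidia-smi` may still report the memory as used.

```python
import gc


def clear_keras_state():
    """Drop Keras global state and run the garbage collector.

    Returns True if TensorFlow could be imported, False otherwise.
    """
    gc.collect()
    try:
        import tensorflow as tf  # assumption: TensorFlow 2.x is installed
    except ImportError:
        return False
    tf.keras.backend.clear_session()
    gc.collect()
    return True


def run_trials(param_grid, run_trial, cleanup_every=50):
    """Call run_trial(params) for each entry, cleaning up every N iterations.

    run_trial is a hypothetical callable supplied by the caller. For large
    outputs such as images, it should save to disk and return something small.
    """
    results = []
    for i, params in enumerate(param_grid, start=1):
        results.append(run_trial(params))
        if i % cleanup_every == 0:
            clear_keras_state()
    return results
```

Any model built before a cleanup has to be rebuilt afterwards, so the model construction belongs inside `run_trial` rather than outside the loop.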

Answer 1

小智 15

With TensorFlow you can use the numba library:

!pip install numba

from numba import cuda 
device = cuda.get_current_device()
device.reset()

This is the answer that works (3 upvotes)
After doing this, I get `RuntimeError: CUDA error: an illegal memory access was encountered` (2 upvotes)
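
The reset from this answer can be guarded so that the cell does not fail on a runtime without a GPU. A minimal sketch, assuming numba is installed; `reset_gpu_with_numba` is a hypothetical helper name. Note that `device.reset()` destroys the current CUDA context, so a framework that still holds objects on that context may fail afterwards, which may be what the second comment ran into.

```python
def reset_gpu_with_numba():
    """Reset the current CUDA device through numba.

    Returns True if a reset was performed, False if numba or a GPU
    is not available.
    """
    try:
        from numba import cuda  # assumption: installed via `pip install numba`
    except ImportError:
        return False
    if not cuda.is_available():
        return False
    # Tears down the CUDA context for this process, releasing its GPU memory.
    cuda.get_current_device().reset()
    return True
```

Calling it between independent runs, after deleting every model and tensor, is probably safer than calling it in the middle of a run.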

Answer 2

Sha*_*aza 6

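You can run the command `!nvidia-smi` in a notebook cell and kill the process that is holding the GPU by its id, for example `!kill process_id`. Also try using simpler data structures, such as dictionaries and vectors.

If you are using PyTorch, call `torch.cuda.empty_cache()`.

Sometimes there is no process_id to kill (2 upvotes)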
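
The process ids can also be read programmatically instead of from the `!nvidia-smi` table. A sketch under the assumption that `nvidia-smi` is on the PATH; `parse_compute_apps` and `list_gpu_processes` are hypothetical helper names. As the comment above notes, the list may be empty.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into (pid, MiB or None) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip malformed rows
        mem = int(parts[1]) if parts[1].isdigit() else None
        rows.append((int(parts[0]), mem))
    return rows


def list_gpu_processes():
    """Return the compute processes reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    ).stdout
    return parse_compute_apps(out)
```

If the only process listed is the notebook kernel itself, killing it amounts to restarting the runtime.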

Answer 3

Ari*_*Ari 6

If you are using torch and cannot kill the process directly, `torch.cuda.empty_cache()` is the way to go, followed by `nvidia-smi` to make sure the memory was released.
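
A short sketch of that sequence for PyTorch, assuming torch is installed; `free_torch_cache` is a hypothetical helper name. `empty_cache()` only returns cached blocks that no live tensor is using, so references to models and tensors need to be dropped first.

```python
import gc


def free_torch_cache():
    """Release PyTorch's unused cached GPU memory.

    Returns the number of bytes still reserved by the caching allocator,
    or None if torch or a GPU is not available.
    """
    gc.collect()  # drop unreachable tensors before emptying the cache
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    return torch.cuda.memory_reserved()
```

Running `!nvidia-smi` afterwards shows whether the reported usage actually dropped.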

Archived:	6 years, 6 months ago
Views:	5394
Last activity:	4 years, 7 months ago