Unable to clear GPU memory when using the Llama 2 model, even after deleting variables

After loading a Llama 2 model into a pipeline, I am having trouble clearing GPU memory.

Clearing GPU memory works fine with other models (i.e. del on the variables, followed by torch.cuda.empty_cache()), but it doesn't seem to work with the Llama 2 model.

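I tested this on my Ubuntu 22 PC and on Google Colab with a GPU, and the behavior is consistent. If I instantiate the tokenizer and the model and then delete them, the GPU memory is cleared. But if I also instantiate the pipeline and then delete it, the GPU memory stays allocated. Example code: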

from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch
import gc

modelId = "meta-llama/Llama-2-7b-chat-hf"

model = AutoModelForCausalLM.from_pretrained(modelId, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(modelId)

# If the pipeline isn't instantiated, GPU memory is released when the model is deleted.
# If the pipeline is instantiated, deleting it doesn't release GPU memory!
# (Named pipe so the imported pipeline() function isn't shadowed.)
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

# Clear out GPU memory
del model
del tokenizer
del pipe
gc.collect()
torch.cuda.empty_cache()
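
One way to narrow this down might be to check whether the model object is still referenced somewhere after the del calls. The following is only a minimal sketch of that check using the standard weakref module; FakeModel, FakePipeline, and is_freed are hypothetical stand-ins made up for illustration and are not part of transformers or torch.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a large model object."""


class FakePipeline:
    """Hypothetical stand-in for a wrapper that keeps a reference to the model."""

    def __init__(self, model):
        self.model = model


def is_freed(ref):
    """Return True if the weakly referenced object has been garbage-collected."""
    gc.collect()
    return ref() is None


model = FakeModel()
pipe = FakePipeline(model)
model_ref = weakref.ref(model)  # a weak reference does not keep the object alive

del model
# The wrapper still holds a strong reference, so the object is not freed yet.
still_alive_with_pipe = not is_freed(model_ref)

del pipe
# No strong references remain, so the object can be collected.
freed_after_pipe = is_freed(model_ref)

print(still_alive_with_pipe, freed_after_pipe)
```

The same weakref.ref(model) check could be applied to the real model before the del calls. If the reference still resolves after gc.collect(), something is holding on to the model, and torch.cuda.empty_cache() cannot return memory that live tensors still occupy.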


Has anyone else experienced this? Any insight or guidance would be greatly appreciated.

Thanks
