Loading a Hugging Face model takes up too much memory

Bud*_*lle 4 python nlp pytorch huggingface-transformers huggingface

I am trying to load a large Hugging Face model with code like the following:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_from_disc = AutoModelForCausalLM.from_pretrained(path_to_model)
tokenizer_from_disc = AutoTokenizer.from_pretrained(path_to_model)
generator = pipeline("text-generation", model=model_from_disc, tokenizer=tokenizer_from_disc)

The program crashes shortly after the first line because it runs out of memory. Is there a way to load the model in chunks so that the program doesn't crash?


Edit
See cronoik's answer for the accepted solution, but here are the relevant pages in the Hugging Face documentation:

Sharded checkpoints: https://huggingface.co/docs/transformers/big_models#sharded-checkpoints
Large model loading: https://huggingface.co/docs/transformers/main_classes/model
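As a minimal sketch of the sharded-checkpoint idea from the first link (using a tiny GPT-2 configuration built from scratch so nothing has to be downloaded; the layer sizes, the 100KB shard limit, and the tiny_sharded directory name are purely illustrative):

```python
import os
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model, just to demonstrate the mechanics.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=1000)
model = GPT2LMHeadModel(config)

# Save the checkpoint in shards of at most ~100KB each;
# a real model would use something like max_shard_size="2GB".
model.save_pretrained("tiny_sharded", max_shard_size="100KB")

# The folder now holds several weight shards plus an index file
# mapping each tensor name to the shard that contains it.
print(sorted(os.listdir("tiny_sharded")))

# from_pretrained detects the index and loads one shard at a time,
# so peak memory during loading stays close to one shard's size
# (plus the model itself), not the whole checkpoint at once.
reloaded = GPT2LMHeadModel.from_pretrained("tiny_sharded")
```

Combined with low_cpu_mem_usage=True (shown in the answer below), this avoids materializing a second full copy of the weights in RAM while loading.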

cro*_*oik 7

You can try loading it with low_cpu_mem_usage:

from transformers import AutoModelForCausalLM

model_from_disc = AutoModelForCausalLM.from_pretrained(path_to_model, low_cpu_mem_usage=True)

Note that low_cpu_mem_usage requires Accelerate >= 0.9.0 and PyTorch >= 1.9.0.