Sad*_*afi 8 pytorch huggingface-transformers large-language-model llama
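I am trying to run Llama 2.0 on a server machine. It works, but it warns me that inference will be slow because of something I have misconfigured without realizing it, and I don't know how to optimize it.
Here is the working code: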
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


class LlamaChatBot:
    def __init__(self, model_name="daryl149/llama-2-7b-chat-hf"):
        torch.cuda.empty_cache()
        self.isGPU = torch.cuda.is_available()
        self.device = torch.device("cuda" if self.isGPU else "cpu")
        if self.isGPU:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            # Load in 4-bit; device_map='auto' places the weights on the available GPU.
            self.model = AutoModelForCausalLM.from_pretrained(
                model_name,
                device_map='auto', load_in_4bit=True
            )
        else:
            # Use model_name here as well so the tokenizer always matches the model.
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForCausalLM.from_pretrained(model_name).to(self.device)

    def generate_response(self, prompt):
        # isGPU is a bool attribute, so it must not be called like a method.
        if self.isGPU:
            input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')
        else:
            input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids
        generated_ids = self.model.generate(input_ids, max_length=1024)
        generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
        print(generated_text)
        return generated_text
Warning:
warnings.warn(f'Input type into Linear4bit is torch.float16,
but bnb_4bit_compute_type=torch.float32 (default).
This will lead to slow inference or training speed.')
Hardware:
Dell Precision T7920 Tower server/Workstation
Dual Intel Xeon Gold processors, 18 cores each @ 2.3 GHz (36 cores, 72 virtual CPUs)
512GB DDR4 RAM
Upgradable up to 3TB RAM
512GB SSD for booting
7TB SATA HDD for storage
24GB RTX 3090 DDR6 graphics card
Vic*_*mez 10
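You can find a solution in the following notebook. The warning appears because the 4-bit layers compute in float32 by default while the inputs are float16; setting the compute dtype explicitly avoids the mismatch, using something like this: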
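import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16  # compute dtype for the 4-bit layers (default is float32)
)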
Then pass this configuration when you load the model with the Transformers from_pretrained() method:
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
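Putting the two snippets together, a minimal sketch of a loader with the compute dtype set explicitly might look like the following. The helper names pick_compute_dtype_name and load_quantized_model are hypothetical, and falling back to float16 when bfloat16 is unavailable is an assumption based on the warning reporting float16 inputs. The heavy imports are kept inside the loader so the dtype choice can be checked without a GPU.

```python
def pick_compute_dtype_name(bf16_supported: bool) -> str:
    """Return the name of the compute dtype to use for the 4-bit layers.

    Assumption: prefer bfloat16 where the GPU supports it (as in the answer),
    otherwise fall back to float16, which matches the float16 inputs named
    in the warning.
    """
    return "bfloat16" if bf16_supported else "float16"


def load_quantized_model(model_id: str = "daryl149/llama-2-7b-chat-hf"):
    """Hypothetical helper: load a 4-bit model with an explicit compute dtype.

    Needs a CUDA GPU plus the torch, transformers and bitsandbytes packages.
    """
    # Imported lazily so this module can be imported without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bf16_ok = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    compute_dtype = getattr(torch, pick_compute_dtype_name(bf16_ok))

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    return tokenizer, model
```

Calling tokenizer, model = load_quantized_model() would then stand in for the GPU branch of LlamaChatBot.__init__ in the question. Whether it speeds up generation on a given machine is something to measure rather than assume.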