Ben*_*Ben 6 python machine-learning pytorch chatgpt-api langchain
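I want to create a self-hosted LLM that can use my own custom data as context (Slack conversations, in this case).
I've heard that Vicuna is a good alternative to ChatGPT, so I wrote the following code: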
from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex, \
    GPTSimpleVectorIndex, PromptHelper, LLMPredictor, Document, ServiceContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import torch
from langchain.llms.base import LLM
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

!export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

class CustomLLM(LLM):
    model_name = "eachadea/vicuna-13b-1.1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0,
                        model_kwargs={"torch_dtype":torch.bfloat16})

    def _call(self, prompt, stop=None):
        return self.pipeline(prompt, max_length=9999)[0]["generated_text"]

    def _identifying_params(self):
        return {"name_of_model": self.model_name}

    def _llm_type(self):
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())
Unfortunately, I get the following error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB (GPU 0; 22.03 GiB total capacity; 21.65 GiB
already allocated; 94.88 MiB free; 21.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated
memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and
PYTORCH_CUDA_ALLOC_CONF
Here is the output of !nvidia-smi (before running anything):
Thu Apr 20 18:04:00 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                     Off| 00000000:00:1E.0 Off |                    0 |
|  0%   23C    P0               52W / 300W|      0MiB / 23028MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Any idea how I can modify my code to make it work?
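A max_length of 9999 is far too long and will consume a lot of GPU RAM, especially with a 13B model. Try the 7B model, and try something like peft/bitsandbytes to reduce GPU RAM usage. Setting load_in_8bit=True is a good start.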
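To make the sizing argument concrete, here is a rough back-of-the-envelope sketch. It assumes nominal parameter counts of 13e9 and 7e9 (approximations, not exact figures for any checkpoint) and counts weights only, ignoring activations and the KV cache, which grow with the generation length. As far as I can tell, `from_pretrained` without a `torch_dtype` loads float32 weights, and `model_kwargs` passed to `pipeline` is not applied to an already-instantiated model, so the code in the question is probably trying to put float32 weights on the GPU. The `load_8bit_pipeline` helper is a hypothetical name of mine; it relies on the `load_in_8bit=True` and `device_map="auto"` arguments, which need `bitsandbytes` and `accelerate` installed and may be spelled differently in newer `transformers` releases.

```python
GIB = 1024 ** 3

# Bytes needed per parameter for common weight formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_gib(n_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


def fits(n_params, dtype, gpu_gib=22.03):
    """True if the weights alone fit in gpu_gib (22.03 GiB is the total
    capacity reported in the question's error message)."""
    return weight_gib(n_params, dtype) < gpu_gib


def load_8bit_pipeline(model_name, max_new_tokens=256):
    """Hypothetical helper: load a causal LM with 8-bit weights.

    Assumes transformers, accelerate and bitsandbytes are installed.
    Imports are local so the arithmetic above runs without them.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
        device_map="auto",   # let accelerate place the layers
    )
    # "text-generation" is the pipeline task for causal (decoder-only) models.
    return pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=max_new_tokens,  # cap output instead of max_length=9999
    )


if __name__ == "__main__":
    for n_params, label in [(13e9, "13B"), (7e9, "7B")]:
        for dtype in BYTES_PER_PARAM:
            size = weight_gib(n_params, dtype)
            verdict = "fits" if fits(n_params, dtype) else "does not fit"
            print(f"{label} {dtype}: {size:.1f} GiB ({verdict})")
```

By this estimate the 13B weights only fit on the 22 GiB card in 8-bit, while the 7B model also fits in bfloat16; either way, a small `max_new_tokens` leaves more headroom for generation.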