How to use a Llama model with langchain? It gives an error: Pipeline cannot infer suitable model classes from: <model_name> - HuggingFace

Question


Luc*_*edo 2 python huggingface-transformers langchain large-language-model chromadb

I fine-tuned a model (https://huggingface.co/decapoda-research/llama-7b-hf) using peft and lora and saved it as https://huggingface.co/lucas0/empath-llama-7b. Now, when I try to use it with langchain and chroma vectordb using the code below, I get: Pipeline cannot infer suitable model classes from: <model_name>

from langchain import HuggingFaceHub
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.vectorstores import Chroma

# Embedding model served through the Hugging Face Inference API.
repo_id = "sentence-transformers/all-mpnet-base-v2"
embedder = HuggingFaceHubEmbeddings(
    repo_id=repo_id,
    task="feature-extraction",
    huggingfacehub_api_token="XXXXX",
)
comments = ["foo", "bar"]
embeddings = embedder.embed_documents(texts=comments)
# Chroma embeds the texts itself with the embedder passed in.
docsearch = Chroma.from_texts(comments, embedder).as_retriever()
#docsearch = Chroma.from_documents(texts, embeddings)

# The fine-tuned model, queried through the Inference API.
llm = HuggingFaceHub(repo_id='lucas0/empath-llama-7b', huggingfacehub_api_token='XXXXX')
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch, return_source_documents=False)

q = input("input your query:")
# Calling the chain with a dict returns a dict; qa.run() would return a plain string.
result = qa({"query": q})

print(result["result"])


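Can someone tell me how to fix this? Is something wrong with the model card? I ran into a missing config.json file and ended up just putting in the same config.json as the model I used as the base for the LoRA fine-tuning. Could that be the source of the problem? If so, how can I generate the correct config.json without having to get the original llama weights?

Also, is there a way to load several sentences into a custom HF model (not only OpenAI, as the tutorials show) without using a vector DB?

Thanks!

The same problem happens when trying to run the API on the model's HF page: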

Answer 1

alv*_*vas 12

Before using the langchain API with a Hugging Face model, you should first try loading the model with Hugging Face transformers:

from transformers import AutoModel

model = AutoModel.from_pretrained('lucas0/empath-llama-7b')


This raises an error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-1b9ce76f5421> in <cell line: 3>()
      1 from transformers import AutoModel
      2 
----> 3 model = AutoModel.from_pretrained('lucas0/empath-llama-7b')

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   2553                             )
   2554                         else:
-> 2555                             raise EnvironmentError(
   2556                                 f"{pretrained_model_name_or_path} does not appear to have a file named"
   2557                                 f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"

OSError: lucas0/empath-llama-7b does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.


Then, looking at the model files at https://huggingface.co/lucas0/empath-llama-7b/tree/main, it seems that only the adapter model was saved, not the full model, so AutoModel throws a tantrum.
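You can check this before calling from_pretrained by looking at which files a repo actually contains. The sketch below uses a hypothetical helper, classify_repo (not a transformers function). It treats the four file names from the OSError above as full weights, along with model.safetensors and sharded *.index.json checkpoints, which are assumptions of this sketch. A repo that has an adapter_config.json but none of those files is treated as a PEFT adapter. The file list is assumed to be already fetched.

```python
# Hypothetical helper: can AutoModel.from_pretrained load this repo directly,
# or does it only hold a PEFT/LoRA adapter that needs a base model?
FULL_WEIGHT_FILES = {
    "pytorch_model.bin",   # the four names listed in the OSError above
    "tf_model.h5",
    "model.ckpt",
    "flax_model.msgpack",
    "model.safetensors",   # assumption: safetensors weights also count
}


def classify_repo(files):
    """Return 'full', 'adapter', or 'unknown' for a list of repo file paths."""
    names = {f.rsplit("/", 1)[-1] for f in files}
    # Sharded checkpoints ship an index file instead of one weights file.
    sharded = any(
        n.endswith(".index.json") and n.startswith(("pytorch_model", "model"))
        for n in names
    )
    if names & FULL_WEIGHT_FILES or sharded:
        return "full"
    if "adapter_config.json" in names:
        return "adapter"
    return "unknown"


print(classify_repo(["README.md", "adapter_config.json", "adapter_model.bin", "config.json"]))
# adapter
```

To get a real file list, huggingface_hub.list_repo_files("lucas0/empath-llama-7b") returns the repo's file names. A config.json on its own does not make a repo loadable as a full model, because AutoModel still needs one of the weight files.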

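To load the adapted model, you have to load the base model and the peft (adapter) model separately. First, install the dependencies (restart the runtime after installing if needed):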

! pip install -U peft accelerate
! pip install -U sentencepiece
! pip install -U transformers

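Which base model should the adapter go on? PEFT saves an adapter_config.json next to the adapter weights. Its base_model_name_or_path field records the checkpoint the adapter was trained on. With peft installed, PeftConfig.from_pretrained(adapters_name).base_model_name_or_path reads that same field. The sketch below parses an assumed example config using only the json module. The field values are illustrative and not copied from lucas0/empath-llama-7b, and base_model_of is a hypothetical helper.

```python
import json

# Assumed example of an adapter_config.json written by peft's save_pretrained();
# values are illustrative only.
ADAPTER_CONFIG_JSON = """
{
  "base_model_name_or_path": "decapoda-research/llama-7b-hf",
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 8,
  "lora_alpha": 16,
  "target_modules": ["q_proj", "v_proj"]
}
"""


def base_model_of(adapter_config_text):
    """Return the base checkpoint an adapter expects, failing loudly if unknown."""
    cfg = json.loads(adapter_config_text)
    if cfg.get("peft_type") is None:
        raise ValueError("not a PEFT adapter config")
    base = cfg.get("base_model_name_or_path")
    if not base:
        raise ValueError("adapter config does not record its base model")
    return base


print(base_model_of(ADAPTER_CONFIG_JSON))  # decapoda-research/llama-7b-hf
```

The adapter therefore does not need a hand-made config.json. The base model's own config.json and weights are loaded from the base repo, and the adapter is applied on top.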

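Then, to load the model, take a look at the guanaco example in "Try installing guanaco (pip install guanaco) for text classification model but getting error" (you will need a GPU runtime):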

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "decapoda-research/llama-7b-hf"   # base model the adapter was trained on
adapters_name = 'lucas0/empath-llama-7b'        # repo that only holds the LoRA adapter

print(f"Starting to load the model {model_name} into memory")

# 1. Load the full base model onto GPU 0.
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0}
)
# 2. Apply the LoRA adapter on top of the base weights.
m = PeftModel.from_pretrained(m, adapters_name)
# 3. Merge the adapter into the base weights so m acts like a plain transformers model.
m = m.merge_and_unload()
tok = LlamaTokenizer.from_pretrained(model_name)
# Set the BOS token id explicitly (id 1 is LLaMA's <s>).
tok.bos_token_id = 1

stop_token_ids = [0]  # carried over from the guanaco example; not used below

print(f"Successfully loaded the model {model_name} into memory")
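The commented-out load_in_4bit=True line matters on small GPUs. The rough sketch below assumes about 7 × 10^9 parameters and counts only the weights, leaving out activations, KV cache and CUDA context. It computes weight memory for a few dtypes.

```python
# Back-of-envelope weight memory for a "7B" model.
# Assumption: ~7e9 parameters; activations, KV cache and CUDA overhead are NOT included.
GIB = 1024 ** 3
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "4bit": 0.5}


def weight_gib(n_params, dtype):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_gib(7e9, dtype):6.2f} GiB")
```

In bfloat16 the weights alone take about 13.04 GiB, which leaves little headroom on a 14.62 GiB GPU like the one in the out-of-memory comment. Loading in 4-bit cuts that to roughly 3.26 GiB. A request for 1207 GiB is far beyond what the weights alone need.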


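Now that you can load your tuned/fine-tuned model with Hugging Face transformers, you can try it with langchain. Before that, we have to dig into the langchain code. To use a prompt with an HF model, users are told to do this: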

from langchain import PromptTemplate, LLMChain, HuggingFaceHub

template = """ Hey llama, you like to eat quinoa. Whatever question I ask you, you reply with "Waffles, waffles, waffles!".
 Question: {input} Answer: """
prompt = PromptTemplate(template=template, input_variables=["input"])


model = HuggingFaceHub(repo_id="facebook/mbart-large-50",
                       model_kwargs={"temperature": 0, "max_length": 200})
chain = LLMChain(prompt=prompt, llm=model)
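To see exactly what text reaches the model, note that PromptTemplate with its default f-string format fills {input} the same way Python's str.format does for a simple template like this one. The plain-Python sketch below needs no langchain and mirrors that substitution.

```python
# Plain-Python equivalent of PromptTemplate(template=template, input_variables=["input"])
# with the default f-string format (assumption: no partial variables are set).
template = """ Hey llama, you like to eat quinoa. Whatever question I ask you, you reply with "Waffles, waffles, waffles!".
 Question: {input} Answer: """

prompt_text = template.format(input="Who is Princess Momo?")
print(repr(prompt_text))
```

By default (return_full_text=True), a transformers text-generation pipeline returns this prompt followed by the continuation. That is why a custom wrapper around such a pipeline has to strip the prompt from the generated text.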


But when we look at this HuggingFaceHub object, it is not just a vanilla AutoModel from Hugging Face transformers.

Looking at https://github.com/hwchase17/langchain/blob/master/langchain/chains/llm.py, we see that it expects the llm=... argument to be one of langchain's wrapper classes, so we dig deeper into langchain's HuggingFaceHub object at https://github.com/hwchase17/langchain/blob/master/langchain/llms/huggingface_hub.py

The HuggingFaceHub object wraps huggingface_hub.inference_api.InferenceApi for the text-generation, text2text-generation, or summarization task.

HuggingFaceHub looks like a spaghetti-like object that inherits from the LLM object at https://github.com/hwchase17/langchain/blob/master/langchain/llms/base.py#L453

To summarize:

- We want to load a HuggingFaceHub using the langchain API.
- HuggingFaceHub is actually a wrapper around huggingface_hub.inference_api.InferenceApi.
- The HuggingFaceHub object is a subclass of llm.base.LLM.
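The relationship in this list is a template-method pattern. The base class owns the public call, and a subclass only supplies the backend-specific _call and an _llm_type name. The toy classes below (ToyLLM, EchoBackend, toy_chain) are hypothetical stand-ins written for illustration, not langchain's source. They only show why a subclass that implements _call can be dropped into a chain.

```python
from abc import ABC, abstractmethod


class ToyLLM(ABC):
    """Hypothetical stand-in for langchain's LLM base class (not its real code)."""

    @property
    @abstractmethod
    def _llm_type(self):
        ...

    @abstractmethod
    def _call(self, prompt, stop=None):
        ...

    def __call__(self, prompt, stop=None):
        # The base class owns the public entry point and delegates
        # generation to whatever backend the subclass wraps.
        return self._call(prompt, stop=stop)


class EchoBackend(ToyLLM):
    """Subclass wrapping any callable, the way HuggingFaceHub wraps InferenceApi."""

    def __init__(self, backend):
        self.backend = backend

    @property
    def _llm_type(self):
        return "echo"

    def _call(self, prompt, stop=None):
        return self.backend(prompt)


def toy_chain(template, llm, **inputs):
    """Hypothetical miniature LLMChain: format the prompt, then call the LLM."""
    return {**inputs, "text": llm(template.format(**inputs))}


llm = EchoBackend(lambda p: p.upper())
print(toy_chain("Q: {input} A:", llm, input="hi"))
# {'input': 'hi', 'text': 'Q: HI A:'}
```

The HuggingFaceHugs subclass in this answer follows the same shape, except that its _call wraps a local transformers pipeline instead of the Inference API.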

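With this knowledge of the HuggingFaceHub object, we now have a few options:

Opinion: The easiest way is to avoid langchain entirely. It is a wrapper around other things, so you can write your own custom wrapper that skips the levels of inheritance langchain creates in order to wrap as many tools as possible.

Ideally: Ask the langchain developers/maintainers to support peft/adapter models and write another subclass for them.

Practical: Let's hack this thing and write our own LLM subclass.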
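For the first option, skipping langchain entirely, the wrapper can be very small. This is a minimal sketch under stated assumptions. PromptedGenerator is a hypothetical name. generate is assumed to behave like a transformers text-generation pipeline: it takes a string and returns [{"generated_text": prompt + continuation}]. A fake pipeline stands in for the model so the example runs without a GPU.

```python
class PromptedGenerator:
    """Hypothetical langchain-free wrapper: a prompt template plus a generation callable."""

    def __init__(self, generate, template):
        # `generate` is assumed to mimic a text-generation pipeline's call signature.
        self.generate = generate
        self.template = template

    def __call__(self, **inputs):
        prompt = self.template.format(**inputs)
        text = self.generate(prompt)[0]["generated_text"]
        # The pipeline echoes the prompt by default, so keep only the continuation.
        return text[len(prompt):] if text.startswith(prompt) else text


# Fake pipeline used instead of a real model.
fake_pipeline = lambda p: [{"generated_text": p + "Waffles, waffles, waffles!"}]
ask = PromptedGenerator(fake_pipeline, " Question: {input} Answer: ")
print(ask(input="Who is Princess Momo?"))  # Waffles, waffles, waffles!
```

With the merged model loaded earlier, generate could instead be pipeline("text-generation", model=m, tokenizer=tok) from transformers.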

The practical solution:

Let's try creating a new LLM subclass:

from typing import Any, List, Optional

from pydantic import Extra
from transformers import pipeline

from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens


class HuggingFaceHugs(LLM):
  # Holds a transformers pipeline built from the merged model and tokenizer.
  pipeline: Any

  class Config:
    """Configuration for this pydantic object."""
    extra = Extra.forbid

  def __init__(self, model, tokenizer, task="text-generation"):
    # Pass the field through the pydantic constructor so it is validated like any other.
    super().__init__(pipeline=pipeline(task, model=model, tokenizer=tokenizer))

  @property
  def _llm_type(self) -> str:
    """Return type of llm."""
    return "huggingface_hub"

  def _call(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
  ) -> str:
    # Run the inference; the pipeline returns the prompt followed by the continuation.
    text = self.pipeline(prompt, max_length=100)[0]['generated_text']
    # Keep only the newly generated part.
    text = text[len(prompt):]
    # Cut the output at the first stop sequence, if the chain passed any
    # (the same post-processing langchain's own HuggingFaceHub wrapper does).
    if stop is not None:
      text = enforce_stop_tokens(text, stop)
    return text


template = """ Hey llama, you like to eat quinoa. Whatever question I ask you, you reply with "Waffles, waffles, waffles!".
 Question: {input} Answer: """
prompt = PromptTemplate(template=template, input_variables=["input"])

# m and tok are the merged model and tokenizer loaded above.
hf_model = HuggingFaceHugs(model=m, tokenizer=tok)

chain = LLMChain(prompt=prompt, llm=hf_model)

chain("Who is Princess Momo?")
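The enforce_stop_tokens step in _call cuts the generated text at the first occurrence of any stop string. A text-generation pipeline echoes the prompt, so the order of the two post-processing steps matters. If a stop string also appears in the prompt, truncating first and then slicing off the prompt leaves nothing. In the sketch below, truncate_at_stop is a hypothetical stand-in for enforce_stop_tokens. It escapes the stop strings so they match literally, which is an assumption of this sketch.

```python
import re


def truncate_at_stop(text, stop):
    """Hypothetical helper: keep the text before the first occurrence of any stop string."""
    if not stop:
        return text
    pattern = "|".join(re.escape(s) for s in stop)  # literal match, e.g. "?" is not a regex
    return re.split(pattern, text, maxsplit=1)[0]


prompt = " Question: Who is Princess Momo? Answer: "
generated = prompt + "She is a princess.\n Question: Who is she?"
stop = ["Question:"]

# Wrong order: the stop string also occurs in the prompt, so nothing survives.
print(repr(truncate_at_stop(generated, stop)[len(prompt):]))  # ''
# Right order: drop the echoed prompt first, then apply the stop strings.
print(repr(truncate_at_stop(generated[len(prompt):], stop)))  # 'She is a princess.\n '
```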


Phew, langchain didn't complain... Here is the output:

{'input': 'Who is Princess Momo?',
 'text': ' She is a princess.  She is a princess.  She is a princess.  She is a princess.  She is a princess.  She is a princess.  She is a princess.  She is'}


Epilogue: Apparently this llama model doesn't understand that all it needs to do is reply "Waffles, waffles, waffles".

TL;DR

See https://colab.research.google.com/drive/1l2GiSSPbajVyp2Nk3CFT4t3uH6-5TiBe?usp=sharing

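I'm getting CUDA out of memory. Tried to allocate 1207.54 GiB (GPU 0; 14.62 GiB total capacity; 11.04 GiB already allocated; 2.79 GiB free; 11.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. Needing 1207 GB... something is wrong. I'll open another question. (2 upvotes)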

Archived: 2 years, 5 months ago
Views: 8221
Last activity: 2 years, 5 months ago
