model = AutoModelForCausalLM.from_pretrained("finetuned_model")
yields Killed.

Trying instead:
import os
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "lucas0/empath-llama-7b"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit to cut memory usage (requires a working bitsandbytes build)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')

# cwd is assumed to be the directory holding the manually downloaded tokenizer.model
cwd = os.getcwd()
tokenizer = AutoTokenizer.from_pretrained(cwd+"/tokenizer.model")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
yields:
AttributeError: /home/ubuntu/empath/lora/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
I fine-tuned the model using PEFT and LoRA:
model = AutoModelForCausalLM.from_pretrained(
"decapoda-research/llama-7b-hf",
torch_dtype=torch.float16,
device_map='auto',
)
I had to download the LLaMA tokenizer and point to it manually:
# cwd is the directory containing the downloaded tokenizer.model
tokenizer = LlamaTokenizer(cwd+"/tokenizer.model")
tokenizer.pad_token = tokenizer.eos_token
As for the training: …
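An aside on the two failures above: Killed during from_pretrained is usually the kernel's OOM killer, and undefined symbol: cget_col_row_stats typically means bitsandbytes was installed without GPU support, so load_in_8bit cannot run on that machine. A minimal sketch of one way around both, assuming a fp16 copy of the base model fits in memory (this is an assumption, not the asker's code):

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

peft_model_id = "lucas0/empath-llama-7b"
config = PeftConfig.from_pretrained(peft_model_id)

# fp16 halves the footprint of the fp32 default and avoids bitsandbytes entirely
base = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Apply the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base, peft_model_id)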
I need to use pipeline to get both tokenization and inference from the distilbert-base-uncased-finetuned-sst-2-english model on my dataset.
My data is a list of sentences; for the sake of example, we can assume it is:
texts = ["this is the first sentence", "of my data.", "In fact, thats not true,", "but we are going to assume it", "is"]
Before using pipeline, I was getting the logits from the model output like this:
# tokenized_test is the tokenizer's output for the texts above
with torch.no_grad():
    logits = model(**tokenized_test).logits
Now I have to use the pipeline, so this is how I'm getting the model output:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

selected_model = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(selected_model)
model = AutoModelForSequenceClassification.from_pretrained(selected_model, num_labels=2)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(classifier(texts))
This gives me:
[{'label': 'POSITIVE', 'score': 0.9746173024177551}, {'label': 'NEGATIVE', 'score': 0.5020197629928589}, {'label': 'NEGATIVE', 'score': 0.9995120763778687}, {'label': 'NEGATIVE', 'score': 0.9802979826927185}, {'label': 'POSITIVE', 'score': 0.9274746775627136}]
I can no longer find the 'logits' field anywhere.
Is there a way to get the logits …
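A sketch of one possible approach, assuming a recent transformers version: TextClassificationPipeline accepts function_to_apply and top_k call arguments, so the softmax can be skipped and every label returned, which recovers the raw logits as scores:

# Ask for all labels (top_k=None) and skip the softmax (function_to_apply="none")
raw_outputs = classifier(texts, top_k=None, function_to_apply="none")
# Each entry is now a list of {'label': ..., 'score': <raw logit>} dicts, one per label
print(raw_outputs[0])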
In Huggingface's Tokenizer documentation, the call function accepts List[List[str]] and says:
text (str, List[str], List[List[str]], optional) - The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as lists of strings (pretokenized), you must set is_split_into_words=True (to disambiguate with a batch of sequences).
Everything runs fine if I run:
test = ["hello this is a test", "that transforms a list of sentences", "into a list of list of sentences", "in order to emulate, in this case, two batches of the same lenght", "to be tokenized by the hf tokenizer for the defined model"]
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
tokenized_test = tokenizer(text=test, padding="max_length", is_split_into_words=False, truncation=True, return_tensors="pt")

But if I try to emulate batches of sentences:
…
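For context, a sketch of how the tokenizer reads List[List[str]], based on the documentation quoted above (the example data here is mine, not the asker's): each inner list is treated as the words of one pretokenized sequence, not as a sub-batch, so is_split_into_words=True is required; to process several batches of plain sentences, call the tokenizer once per batch instead:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

# A batch of two *pretokenized* sequences: each inner list is one sentence split into words
pretokenized = [["hello", "this", "is", "a", "test"], ["another", "pretokenized", "sentence"]]
batch = tokenizer(text=pretokenized, is_split_into_words=True, padding="max_length", truncation=True, return_tensors="pt")

# To feed several batches of plain sentences instead, tokenize each batch separately
batches = [["first batch, sentence one", "sentence two"], ["second batch, sentence one", "sentence two"]]
encoded = [tokenizer(text=b, padding="max_length", truncation=True, return_tensors="pt") for b in batches]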
I fine-tuned a model (https://huggingface.co/decapoda-research/llama-7b-hf) with PEFT and LoRA and saved it as https://huggingface.co/lucas0/empath-llama-7b. Now, when I try to use it with langchain and a Chroma vectordb, I get Pipeline cannot infer suitable model classes from:
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain import PromptTemplate, HuggingFaceHub, LLMChain
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma
repo_id = "sentence-transformers/all-mpnet-base-v2"
embedder = HuggingFaceHubEmbeddings(
repo_id=repo_id,
task="feature-extraction",
huggingfacehub_api_token="XXXXX",
)
comments = ["foo", "bar"]
embeddings = embedder.embed_documents(texts=comments)
docsearch = Chroma.from_texts(comments, embedder).as_retriever()
#docsearch = Chroma.from_documents(texts, embeddings)
llm = HuggingFaceHub(repo_id='lucas0/empath-llama-7b', huggingfacehub_api_token='XXXXX')
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch, return_source_documents=False) …
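The HuggingFaceHub wrapper calls the hosted Inference API, which cannot infer a model class for a repo that contains only a PEFT adapter. One possible workaround, sketched here under the assumption that the model fits locally (this is not a tested fix): load the adapter yourself and hand langchain a local transformers pipeline via HuggingFacePipeline:

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

peft_model_id = "lucas0/empath-llama-7b"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model plus the LoRA adapter locally
base = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Recent transformers versions accept a PeftModel in pipelines;
# otherwise merge the adapter first with: model = model.merge_and_unload()
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(pipeline=pipe)

# docsearch is the Chroma retriever built above
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch, return_source_documents=False)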
I'm trying to run a simple example of numpy.reshape(). Calling it from a .py file doesn't seem to work, but when I try it directly from the Python terminal, it works perfectly.

I simply do this:
import numpy as np
a = np.arange(6)
print a
a.reshape((3,2))
print a
It doesn't raise any errors, but it doesn't work either! This is the output:
Lucass-MacBook-Pro:LSTM lucaslourenco$ python theClass.py
[0 1 2 3 4 5]
[0 1 2 3 4 5]
Meanwhile, in the terminal:
>>> import numpy as np
>>> a = np.arange(6)
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.reshape((3,2))
array([[0, 1],
[2, 3],
[4, 5]])
Run Code Online (Sandbox Code Playgroud)
Is there a simple solution?
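A sketch of the likely fix: ndarray.reshape() returns a new array and does not modify a in place, and the terminal only appears to work because the REPL echoes that return value. Binding the result back to a makes the script behave the same way:

import numpy as np

a = np.arange(6)
a = a.reshape((3, 2))  # reshape returns a new reshaped array; it does not change `a` in place
print(a)               # [[0 1], [2 3], [4 5]]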