How to get the model's logits from a HuggingFace text classification pipeline?

Question


Luc*_*edo 5 python sentiment-analysis huggingface-transformers huggingface large-language-model

I need to use a pipeline to run tokenization and inference on my dataset with the distilbert-base-uncased-finetuned-sst-2-english model.

My data is a list of sentences; for the sake of example, assume it is:

texts = ["this is the first sentence", "of my data.", "In fact, thats not true,", "but we are going to assume it", "is"]

Before using the pipeline, I got the logits from the model output like this:

import torch

with torch.no_grad():
    logits = model(**tokenized_test).logits


Now I have to use the pipeline, so this is how I get the model output:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

selected_model = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(selected_model)
model = AutoModelForSequenceClassification.from_pretrained(selected_model, num_labels=2)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(classifier(texts))


This gives me:

[{'label': 'POSITIVE', 'score': 0.9746173024177551}, {'label': 'NEGATIVE', 'score': 0.5020197629928589}, {'label': 'NEGATIVE', 'score': 0.9995120763778687}, {'label': 'NEGATIVE', 'score': 0.9802979826927185}, {'label': 'POSITIVE', 'score': 0.9274746775627136}]

The "logits" field is no longer there.

Is there a way to get the logits instead of the label and score? Is a custom pipeline the best and/or simplest way to do this?

Answer 1

alv*_*vas 6

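When you use the default pipeline, its postprocessing step usually applies a softmax to the logits and returns only the top label with its probability, e.g.: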

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)


text = ['hello this is a test',
        'that transforms a list of sentences',
        'into a list of list of sentences',
        'in order to emulate, in this case, two batches of the same lenght',
        'to be tokenized by the hf tokenizer for the defined model']

classifier(text, batch_size=2, truncation="only_first")


[out]:

[{'label': 'NEGATIVE', 'score': 0.9379090666770935},
 {'label': 'POSITIVE', 'score': 0.9990271329879761},
 {'label': 'NEGATIVE', 'score': 0.9726701378822327},
 {'label': 'NEGATIVE', 'score': 0.9965035915374756},
 {'label': 'NEGATIVE', 'score': 0.9913086891174316}]


So what you want is to override the postprocessing logic by subclassing the pipeline class.

To check which pipeline class your classifier is an instance of, do:

classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
type(classifier)


[out]:

transformers.pipelines.text_classification.TextClassificationPipeline


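Now that you know the parent class of the task pipeline, you can subclass it and override postprocess while still getting the built-in preprocessing and batching of TextClassificationPipeline: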

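from transformers import TextClassificationPipeline

class MarioThePlumber(TextClassificationPipeline):
    # Skip the default softmax + label mapping and return the raw logits tensor.
    # **kwargs absorbs any postprocess parameters (e.g. top_k) the parent may pass.
    def postprocess(self, model_outputs, **kwargs):
        logits = model_outputs["logits"]
        return logits

pipe = MarioThePlumber(model=model, tokenizer=tokenizer)

pipe(text, batch_size=2, truncation="only_first")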


[out]:

[tensor([[ 1.5094, -1.2056]]),
 tensor([[-3.4114,  3.5229]]),
 tensor([[ 1.8835, -1.6886]]),
 tensor([[ 3.0780, -2.5745]]),
 tensor([[ 2.5383, -2.1984]])]
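A quick arithmetic check, using only the numbers printed in this answer, that the default pipeline's score is the softmax probability of the top class. It assumes index 0 is NEGATIVE and index 1 is POSITIVE for this SST-2 checkpoint (consistent with the argmax of every row matching the labels in the first output); the helper name `softmax` is just for illustration.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating, for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed id2label of the checkpoint: index 0 -> NEGATIVE, index 1 -> POSITIVE.
LABELS = ["NEGATIVE", "POSITIVE"]

# Raw logits printed by MarioThePlumber (torch's repr rounds them to 4 decimals).
logits = [[1.5094, -1.2056], [-3.4114, 3.5229], [1.8835, -1.6886],
          [3.0780, -2.5745], [2.5383, -2.1984]]

# (label, score) pairs printed by the default classifier for the same five sentences.
reported = [("NEGATIVE", 0.9379090666770935), ("POSITIVE", 0.9990271329879761),
            ("NEGATIVE", 0.9726701378822327), ("NEGATIVE", 0.9965035915374756),
            ("NEGATIVE", 0.9913086891174316)]

recomputed = []
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=lambda i: probs[i])  # argmax over classes
    recomputed.append((LABELS[best], probs[best]))

for (label, prob), (rep_label, rep_score) in zip(recomputed, reported):
    print(f"{label:8s} softmax={prob:.4f}  pipeline={rep_label} {rep_score:.4f}")
```

Because the printed logits are rounded, the agreement only holds to about four decimal places.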

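A minimal, self-contained sketch that checks the subclass returns the same numbers as the manual `model(**tokenized).logits` call from the question. Assumptions: it targets transformers 4.x, swaps the real checkpoint for a randomly initialised tiny DistilBERT with a hypothetical 14-token vocabulary (so it runs offline and the logits carry no meaning), and the class name `LogitsPipeline` is made up. It also tries the built-in `top_k=None, function_to_apply="none"` call parameters, which in recent 4.x releases return every label with its raw logit as the score, without subclassing.

```python
import os
import tempfile

import torch
from transformers import (
    DistilBertConfig,
    DistilBertForSequenceClassification,
    DistilBertTokenizer,
    TextClassificationPipeline,
)

# Hypothetical tiny vocabulary so the sketch runs offline; unknown words map to [UNK].
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "this", "is", "the", "first", "sentence", "of", "my", "data", "."]
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("\n".join(VOCAB) + "\n")
tokenizer = DistilBertTokenizer(vocab_file=vocab_path)

# Random, deliberately tiny stand-in for the fine-tuned SST-2 checkpoint (dropout off).
torch.manual_seed(0)
config = DistilBertConfig(
    vocab_size=len(VOCAB), dim=32, n_layers=1, n_heads=2, hidden_dim=64,
    dropout=0.0, attention_dropout=0.0, seq_classif_dropout=0.0, num_labels=2,
)
model = DistilBertForSequenceClassification(config).eval()


class LogitsPipeline(TextClassificationPipeline):
    """Same idea as MarioThePlumber: return raw logits instead of label/score dicts."""

    def postprocess(self, model_outputs, **kwargs):
        return model_outputs["logits"]


texts = ["this is the first sentence", "of my data.", "In fact, thats not true,",
         "but we are going to assume it", "is"]

pipe_logits = LogitsPipeline(model=model, tokenizer=tokenizer)(texts)  # (1, 2) per text

# Reference: the manual forward pass from the question, one sentence at a time.
manual_logits = []
with torch.no_grad():
    for t in texts:
        manual_logits.append(model(**tokenizer(t, return_tensors="pt")).logits)

# Built-in alternative: keep every label and skip the softmax.
base = TextClassificationPipeline(model=model, tokenizer=tokenizer)
raw_scores = base(texts, top_k=None, function_to_apply="none")

for a, b in zip(pipe_logits, manual_logits):
    print(a, torch.allclose(a, b, atol=1e-5))
print(raw_scores[0])
```

The subclass keeps the tensor shape the answer shows; the built-in option returns plain floats sorted by score, which is easier to serialise but loses the label order.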

Archived:	2 years, 5 months ago
Views:	1360
Last activity:	2 years, 4 months ago