How to set the label names when using the Huggingface TextClassificationPipeline?

Nas*_*sin 7 nlp huggingface-transformers

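I am using a fine-tuned Huggingface model (trained on my company's data) with the TextClassificationPipeline to make class predictions. The labels that this Pipeline predicts default to LABEL_0, LABEL_1, and so on. Is there a way to supply a label mapping to the TextClassificationPipeline object so that the output reflects it?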

Environment:

  • tensorflow==2.3.1
  • transformers==4.3.2

Sample code:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}

from transformers import TextClassificationPipeline, TFAutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = r"path\to\my\fine-tuned\model"  # raw string so \t and \f are not read as escapes

# Load the fine-tuned model and tokenizer, then build the text classification pipeline
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)

result = pipeline("It was a good watch. But a little boring.")[0]

Output:

In [2]: result
Out[2]: {'label': 'LABEL_1', 'score': 0.8864616751670837}

Nas*_*sin 14

The simplest way to add such a mapping is to edit the model's config.json to include an id2label field, as shown below:

{
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "id2label": {
    "0": "negative",
    "1": "positive"
  },
  "attention_dropout": 0.1,
  .
  .
}

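Editing config.json by hand is easy to get wrong, so the same change can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name add_label_names is hypothetical, and it assumes the class ids used during fine-tuning follow the order of the list passed in. It also writes the reverse label2id mapping so both directions stay consistent.

```python
import json
from pathlib import Path


def add_label_names(config_path, label_names):
    """Write id2label / label2id into a config.json, keeping all other keys.

    Hypothetical helper: label_names[i] is assumed to be the name of class id i.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # JSON object keys are always strings, so ids are stored as "0", "1", ...
    config["id2label"] = {str(i): name for i, name in enumerate(label_names)}
    config["label2id"] = {name: i for i, name in enumerate(label_names)}
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

It would be called once on the fine-tuned model directory, for example add_label_names("config.json", ["negative", "positive"]), before loading the model.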

An in-code way to set this mapping is to pass the id2label argument in the from_pretrained call, as shown below:

model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR, id2label={0: 'negative', 1: 'positive'})
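
With more than a couple of classes, typing the dictionary by hand gets error-prone. A small sketch of building it from an ordered list of names follows; build_label_maps is a hypothetical helper, and the list order is assumed to match the class ids used during fine-tuning. The from_pretrained call is left as a comment because it needs the fine-tuned model on disk, and passing label2id alongside id2label is an assumption based on both being model config attributes.

```python
def build_label_maps(label_names):
    """Return (id2label, label2id) for class names listed in class-id order."""
    id2label = dict(enumerate(label_names))  # {0: "negative", 1: "positive"}
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps(["negative", "positive"])

# Assumed usage with the model from the question:
# model = TFAutoModelForSequenceClassification.from_pretrained(
#     MODEL_DIR, id2label=id2label, label2id=label2id)
```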

Here is the GitHub issue I raised to get this added to the documentation of transformers.XForSequenceClassification.