Nas*_*sin 7 nlp huggingface-transformers
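I'm using a fine-tuned Huggingface model (trained on my company's data) with the TextClassificationPipeline to make class predictions. Right now the labels this Pipeline predicts default to LABEL_0, LABEL_1, and so on. Is there a way to supply a label mapping to the TextClassificationPipeline object so that the output reflects that mapping?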
Environment:
- tensorflow==2.3.1
- transformers==4.3.2
Sample code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
from transformers import TextClassificationPipeline, TFAutoModelForSequenceClassification, AutoTokenizer

# Raw string, so "\t" and "\f" in the Windows path are not read as escape sequences
MODEL_DIR = r"path\to\my\fine-tuned\model"

# Text classification pipeline
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
Output:
In [2]: result
Out[2]: {'label': 'LABEL_1', 'score': 0.8864616751670837}
Nas*_*sin 14
The simplest way to add such a mapping is to edit the model's config.json so that it includes an id2label field, as shown below:
{
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "id2label": {
    "0": "negative",
    "1": "positive"
  },
  "attention_dropout": 0.1,
  .
  .
}
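If you would rather not edit the file by hand, the same change can be scripted. The following is only a minimal sketch: `add_label_maps` is a hypothetical helper name, the demo runs against a throwaway config.json rather than a real model directory, and it assumes the class names are listed in the same order as the label ids used during fine-tuning. It also writes `label2id`, the inverse mapping that the config supports.

```python
import json
import tempfile
from pathlib import Path


def add_label_maps(config_path, class_names):
    """Write id2label / label2id into an existing config.json, in place.

    Hypothetical helper: class_names must be in class-index order.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # JSON object keys are always strings, so the ids are stored as "0", "1", ...
    config["id2label"] = {str(i): name for i, name in enumerate(class_names)}
    # label2id is the inverse mapping; writing both keeps them in sync.
    config["label2id"] = {name: i for i, name in enumerate(class_names)}
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demo on a throwaway config so the sketch runs without a real model directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_path.write_text(json.dumps({"activation": "gelu"}), encoding="utf-8")
    patched = add_label_maps(demo_path, ["negative", "positive"])
    print(patched["id2label"])
```

For a real model you would point `add_label_maps` at the config.json inside your own model directory; all other fields in the file are left untouched.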
An in-code way to set this mapping is to pass the id2label argument in the from_pretrained call, as shown below:
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR, id2label={0: 'negative', 1: 'positive'})
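With more than two classes it can be easier to derive the mapping from a list of names. This is a rough sketch under a few assumptions: `build_label_maps` and `load_classifier` are hypothetical helper names, the class names must be given in the order of the label ids used during fine-tuning, and `label2id` is passed alongside `id2label` so that both config fields agree. The transformers import sits inside the function so the mapping helper can be tried without TensorFlow installed.

```python
def build_label_maps(class_names):
    """Return (id2label, label2id) for class names listed in class-index order."""
    id2label = dict(enumerate(class_names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(model_dir, class_names):
    """Load the fine-tuned model with readable labels and wrap it in a pipeline."""
    # Imported lazily so build_label_maps works without transformers/TensorFlow.
    from transformers import (AutoTokenizer, TextClassificationPipeline,
                              TFAutoModelForSequenceClassification)

    id2label, label2id = build_label_maps(class_names)
    # Keyword arguments that match config attributes override the stored config.
    model = TFAutoModelForSequenceClassification.from_pretrained(
        model_dir, id2label=id2label, label2id=label2id)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    return TextClassificationPipeline(model=model, tokenizer=tokenizer,
                                      framework="tf")


# Only the mapping is built here; load_classifier needs a real model directory.
id2label, label2id = build_label_maps(["negative", "positive"])
```

Calling `load_classifier(MODEL_DIR, ["negative", "positive"])` should then give a pipeline whose results carry the readable label names instead of LABEL_0 and LABEL_1.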
Here is the GitHub issue I raised to get this added to the documentation of transformers.XForSequenceClassification.