I'm using a Huggingface model fine-tuned on my company's data, together with a TextClassificationPipeline, for class prediction. Right now the labels in the pipeline's predictions default to LABEL_0, LABEL_1, and so on. Is there a way to supply a label mapping to the TextClassificationPipeline object so that the output reflects the actual class names?
Environment:
- tensorflow==2.3.1
- transformers==4.3.2
Sample code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any of {'0', '1', '2'}
from transformers import TextClassificationPipeline, TFAutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = r"path\to\my\fine-tuned\model"  # raw string so the backslashes aren't treated as escapes

# Text classification pipeline
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
Output:
In [2]: result
Out[2]: {'label': 'LABEL_1', 'score': 0.8864616751670837}
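
For reference, TextClassificationPipeline appears to take its label strings from the model config's id2label mapping, so one candidate fix would be to supply that mapping when loading the fine-tuned model. A minimal sketch, assuming a two-class model; the names "negative" and "positive" are placeholders for the real class names:

model = TFAutoModelForSequenceClassification.from_pretrained(
    MODEL_DIR,
    id2label={0: "negative", 1: "positive"},  # placeholder names
    label2id={"negative": 0, "positive": 1},
)
pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
# result['label'] should then read e.g. 'positive' instead of 'LABEL_1'

If that works, the mapping could presumably also be set after loading via model.config.id2label = {...}.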
For fine-tuning, I followed the basic example given below, from https://huggingface.co/transformers/training.html:
from transformers import TFBertForSequenceClassification, TFTrainer, TFTrainingArguments

model = TFBertForSequenceClassification.from_pretrained("bert-large-uncased")

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

trainer = TFTrainer(
    model=model,            # the instantiated Transformers model to be trained
    args=training_args,     # training arguments, defined above
    # ... (train/eval dataset arguments truncated in the original snippet)
)
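
A variant of the same idea, again only a sketch with placeholder label names, would be to attach the mapping before fine-tuning, so that save_pretrained() writes it into config.json and any later from_pretrained() picks it up automatically:

from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained(
    "bert-large-uncased",
    num_labels=2,                             # assumes two classes
    id2label={0: "negative", 1: "positive"},  # placeholder names
    label2id={"negative": 0, "positive": 1},
)
# ... fine-tune with TFTrainer as above, then:
model.save_pretrained("path/to/save")         # hypothetical save path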