How do I get per-epoch or per-step accuracy from the Hugging Face transformers Trainer?

Cpt*_*aas 25 python logging tensorflow huggingface-transformers

I am using the Hugging Face Trainer with a BertForSequenceClassification.from_pretrained("bert-base-uncased") model.

Simplified, it looks like this:

from transformers import BertForSequenceClassification, BertTokenizer, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
        output_dir="bert_results",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=32,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir="bert_results/logs",
        logging_steps=10
        )

trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics
        )

The logs contain the loss every 10 steps, but I can't seem to find the training accuracy anywhere. Does anyone know how to get the accuracy, for example by changing the verbosity of the logger? I can't find anything about it online.

小智 12

You can load the accuracy metric and have it work with your compute_metrics function. As an example, it would look like this:

import numpy as np
from datasets import load_metric

metric = load_metric('accuracy')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

This example compute_metrics function is based on Hugging Face's text classification tutorial. It worked in my tests.
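One wiring detail that is not spelled out in the answer above: compute_metrics is only called when the Trainer actually runs evaluation, so to get the accuracy at the end of every epoch you can set the evaluation strategy in TrainingArguments. A minimal sketch, assuming a transformers version that accepts the evaluation_strategy argument (recent releases rename it to eval_strategy):

from transformers import TrainingArguments

training_args = TrainingArguments(
        output_dir="bert_results",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=32,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir="bert_results/logs",
        logging_steps=10,
        evaluation_strategy="epoch",  # run evaluation (and compute_metrics) at the end of every epoch
        )

Note that this reports accuracy on eval_dataset; for accuracy on the training set itself, see the answer below.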


sid*_*491 12

I ran into the same problem and solved it by adding a custom callback that calls the evaluate() method with the train_dataset at the end of each epoch.

from copy import deepcopy

from transformers import Trainer, TrainerCallback


class CustomCallback(TrainerCallback):
    """Re-runs evaluation on the training set at the end of each epoch, logged with the 'train' prefix."""

    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            # Work on a copy of the control object; mutating it directly caused errors (see the comment below).
            control_copy = deepcopy(control)
            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
            return control_copy


trainer = Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
    tokenizer=tokenizer
)
trainer.add_callback(CustomCallback(trainer)) 
train = trainer.train()

This gives training metrics like the following:

{'train_loss': 0.7159061431884766, 'train_accuracy': 0.4, 'train_f1': 0.5714285714285715, 'train_runtime': 6.2973, 'train_samples_per_second': 2.382, 'train_steps_per_second': 0.159, 'epoch': 1.0}
{'eval_loss': 0.8529007434844971, 'eval_accuracy': 0.0, 'eval_f1': 0.0, 'eval_runtime': 2.0739, 'eval_samples_per_second': 0.964, 'eval_steps_per_second': 0.482, 'epoch': 1.0}
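For reference, the train_f1 / eval_f1 values above come from a compute_metrics function that also returns an F1 score. The exact function used here is not shown; a minimal sketch with scikit-learn (binary averaging is an assumption for a two-class setup) could look like this:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="binary"),
    }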


Another way to get the training accuracy is to extend the base Trainer class and override the compute_loss() method, like this:

import torch
from sklearn.metrics import accuracy_score
from transformers import Trainer


class CustomTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def compute_loss(self, model, inputs, return_outputs=False):
        """
        How the loss is computed by Trainer. By default, all models return the loss in the first element.
        Subclass and override for custom behavior.
        """
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)

        # code for calculating accuracy
        # (note: if label smoothing popped the labels above, this block is skipped)
        if "labels" in inputs:
            preds = outputs.logits.detach()
            labels_tensor = inputs["labels"]
            acc1 = accuracy_score(labels_tensor.cpu(), preds.argmax(axis=-1).cpu())
            self.log({"accuracy_score": float(acc1)})
            acc = (
                (preds.argmax(axis=-1) == labels_tensor)
                .type(torch.float)
                .mean()
                .item()
            )
            self.log({"train_accuracy": acc})
        # end code for calculating accuracy

        # Save past state if it exists
        # TODO: this needs to be fixed and made cleaner later.
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]

        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            # We don't use .loss here since the model may return tuples instead of ModelOutput.
            loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]

        return (loss, outputs) if return_outputs else loss

Then use CustomTrainer instead of Trainer, like this:

trainer = CustomTrainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
    tokenizer=tokenizer
)
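Since compute_loss runs on every training step, self.log emits a per-batch train_accuracy. If you also want a single accuracy figure over the whole training set after training, one option (not part of the original answer) is Trainer.predict; a minimal sketch:

import numpy as np

# Post-training check: accuracy over the full training set.
pred_output = trainer.predict(train_dataset)              # PredictionOutput with .predictions and .label_ids
train_preds = np.argmax(pred_output.predictions, axis=-1)
train_accuracy = (train_preds == pred_output.label_ids).mean()
print(f"final train accuracy: {train_accuracy:.4f}")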

  • @MAC, we copy the "control" object from the trainer class into "control_copy" so we can return it later; as far as I remember, modifying the "control" object directly gave me some errors. (2 upvotes)