Training multiple times with Huggingface Transformers gives exactly the same result, except for the first time

SMM*_*iSP 6 python nlp machine-learning deep-learning huggingface-transformers

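I have a function that loads a pre-trained model from Huggingface, fine-tunes it for sentiment analysis, then computes the F1 score and returns the result. The problem is that when I call this function multiple times with exactly the same arguments, it returns exactly the same metric score, which is expected, except for the first call, which is different. How is that possible?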

This is my function, written based on this tutorial from Huggingface:

import uuid

import numpy as np

from datasets import (
    load_dataset,
    load_metric,
    DatasetDict,
    concatenate_datasets
)

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

CHECKPOINT = "distilbert-base-uncased"
SAVING_FOLDER = "sst2"
def custom_train(datasets, checkpoint=CHECKPOINT, saving_folder=SAVING_FOLDER):

    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    
    def tokenize_function(example):
        return tokenizer(example["sentence"], truncation=True)

    tokenized_datasets = datasets.map(tokenize_function, batched=True)
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

    # Use the saving_folder argument (not the module constant) so the parameter takes effect
    saving_folder = f"{saving_folder}_{uuid.uuid1()}"
    training_args = TrainingArguments(saving_folder)

    trainer = Trainer(
        model,
        training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    
    trainer.train()
    
    predictions = trainer.predict(tokenized_datasets["test"])
    print(predictions.predictions.shape, predictions.label_ids.shape)
    preds = np.argmax(predictions.predictions, axis=-1)
    
    metric_fun = load_metric("f1")
    metric_result = metric_fun.compute(predictions=preds, references=predictions.label_ids)
    
    return metric_result

Then I run this function several times on the same dataset, appending the returned F1 score each time:

raw_datasets = load_dataset("glue", "sst2")

small_datasets = DatasetDict({
    "train": raw_datasets["train"].select(range(100)).flatten_indices(),
    "validation": raw_datasets["validation"].select(range(100)).flatten_indices(),
    "test": raw_datasets["validation"].select(range(100, 200)).flatten_indices(),
})

results = []
for i in range(4):
    result = custom_train(small_datasets)
    results.append(result)

Then, when I inspect the results list:

[{'f1': 0.7755102040816325}, {'f1': 0.5797101449275361}, {'f1': 0.5797101449275361}, {'f1': 0.5797101449275361}]

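One explanation that comes to mind is that when I load the pre-trained model, the head is initialized with random weights, and that is why the results differ. But if that is the case, why is only the first result different while the others are exactly the same?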

SMM*_*iSP 12

Sylvain Gugger answered this question here: https://discuss.huggingface.co/t/multiple-training-will-give-exactly-the-same-result-except-for-the-first-time/8493


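You need to set the seed before instantiating your model, otherwise the random head is not initialized the same way. That is why the first run is always different. The subsequent runs are all the same because the seed has been set by the Trainer in the train method. To set the seed: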

from transformers import set_seed

set_seed(42)
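To make the mechanism easier to see, here is a minimal toy sketch that uses only Python's standard `random` module rather than transformers. The names `init_head`, `toy_train`, and `run` are hypothetical stand-ins invented for illustration: `init_head` plays the role of the randomly initialized classification head, and `toy_train` plays the role of a training step that seeds the global RNG, as the answer describes. It is not how the Trainer is implemented, only a model of the ordering problem.

```python
import random

TOY_SEED = 42  # illustrative value, mirrors the seed used in the answer


def init_head(n=4):
    """Hypothetical stand-in for the randomly initialized classifier head."""
    return [random.gauss(0.0, 0.02) for _ in range(n)]


def toy_train(head):
    """Hypothetical stand-in for training: seeds the global RNG, then 'trains'.

    The returned score depends on the initial head and on seeded randomness.
    """
    random.seed(TOY_SEED)
    noise = random.random()
    return sum(head) + noise


def run(n_runs, seed_first=None):
    """Repeat init + train; optionally seed BEFORE the head is created."""
    results = []
    for _ in range(n_runs):
        if seed_first is not None:
            random.seed(seed_first)  # the fix: seed before instantiation
        head = init_head()           # drawn from whatever RNG state is current
        results.append(toy_train(head))
    return results


if __name__ == "__main__":
    print("no seed before init:", run(4))
    print("seed before init:   ", run(4, seed_first=TOY_SEED))
```

In the first call, the head of run 1 comes from an unseeded RNG, while the heads of runs 2 to 4 are drawn from the state left behind by the previous training step, so they coincide with each other. In the second call, seeding before each initialization makes all four runs match. Applied to the question's loop, this would presumably mean calling `set_seed(42)` before each `custom_train(small_datasets)` call, so that every head is drawn from the same RNG state.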