Huggingface Transformers Longformer optimizer warning: AdamW

use*_*622 5 python nlp huggingface-transformers

When I try to run the code from this page, I get the following warning.

/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,

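I am very confused, because the code does not seem to set an optimizer at all. The most likely place for the optimizer to be configured is the snippet below, but I don't know how to change it there.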

# define the training arguments
training_args = TrainingArguments(
    output_dir = '/media/data_files/github/website_tutorials/results',
    num_train_epochs = 5,
    per_device_train_batch_size = 8,
    gradient_accumulation_steps = 8,    
    per_device_eval_batch_size= 16,
    evaluation_strategy = "epoch",
    disable_tqdm = False, 
    load_best_model_at_end=True,
    warmup_steps=200,
    weight_decay=0.01,
    logging_steps = 4,
    fp16 = True,
    logging_dir='/media/data_files/github/website_tutorials/logs',
    dataloader_num_workers = 0,
    run_name = 'longformer-classification-updated-rtx3090_paper_replication_2_warm'
)

# instantiate the trainer class and check for available devices
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data,
    eval_dataset=test_data
)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

I tried another transformer model, distilbert-base-uncased, with the same code, and it seems to run without any warning.

  1. Is this warning specific to longformer?
  2. How should I change the optimizer?

Jer*_*ril 5

You need to add optim='adamw_torch'; the default is optim='adamw_hf'.

See here for reference.

You can try the following:

# define the training arguments
training_args = TrainingArguments(
    optim='adamw_torch',  # use torch.optim.AdamW instead of the deprecated 'adamw_hf'
    # ... your other training arguments ...
)
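If switching the optimizer is not possible in your setup, the warning can also be silenced with Python's standard `warnings` module. This is only a minimal sketch: `legacy_adamw` is a hypothetical stand-in that emits the same kind of `FutureWarning`, so the snippet runs without transformers installed, and the filter matches on the opening words of the message shown in the question.

```python
import warnings


def legacy_adamw():
    """Hypothetical stand-in that emits the same kind of FutureWarning."""
    warnings.warn(
        "This implementation of AdamW is deprecated and will be removed "
        "in a future version.",
        FutureWarning,
    )
    return "optimizer"


def count_warnings(install_filter):
    """Return how many warnings reach the caller, with or without the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        if install_filter:
            # `message` is a regex matched against the start of the warning text
            warnings.filterwarnings(
                "ignore",
                message="This implementation of AdamW is deprecated",
                category=FutureWarning,
            )
        legacy_adamw()
    return len(caught)


print(count_warnings(install_filter=False))
print(count_warnings(install_filter=True))
```

This only hides the message; the deprecated optimizer is still the one being used, so setting `optim='adamw_torch'` remains the actual fix.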


小智 2

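import torch.optim as optim

# `params` (model parameters) and `opt` (hyperparameter config) are defined elsewhere
optimizer = optim.AdamW(params, lr=opt.learning_rate, betas=(opt.optim_alpha, opt.optim_beta), eps=opt.optim_epsilon, weight_decay=opt.weight_decay)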

It can be used like this.
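To make this concrete, here is a minimal sketch that builds PyTorch's own `torch.optim.AdamW`. The tiny `torch.nn.Linear` model is an assumed stand-in for the real Longformer classifier, and the hyperparameter values are illustrative; only `weight_decay=0.01` is taken from the question.

```python
import torch

# Stand-in model (assumption); in the question this would be the Longformer classifier.
model = torch.nn.Linear(4, 2)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,             # illustrative value, not from the question
    betas=(0.9, 0.999),  # illustrative value
    eps=1e-8,            # illustrative value
    weight_decay=0.01,   # same value as in the question's TrainingArguments
)

# Run a single optimization step to confirm the optimizer is usable.
loss = model(torch.randn(3, 4)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

An optimizer built this way can be handed to `Trainer` through its `optimizers=(optimizer, scheduler)` argument instead of letting `Trainer` create one from `TrainingArguments`.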