I'm a beginner with fastai and am trying to build a model by following "Using RoBERTa with fast.ai for NLP".
I tried to write a custom tokenizer (see the code below):
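from typing import List

from fastai.text import *  # BaseTokenizer is part of the fastai v1 text API
from fastai.metrics import *
from transformers import RobertaTokenizer

class FastAiRobertaTokenizer(BaseTokenizer):
    """Wrapper around RobertaTokenizer to be compatible with fastai"""
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __call__(self, *args, **kwargs):
        # fastai calls the tokenizer factory; hand back this already-built instance
        return self

    def tokenizer(self, t: str) -> List[str]:
        """Adds Roberta bos and eos tokens and limits the maximum sequence length"""
        # `config` is not defined in this snippet; it is assumed to be defined
        # elsewhere with `start_tok` and `end_tok` attributes.
        # Slice to max_seq_len - 2 so the two special tokens still fit.
        return [config.start_tok] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + [config.end_tok]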
But I get the following error:
---------------------------------------------------------------------------
NameError …
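For reference, the `tokenizer` method in the code above does one thing: it truncates the wrapped tokenizer's output to `max_seq_len - 2` tokens and then surrounds it with a start and an end token, so the result never exceeds `max_seq_len`. The sketch below isolates that logic in plain Python so it can be checked without fastai or transformers installed. The function name `wrap_with_special_tokens` is hypothetical. The defaults `"<s>"` and `"</s>"` (RoBERTa's bos/eos tokens) are assumed stand-ins for `config.start_tok` and `config.end_tok`, and `str.split` is a stand-in for `RobertaTokenizer.tokenize`.

```python
from typing import Callable, List


def wrap_with_special_tokens(
    tokenize: Callable[[str], List[str]],
    text: str,
    max_seq_len: int = 128,
    start_tok: str = "<s>",   # assumed stand-in for config.start_tok
    end_tok: str = "</s>",    # assumed stand-in for config.end_tok
) -> List[str]:
    """Same truncation logic as FastAiRobertaTokenizer.tokenizer (hypothetical helper).

    Two slots are reserved for the special tokens, so the returned list
    is never longer than max_seq_len.
    """
    body = tokenize(text)[: max_seq_len - 2]
    return [start_tok] + body + [end_tok]


# str.split is a hypothetical stand-in for RobertaTokenizer.tokenize.
toks = wrap_with_special_tokens(str.split, "a b c d e f", max_seq_len=5)
print(toks)  # ['<s>', 'a', 'b', 'c', '</s>']
```

With `max_seq_len=5`, only three content tokens survive because two positions go to the special tokens.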