fastai.text NameError: name 'BaseTokenizer' is not defined

Web*_*ang 1 python-3.x fast-ai

I'm a beginner with fastai and am trying to build a model by following "Using RoBERTa with fast.ai for NLP".

I tried to write a custom tokenizer (code below):

from fastai.text import *
from fastai.metrics import *
from transformers import RobertaTokenizer
from typing import List

class FastAiRobertaTokenizer(BaseTokenizer):
    """Wrapper around RobertaTokenizer to be compatible with fastai"""
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len
    def __call__(self, *args, **kwargs):
        return self
    def tokenizer(self, t: str) -> List[str]:
        """Adds Roberta bos and eos tokens and limits the maximum sequence length"""
        # bos_token/eos_token ("<s>"/"</s>" for RoBERTa) come from the tokenizer itself, so no separate config object is needed
        tok = self._pretrained_tokenizer
        return [tok.bos_token] + tok.tokenize(t)[:self.max_seq_len - 2] + [tok.eos_token]
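The core of the tokenizer method above is a small piece of list arithmetic: two positions are reserved for the start and end tokens, so at most max_seq_len - 2 content tokens survive. A minimal sketch of just that step, assuming RoBERTa's "<s>"/"</s>" special tokens; the function name add_special_and_truncate is hypothetical and not part of fastai or transformers:

```python
from typing import List


def add_special_and_truncate(tokens: List[str], max_seq_len: int = 128,
                             bos: str = "<s>", eos: str = "</s>") -> List[str]:
    """Hypothetical helper: wrap tokens in bos/eos, total length <= max_seq_len.

    Two slots are reserved for the special tokens, so at most
    max_seq_len - 2 content tokens are kept.
    """
    if max_seq_len < 2:
        # Not enough room for both special tokens
        raise ValueError("max_seq_len must be at least 2")
    return [bos] + tokens[: max_seq_len - 2] + [eos]


if __name__ == "__main__":
    words = "the quick brown fox jumps".split()
    print(add_special_and_truncate(words, max_seq_len=5))
    # ['<s>', 'the', 'quick', 'brown', '</s>']
```

Without the "- 2", a sequence that hits the limit would come out two tokens longer than max_seq_len once the special tokens are added.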

But I got this error message:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-41070aae72d1> in <module>
----> 1 class FastAiRobertaTokenizer(BaseTokenizer):
      2     """Wrapper around RobertaTokenizer to be compatible with fastai"""
      3     def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
      4         self._pretrained_tokenizer = tokenizer
      5         self.max_seq_len = max_seq_len

NameError: name 'BaseTokenizer' is not defined
  • fastai version: 2.1.8
  • torch version: 1.7.1
  • transformers version: 3.4.0

Has anyone run into this problem before?

Web*_*ang 7

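Oh, I finally figured it out: I needed to change from fastai.text import * to from fastai.text.all import *. With that change, NameError: name 'BaseTokenizer' is not defined no longer appears. In fastai v2 (2.1.8 here), BaseTokenizer is defined in fastai.text.core; the bare fastai.text package does not re-export it, but fastai.text.all does.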