Web*_*ang 1 python-3.x fast-ai
I'm a beginner with fastai and am trying to build a model by following the tutorial "Using RoBERTa with fast.ai for NLP".
I tried to define a custom tokenizer with the code below:
from fastai.text import *
from fastai.metrics import *
from transformers import RobertaTokenizer

class FastAiRobertaTokenizer(BaseTokenizer):
    """Wrapper around RobertaTokenizer to be compatible with fastai"""
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __call__(self, *args, **kwargs):
        return self

    def tokenizer(self, t:str) -> List[str]:
        """Adds Roberta bos and eos tokens and limits the maximum sequence length"""
        return [config.start_tok] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + [config.end_tok]
But I get this error message:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-6-41070aae72d1> in <module>
----> 1 class FastAiRobertaTokenizer(BaseTokenizer):
2 """Wrapper around RobertaTokenizer to be compatible with fastai"""
3 def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
4 self._pretrained_tokenizer = tokenizer
5 self.max_seq_len = max_seq_len
NameError: name 'BaseTokenizer' is not defined
Has anyone run into the same problem before?
Oh, I finally figured it out: I needed to change from fastai.text import * to from fastai.text.all import *. After that change, the NameError: name 'BaseTokenizer' is not defined error no longer appears.
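If you hit a similar NameError after a star import, one way to check which module actually exposes the missing name is a small lookup like the sketch below. The helper find_name is hypothetical (it is not part of fastai), and it uses hasattr, which only approximates what a star import would bring in, because a star import also respects the module's __all__ list.

```python
import importlib


def find_name(name, modules):
    """Return the first module in `modules` that has attribute `name`, else None.

    Hypothetical helper for illustration; modules that cannot be imported
    (for example, because the package is not installed) are skipped.
    """
    for mod_name in modules:
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue  # module missing or not importable in this environment
        if hasattr(mod, name):
            return mod_name
    return None


# Check the two import paths discussed above; prints None if fastai is not installed.
print(find_name("BaseTokenizer", ["fastai.text", "fastai.text.all"]))
```

On an installation where the fix above applies, I would expect this to report fastai.text.all, but the result depends on the fastai version you have installed.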