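I am new to PyTorch and have recently been experimenting with Transformers. I am using the pretrained tokenizers provided by HuggingFace.
I can download and run them successfully, but if I save one and then try to load it again, I get an error.
If I use AutoTokenizer.from_pretrained to download the tokenizer, it works.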
In [1]: tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
text = "Hello there"
enc = tokenizer.encode_plus(text)
enc.keys()
Out[1]: dict_keys(['input_ids', 'attention_mask'])
However, if I save it with tokenizer.save_pretrained("distilroberta-tokenizer") and then try to load it from that local directory, it fails.
In [2]: tmp = AutoTokenizer.from_pretrained('distilroberta-tokenizer')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
238 resume_download=resume_download,
--> 239 local_files_only=local_files_only,
240 )
/opt/conda/lib/python3.7/site-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
266 # File, but it doesn't exist.
--> 267 raise EnvironmentError("file {} not found".format(url_or_filename))
268 else:
OSError: file …
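To narrow this down, it may help to check which files save_pretrained actually wrote to the directory. The sketch below is only a diagnostic and uses nothing but the standard library. It assumes the missing file is config.json: that is my guess based on the traceback passing through configuration_utils.get_config_dict, since the path in the error message above is truncated. The helper names list_saved_files and has_config are hypothetical, not part of transformers.

```python
import os


def list_saved_files(directory):
    """Return the sorted file names in `directory`, or [] if it does not exist."""
    if not os.path.isdir(directory):
        return []
    return sorted(os.listdir(directory))


def has_config(directory, config_name="config.json"):
    """Check whether a config file is present in the saved directory.

    Assumption: the loader is looking for `config.json`; the error
    message in the question is truncated, so this is not confirmed.
    """
    return config_name in list_saved_files(directory)


if __name__ == "__main__":
    saved_dir = "distilroberta-tokenizer"  # directory passed to save_pretrained
    print("files written:", list_saved_files(saved_dir))
    print("config.json present:", has_config(saved_dir))
```

If config.json turns out to be absent, that would be consistent with the traceback: the tokenizer files were saved, but the loader is failing while looking for a model configuration.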