Transformers v4.x: converting a slow tokenizer to a fast tokenizer

Mig*_*ejo 9 python nlp huggingface-transformers huggingface-tokenizers

I am following the transformers example for the pretrained model xlm-roberta-large-xnli:

from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

and I get the following error:

ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I am using transformers version 4.1.1.

小智 15

If you are on Google Colab:

  1. Factory-reset the runtime.
  2. Upgrade pip (pip install --upgrade pip).
  3. Install sentencepiece (!pip install sentencepiece).


Mig*_*ejo 14

According to the Transformers v4.0.0 release notes, sentencepiece was removed as a required dependency. This means that

"tokenizers which depend on the SentencePiece library will not be available with a standard transformers installation"

including XLMRobertaTokenizer. However, sentencepiece can be installed as an extra dependency:

pip install transformers[sentencepiece]

or

pip install sentencepiece

if you already have transformers installed.
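Either command pulls in the sentencepiece package. After installing (and restarting the runtime, if on Colab), you can verify that the package is importable before rebuilding the pipeline. This is a minimal sketch using only the standard library; the helper name `has_sentencepiece` is my own, not part of transformers:

```python
import importlib.util

def has_sentencepiece() -> bool:
    """Return True if the sentencepiece package is installed,
    without importing it or downloading anything."""
    return importlib.util.find_spec("sentencepiece") is not None

if not has_sentencepiece():
    print("sentencepiece missing: run `pip install transformers[sentencepiece]`")
```

If this still reports the package as missing after installation, the environment most likely needs a kernel/runtime restart, as noted in the comment below.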

  • pip install sentencepiece followed by a kernel/runtime restart solved the issue. (4 upvotes)

thr*_*dhn 6

The following worked for me in a Colab notebook:

!pip install transformers[sentencepiece]