T5Tokenizer requires the SentencePiece library but it was not found in your environment

Ари*_*дел 10 google-colaboratory huggingface-transformers

I am trying to explore T5.

Here is the code:

!pip install transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration
qa_input = """question: What is the capital of Syria? context: The name "Syria" historically referred to a wider region,
 broadly synonymous with the Levant, and known in Arabic as al-Sham. The modern state encompasses the sites of several ancient 
 kingdoms and empires, including the Eblan civilization of the 3rd millennium BC. Aleppo and the capital city Damascus are 
 among the oldest continuously inhabited cities in the world."""
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')
input_ids = tokenizer.encode(qa_input, return_tensors="pt")  # Batch size 1
outputs = model.generate(input_ids)
output_str = tokenizer.decode(outputs.reshape(-1))


I get this error:

---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-2-8d24c6a196e4> in <module>()
      5  kingdoms and empires, including the Eblan civilization of the 3rd millennium BC. Aleppo and the capital city Damascus are
      6  among the oldest continuously inhabited cities in the world."""
----> 7 tokenizer = T5Tokenizer.from_pretrained('t5-small')
      8 model = T5ForConditionalGeneration.from_pretrained('t5-small')
      9 input_ids = tokenizer.encode(qa_input, return_tensors="pt")  # Batch size 1

1 frames

/usr/local/lib/python3.6/dist-packages/transformers/file_utils.py in requires_sentencepiece(obj)
    521     name = obj.__name__ if hasattr(obj, "__name__") else obj.__class__.__name__
    522     if not is_sentencepiece_available():
--> 523         raise ImportError(SENTENCEPIECE_IMPORT_ERROR.format(name))
    524 
    525 

ImportError: 
T5Tokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
that match your environment.


--------------------------------------------------------------------------

After that, I installed the SentencePiece library as suggested, like this:

!pip install transformers
!pip install sentencepiece

from transformers import T5Tokenizer, T5ForConditionalGeneration
qa_input = """question: What is the capital of Syria? context: The name "Syria" historically referred to a wider region,
 broadly synonymous with the Levant, and known in Arabic as al-Sham. The modern state encompasses the sites of several ancient 
 kingdoms and empires, including the Eblan civilization of the 3rd millennium BC. Aleppo and the capital city Damascus are 
 among the oldest continuously inhabited cities in the world."""
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')
input_ids = tokenizer.encode(qa_input, return_tensors="pt")  # Batch size 1
outputs = model.generate(input_ids)
output_str = tokenizer.decode(outputs.reshape(-1))


But now I get another issue:

Some weights of the model checkpoint at t5-small were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']

  • This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

So I don't understand what is happening here. Is there any explanation?

Ara*_*d R 8

I used these two commands, and it worked fine for me!

!pip install datasets transformers[sentencepiece]
!pip install sentencepiece
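
As a quick sanity check after installing (not part of the original answer), you can confirm that Transformers now sees SentencePiece; is_sentencepiece_available is the same helper that raised the ImportError in the traceback above. On Colab you may also need to restart the runtime if transformers was imported before sentencepiece was installed.

# Sanity check: the same availability helper that appears in the traceback above.
from transformers.file_utils import is_sentencepiece_available

print(is_sentencepiece_available())  # should print True after installing sentencepiece

# Loading the tokenizer should now work without the ImportError.
from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained('t5-small')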


Ber*_*abi 1

This is not a problem. I have seen that second output as well. It is just a warning printed by the library; you have already solved the actual issue, so you don't need to worry about the warning.
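
If the warning message is distracting, one option (a minimal sketch, not something required to fix anything) is to lower the library's logging verbosity with the transformers logging utilities before loading the model:

from transformers import T5ForConditionalGeneration, logging

# Only print errors; the "some weights were not used" message is a warning,
# so it will no longer be shown.
logging.set_verbosity_error()

model = T5ForConditionalGeneration.from_pretrained('t5-small')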