Tags: google-colaboratory, huggingface-transformers
I am trying to explore T5. Here is the code:
!pip install transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration
qa_input = """question: What is the capital of Syria? context: The name "Syria" historically referred to a wider region,
broadly synonymous with the Levant, and known in Arabic as al-Sham. The modern state encompasses the sites of several ancient
kingdoms and empires, including the Eblan civilization of the 3rd millennium BC. Aleppo and the capital city Damascus are
among the oldest continuously inhabited cities in the world."""
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')
input_ids = tokenizer.encode(qa_input, return_tensors="pt") # Batch size 1
outputs = model.generate(input_ids)
output_str = tokenizer.decode(outputs.reshape(-1))
I get this error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-2-8d24c6a196e4> in <module>()
5 kingdoms and empires, including the Eblan civilization of the 3rd millennium BC. Aleppo and the capital city Damascus are
6 among the oldest continuously inhabited cities in the world."""
----> 7 tokenizer = T5Tokenizer.from_pretrained('t5-small')
8 model = T5ForConditionalGeneration.from_pretrained('t5-small')
9 input_ids = tokenizer.encode(qa_input, return_tensors="pt") # Batch size 1
1 frames
/usr/local/lib/python3.6/dist-packages/transformers/file_utils.py in requires_sentencepiece(obj)
521 name = obj.__name__ if hasattr(obj, "__name__") else obj.__class__.__name__
522 if not is_sentencepiece_available():
--> 523 raise ImportError(SENTENCEPIECE_IMPORT_ERROR.format(name))
524
525
ImportError:
T5Tokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
that match your environment.
--------------------------------------------------------------------------
After that, I installed the SentencePiece library as suggested:
!pip install transformers
!pip install sentencepiece
from transformers import T5Tokenizer, T5ForConditionalGeneration
qa_input = """question: What is the capital of Syria? context: The name "Syria" historically referred to a wider region,
broadly synonymous with the Levant, and known in Arabic as al-Sham. The modern state encompasses the sites of several ancient
kingdoms and empires, including the Eblan civilization of the 3rd millennium BC. Aleppo and the capital city Damascus are
among the oldest continuously inhabited cities in the world."""
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')
input_ids = tokenizer.encode(qa_input, return_tensors="pt") # Batch size 1
outputs = model.generate(input_ids)
output_str = tokenizer.decode(outputs.reshape(-1))
But now I get another message:
Some weights of the model checkpoint at t5-small were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
So I don't understand what is happening here. Is there any explanation?
I used these two commands, and it worked fine for me!
!pip install datasets transformers[sentencepiece]
!pip install sentencepiece