Björn 9 python attention-model bert-language-model huggingface-transformers
I am following a paper on BERT-based lexical substitution (specifically, trying to implement equation (2); if someone has already implemented the whole paper, that would be even better). Thus, I would like to obtain both the last hidden layers (my only uncertainty is the ordering of the layers in the output: last first or first first?) and the attention from the basic BERT model (bert-base-uncased).
However, I am somewhat unsure whether the huggingface/transformers library actually outputs the attention for bert-base-uncased (I am using torch, but would be willing to use TF instead).
From what I have read, I expected to get a tuple of (logits, hidden_states, attentions), but with the example below (run e.g. in Google Colab), I instead get something of length 2.
Am I misinterpreting what I am getting, or going about this the wrong way? I did the obvious sanity check and used output_attention=False instead of output_attention=True (whereas output_hidden_states=True does seem to add the hidden states as expected), and nothing changed in the output I got. That is clearly a bad sign for my understanding of the library, or it indicates a problem.
import numpy as np
import torch
!pip install transformers
from transformers import (AutoModelWithLMHead,
                          AutoTokenizer,
                          BertConfig)
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig.from_pretrained('bert-base-uncased', output_hidden_states=True, output_attention=True) # Nothing changes when I switch to output_attention=False
bert_model = AutoModelWithLMHead.from_config(config)
sequence = "We went to an ice cream cafe and had a chocolate ice cream."
bert_tokenized_sequence = bert_tokenizer.tokenize(sequence)
indexed_tokens = bert_tokenizer.encode(bert_tokenized_sequence, return_tensors='pt')
predictions = bert_model(indexed_tokens)
########## Now let's have a look at what the predictions look like #############
print(len(predictions)) # Length is 2, I expected 3: logits, hidden_layers, attention
print(predictions[0].shape) # torch.Size([1, 16, 30522]) - seems to be logits (shape is 1 x sequence length x vocabulary size)
print(len(predictions[1])) # Length is 13 - the hidden layers?! There are meant to be 12, right? Is one somehow the attention?
for k in range(len(predictions[1])):
    print(predictions[1][k].shape) # These all seem to be torch.Size([1, 16, 768]), so presumably the hidden layers?
import numpy as np
import torch
!pip install transformers
from transformers import BertModel, BertConfig, BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
config = BertConfig.from_pretrained('bert-base-uncased', output_hidden_states=True, output_attentions=True)
model = BertModel.from_pretrained('bert-base-uncased', config=config)
sequence = "We went to an ice cream cafe and had a chocolate ice cream."
tokenized_sequence = tokenizer.tokenize(sequence)
indexed_tokens = tokenizer.encode(tokenized_sequence, return_tensors='pt')
outputs = model(indexed_tokens)
print( len(outputs) ) # 4
print( outputs[0].shape ) #1, 16, 768
print( outputs[1].shape ) # 1, 768
print( len(outputs[2]) ) # 13 = input embedding (index 0) + 12 hidden layers (indices 1 to 12)
print( outputs[2][0].shape ) # for each of these 13: 1, 16, 768 = batch size, index of each input id in the sequence, size of the hidden layer
print( len(outputs[3]) ) # 12 (= attention for each layer)
print( outputs[3][0].shape ) # index 0 = first layer; 1, 12, 16, 16 = batch size, attention heads, index of each input id in the sequence, index of each input id in the sequence
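Tying this back to the ordering question above: judging from these shapes, outputs[2] stores the embedding output at index 0 followed by the 12 encoder layers from first to last, and outputs[3] is likewise ordered first layer first. A minimal sketch, reusing the outputs variable from the snippet above:
last_hidden_layer = outputs[2][-1]     # (1, 16, 768): hidden states of the final encoder layer
first_hidden_layer = outputs[2][1]     # index 0 is the embedding output, index 1 the first encoder layer
last_layer_attention = outputs[3][-1]  # (1, 12, 16, 16): attention weights of the final layer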
小智 7
I think it is too late to answer now, but with the updates to huggingface transformers, I think we can use this:
import torch
from transformers import BertConfig, BertModel, BertTokenizer

# build an example input for the model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
input_ids = tokenizer.encode("We went to an ice cream cafe and had a chocolate ice cream.",
                             return_tensors='pt')

config = BertConfig.from_pretrained('bert-base-uncased',
                                    output_hidden_states=True, output_attentions=True)
bert_model = BertModel.from_pretrained('bert-base-uncased',
                                       config=config)

with torch.no_grad():
    out = bert_model(input_ids)
    last_hidden_states = out.last_hidden_state
    pooler_output = out.pooler_output
    hidden_states = out.hidden_states
    attentions = out.attentions
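A quick sanity check on these fields (a sketch assuming bert-base-uncased, matching the shapes reported in the question): hidden_states is the embedding output plus one tensor per encoder layer, first layer first, and attentions has one tensor per layer.
print(len(hidden_states))       # 13 = embedding output (index 0) + 12 encoder layers
print(len(attentions))          # 12 = one attention tensor per encoder layer
print(hidden_states[-1].shape)  # (1, sequence length, 768): the last hidden layer
print(attentions[-1].shape)     # (1, 12, sequence length, sequence length) = batch, heads, seq, seq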
The reason is that the AutoModelWithLMHead you are using is a wrapper around the actual model. It calls the BERT model (i.e., an instance of BertModel) and then uses the embedding matrix as the weight matrix for word prediction. In between, the underlying model does return the attentions, but the wrapper does not care and only returns the logits.
You can get the BERT model directly by calling AutoModel. Note that this model does not return the logits, but the hidden states.
bert_model = AutoModel.from_config(config)
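A minimal usage sketch of that route, assuming the config and indexed_tokens from the question's second snippet (which sets output_attentions=True); note that from_config initializes fresh weights, so the outputs have the same structure but untrained values:
from transformers import AutoModel

bare_bert = AutoModel.from_config(config)   # hypothetical name; the bare BertModel without the LM head
outputs = bare_bert(indexed_tokens)
# with output_hidden_states=True and output_attentions=True in the config, the outputs
# include the hidden states and attentions (positionally in older transformers versions,
# or as outputs.hidden_states / outputs.attentions on newer ModelOutput returns)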
Or you can get it from the BertWithLMHead object by calling:
wrapped_model = bert_model.base_model
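Continuing from that line, a minimal sketch of calling the unwrapped model (assuming the wrapper's config was built with output_attentions=True, with the trailing s, and reusing indexed_tokens from the question):
encoder_outputs = wrapped_model(indexed_tokens)
attentions = encoder_outputs[-1]   # tuple of 12 per-layer attention tensors when attentions are enabled
                                   # (newer transformers versions also expose encoder_outputs.attentions)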