Why does HuggingFace take the first hidden state for sequence classification (DistilBertForSequenceClassification)?

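In the last few layers of HuggingFace's sequence classification model, they take the first hidden state along the sequence-length dimension of the transformer output and use it for classification.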

hidden_state = distilbert_output[0]  # (bs, seq_len, dim) <-- transformer output
pooled_output = hidden_state[:, 0]  # (bs, dim)           <-- first hidden state
pooled_output = self.pre_classifier(pooled_output)  # (bs, dim)
pooled_output = nn.ReLU()(pooled_output)  # (bs, dim)
pooled_output = self.dropout(pooled_output)  # (bs, dim)
logits = self.classifier(pooled_output)  # (bs, num_labels)

What is the benefit of taking the first hidden state over the last one, the average, or even using a Flatten layer?
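
To make the four options concrete, here is a minimal framework-free sketch of what each pooling choice does to a (bs, seq_len, dim) output. The toy values, the `attention_mask`, and the helper names (`first_token`, `last_token`, `masked_mean`, `flatten`) are all made up for illustration and are not part of the HuggingFace code. Plain Python lists stand in for tensors so the shapes can be checked by hand. For context, with BERT-style tokenizers position 0 holds the `[CLS]` token, which is what `hidden_state[:, 0]` selects.

```python
# Toy batch: bs = 2, seq_len = 3, dim = 2 (values are invented for illustration).
hidden_state = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 1.0], [2.0, 3.0], [0.0, 0.0]],  # last position of this row is padding
]
# Hypothetical attention mask: 1 = real token, 0 = padding.
attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
]


def first_token(hidden):
    """(bs, seq_len, dim) -> (bs, dim); same selection as hidden_state[:, 0]."""
    return [seq[0] for seq in hidden]


def last_token(hidden):
    """(bs, seq_len, dim) -> (bs, dim); can land on a padding position."""
    return [seq[-1] for seq in hidden]


def masked_mean(hidden, mask):
    """(bs, seq_len, dim) -> (bs, dim); average over non-padding positions only."""
    pooled = []
    for seq, m in zip(hidden, mask):
        n_real = sum(m)
        dim = len(seq[0])
        pooled.append(
            [sum(tok[d] * keep for tok, keep in zip(seq, m)) / n_real for d in range(dim)]
        )
    return pooled


def flatten(hidden):
    """(bs, seq_len, dim) -> (bs, seq_len * dim); width changes with seq_len."""
    return [[x for tok in seq for x in tok] for seq in hidden]


print(first_token(hidden_state))
print(last_token(hidden_state))
print(masked_mean(hidden_state, attention_mask))
print([len(row) for row in flatten(hidden_state)])
```

Two things the sketch makes visible: with padded batches the last position of a short sequence is a padding vector, and the flattened width is seq_len * dim, so a classifier on top of it would be tied to one fixed sequence length. The first-token and mean variants both give (bs, dim) regardless of sequence length.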

time-series sequence text-classification tensorflow2.0 huggingface-transformers

doe*_*doe

2020-02-07

5
votes

1
answer

939
views