What does the text_target parameter do in Huggingface's AutoTokenizer?

Question

Bet*_*tty 4 python huggingface-transformers huggingface

I am following the guide at https://huggingface.co/docs/transformers/v4.28.1/tasks/summarization, which contains the following line:

labels = tokenizer(text_target=examples["summary"], max_length=128, truncation=True)

I don't understand what the text_target parameter does.

I tried the following code, and the last two lines give exactly the same result.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('t5-small')
text = "Weiter Verhandlung in Syrien."
tokenizer(text_target=text, max_length=128, truncation=True)
tokenizer(text, max_length=128, truncation=True)

The documentation only says "text_target (str, List[str], List[List[str]], optional): The sequence or batch of sequences to be encoded as target texts.", which I don't really understand. Are there cases where setting text_target gives a different result?

Answer 1

cro*_*oik 6

Sometimes it helps to look at the code:

if text is None and text_target is None:
    raise ValueError("You need to specify either `text` or `text_target`.")
if text is not None:
    # The context manager will send the inputs as normal texts and not text_target, but we shouldn't change the
    # input mode in this case.
    if not self._in_target_context_manager:
        self._switch_to_input_mode()
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
if text_target is not None:
    self._switch_to_target_mode()
    target_encodings = self._call_one(text=text_target, text_pair=text_pair_target, **all_kwargs)
# Leave back tokenizer in input mode
self._switch_to_input_mode()

if text_target is None:
    return encodings
elif text is None:
    return target_encodings
else:
    encodings["labels"] = target_encodings["input_ids"]
    return encodings

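As you can see in the snippet above, both text and text_target are passed to self._call_one() to encode them (note that text_target is passed as the text argument). This means that the encoding of the same string as text or as text_target will be identical, as long as _switch_to_target_mode() doesn't do anything special.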
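Whether the two calls differ therefore comes down to what a given tokenizer does in its mode-switch hooks. The following is a minimal toy sketch of that idea, not transformers code: PlainToyTokenizer, LangTagToyTokenizer, the encode helper and the language-tag prefix are all hypothetical names made up for illustration. It only shows that no-op hooks yield identical encodings, while a subclass that overrides the hooks yields different ones.

```python
class PlainToyTokenizer:
    """Hypothetical toy tokenizer; NOT the transformers implementation."""

    def __init__(self):
        self.prefix = []  # tokens prepended to every encoding in the current mode

    def _switch_to_input_mode(self):
        pass  # no input-specific behavior

    def _switch_to_target_mode(self):
        pass  # no target-specific behavior

    def _call_one(self, text):
        # Whitespace splitting stands in for a real vocabulary lookup.
        return {"input_ids": self.prefix + text.split()}

    def encode(self, text, as_target=False):
        # Hypothetical helper: pick the mode, encode, then return to input mode.
        if as_target:
            self._switch_to_target_mode()
        else:
            self._switch_to_input_mode()
        try:
            return self._call_one(text)
        finally:
            self._switch_to_input_mode()


class LangTagToyTokenizer(PlainToyTokenizer):
    """Hypothetical subclass whose hooks change the prefix token per mode."""

    def __init__(self, src_lang, tgt_lang):
        super().__init__()
        self.src_lang = src_lang
        self.tgt_lang = tgt_lang
        self._switch_to_input_mode()

    def _switch_to_input_mode(self):
        self.prefix = [f"<{self.src_lang}>"]

    def _switch_to_target_mode(self):
        self.prefix = [f"<{self.tgt_lang}>"]


text = "Weiter Verhandlung in Syrien."
plain = PlainToyTokenizer()
tagged = LangTagToyTokenizer(src_lang="en", tgt_lang="de")

print(plain.encode(text) == plain.encode(text, as_target=True))    # True
print(tagged.encode(text) == tagged.encode(text, as_target=True))  # False
```

With no-op hooks, as in the plain class, the comparison from the question comes out equal; only a tokenizer that actually changes state in target mode produces a different encoding.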

The conditions at the end of the function answer your question:

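When you only provide text, you get back its encoding.
When you only provide text_target, you get back its encoding.
When you provide both text and text_target, you get back the encoding of text, with the token ids of text_target stored as the value of the labels key.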
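The three cases can be reproduced with a small stand-in that copies only the return logic quoted above. This is a sketch under stated assumptions: toy_encode and toy_call are hypothetical helpers, and encoding each word as its length is an arbitrary placeholder for real token ids.

```python
def toy_encode(text):
    # Hypothetical stand-in for self._call_one: one "id" per word (its length).
    return {"input_ids": [len(word) for word in text.split()]}


def toy_call(text=None, text_target=None):
    """Mimics only the return-value branching of the quoted snippet."""
    if text is None and text_target is None:
        raise ValueError("You need to specify either `text` or `text_target`.")
    encodings = toy_encode(text) if text is not None else None
    target_encodings = toy_encode(text_target) if text_target is not None else None

    if text_target is None:
        return encodings
    elif text is None:
        return target_encodings
    else:
        # Both given: the target ids are attached to the input encoding as labels.
        encodings["labels"] = target_encodings["input_ids"]
        return encodings


print(toy_call(text="a long article"))
# {'input_ids': [1, 4, 7]}
print(toy_call(text_target="short summary"))
# {'input_ids': [5, 7]}
print(toy_call(text="a long article", text_target="short summary"))
# {'input_ids': [1, 4, 7], 'labels': [5, 7]}
```

Note that in the second case the result is still keyed by input_ids rather than labels, which is why the guide's line has to read the ids from that key afterwards.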

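To be honest, I think the implementation is a bit unintuitive. I would expect that passing only text_target returns an object that contains just the labels key. I assume they wanted to keep the output object and the corresponding documentation simple and therefore chose this implementation. Or maybe there is a model I am not aware of for which it actually makes sense.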

Archived:	2 years, 8 months ago
Views:	3940
Last activity:	2 years, 2 months ago