Converting Hugging Face Transformer text embeddings back to text

Question


joh*_*mon 8 python pipeline tokenize huggingface-transformers

Is there a way to convert Hugging Face Transformer embeddings back into text?

Suppose I create text embeddings with Hugging Face's CLIPTextModel as follows:

import torch
from transformers import CLIPTokenizer, CLIPTextModel

class_list = [
    "i love going home and playing with my wife and kids",
    "i love going home",
    "playing with my wife and kids", 
    "family",
    "war",
    "writing",
]
    
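model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Tokenize all strings in one batch, padding to the longest one
inputs = tokenizer(class_list, padding=True, return_tensors="pt")
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)
hidden_state = outputs.last_hidden_state  # one vector per token
embeddings = outputs.pooler_output        # one pooled vector per input string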


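My embeddings are in the variable `embeddings`. My questions:

Can I convert the embeddings back into the input strings in `class_list`? To be precise: if I sent the embeddings to someone who does not know the original list of strings in advance, what steps would they need to take to recover that list?
If so, how would I do it?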

Answer 1

CpI*_*ILL 0

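A greedy search with an LLM might work if you turn it into a beam search: at each leaf node, you generate an embedding of the text each branch has produced so far, then compare that branch's distance to the embedding you are searching for. Now that I've written it out, it looks a bit like the A* algorithm! Would you stop once you are within some distance threshold, or once the candidates start moving further from the target? And I suppose that if the search is guided by an LLM, the output would be grammatical?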
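The search described above can be sketched in Python. This is a minimal sketch under loudly stated assumptions, not a working CLIP inverter: `toy_embed` is a hypothetical stand-in for CLIP's `pooler_output` (it counts word unigrams and bigrams, so word order matters), the fixed word list `VOCAB` is a hypothetical stand-in for the next-token proposals an LLM would make, and `beam_invert`, `cosine`, `beam_width`, and `threshold` are illustrative names, not part of any library. With the real model you would swap in an embedding function and similarity measure for the CLIP outputs.

```python
import math
from collections import Counter

# Hypothetical stand-in for an LLM's next-token proposals.
VOCAB = ["family", "going", "home", "i", "kids", "love", "war"]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for CLIP's pooler_output: unigram + bigram counts."""
    words = text.split()
    feats = Counter(words)
    feats.update(" ".join(pair) for pair in zip(words, words[1:]))
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(count * b[key] for key, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def beam_invert(target, vocab, embed, beam_width=3, max_len=10, threshold=0.999):
    """Grow text word by word, keeping the beam_width candidates closest to target."""
    beams = [""]
    best_text, best_score = "", 0.0
    for _ in range(max_len):
        # Extend every surviving branch with every proposed word, embed, and score it.
        candidates = []
        for text in beams:
            for word in vocab:
                new_text = f"{text} {word}".strip()
                candidates.append((new_text, cosine(embed(new_text), target)))
        candidates.sort(key=lambda c: c[1], reverse=True)  # stable sort: ties keep order
        beams = [text for text, _ in candidates[:beam_width]]
        top_text, top_score = candidates[0]
        if top_score <= best_score:
            break  # no branch got closer to the target than the previous best: stop
        best_text, best_score = top_text, top_score
        if best_score >= threshold:
            break  # within the similarity threshold: stop
    return best_text, best_score


if __name__ == "__main__":
    target = toy_embed("i love going home")  # the "received" embedding
    print(beam_invert(target, VOCAB, toy_embed))
```

Both stopping rules from the answer appear here: the loop ends when the best score passes `threshold` or when a step fails to improve on the previous best. Because the toy embedding is built from exact word features, the target is easy to hit; against real CLIP embeddings there is no guarantee that such a search converges to the original string.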

I might try it when I have some free time...

Archived:	3 years ago
Views:	2,058
Last activity:	1 year, 8 months ago