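I'm trying to use CLIP to compute the similarity between strings. (I know CLIP is normally used with text and images, but it should work with strings alone as well.)
I pass in a list of simple text prompts and compute the similarity between their embeddings. The similarities come out wrong, but I can't figure out what I'm doing wrong.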
import torch
import clip
from torch.nn import CosineSimilarity
from torch.nn.functional import normalize  # used by dist() below
cos = CosineSimilarity(dim=1, eps=1e-6)
def gen_features(model, text):
    tokens = clip.tokenize([text]).to(device)
    text_features = model.encode_text(tokens)
    return text_features
def dist(v1, v2):
    #return torch.dist(normalize(v1), normalize(v2)) # euclidean distance
    #return cos(normalize(v1), normalize(v2)).item() # cosine similarity
    similarity = (normalize(v1) @ normalize(v2).T)
    return similarity.item()
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "ViT-B/32"
model, _ = clip.load(model_name, device=device)
sentences = ["A cat", "A dog", "A labrador", "A poodle", "A wolf", "A lion", …
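
To make clear what dist() is meant to compute, here is a minimal sketch of the same normalize-then-dot-product step in plain Python, with no CLIP or torch involved. The toy vectors and the helper name similarity_matrix are made up for illustration; they are not real CLIP embeddings and not part of the clip package.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(v1, v2):
    """Dot product of the two unit-length vectors."""
    return sum(a * b for a, b in zip(normalize(v1), normalize(v2)))

def similarity_matrix(vectors):
    """Pairwise cosine similarity for a list of vectors (hypothetical helper)."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]

# Hypothetical 3-d stand-ins for embeddings, NOT real CLIP outputs.
toy_embeddings = {
    "A cat": [1.0, 0.0, 0.0],
    "A dog": [0.0, 2.0, 0.0],
    "A lion": [3.0, 0.0, 0.0],
}
labels = list(toy_embeddings)
matrix = similarity_matrix([toy_embeddings[k] for k in labels])
for label, row in zip(labels, matrix):
    print(label, [round(x, 3) for x in row])
```

With real embeddings, every value should fall between -1 and 1 and the diagonal should be 1, which gives a quick sanity check on the output of dist().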