OpenAI embeddings return different vectors for the same text

Question

I am currently trying out the OpenAI Embedding API, but I have run into a problem: when I embed the same text again and again, I get different vector arrays.

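The text is "baby is crying" and the model is text-embedding-ada-002 (MODEL GENERATION: V2). I ran the code in a for loop 5 times and got different vector values. For example, the first value of the vector in each of the five runs was: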

"-0.017496677", "-0.017429505", "-0.017429505", "-0.017429505" and "-0.017496677"

I would expect the same text to produce the same vector every time it is embedded. Is that right?

Answer 1

Hri*_*rma 2

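The vectors for the same input (sentence) should be identical to each other, or at least very similar.

If they were not, searching a vector database for similar context would return inaccurate results.

I found this very helpful, please read: OpenAI forum discussion

Quoting from the forum discussion:
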
Embeddings only return vectors. The vector is the same for the same input, same model, and the same API endpoint. But we have seen differences between the OpenAI endpoint and the Azure endpoint for the same model. So a pick an endpoint and stick with it to avoid any differences.

There could be very slight roundoff errors in the embedding when calling it over and over for the same (above) configuration, but this is in the noise and won’t effect your search result
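
To see why round-off of this size should not matter for search, here is a minimal Python sketch that compares two embeddings by cosine similarity instead of exact equality. It makes no API call. The helper names, the 0.9999 threshold, and the 4-dimensional vectors are assumptions made up for illustration; only the two first-component values come from the question, and a real text-embedding-ada-002 vector is much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_embedding(a, b, min_similarity=0.9999):
    """Treat two embeddings as equal when they point in almost the same direction.

    The 0.9999 threshold is an arbitrary illustrative choice, not an OpenAI recommendation.
    """
    return len(a) == len(b) and cosine_similarity(a, b) >= min_similarity


# First components are the two distinct values reported in the question;
# the remaining components are hypothetical filler.
run_1 = [-0.017496677, 0.021, -0.005, 0.013]
run_2 = [-0.017429505, 0.021, -0.005, 0.013]

print("absolute difference in first component:", abs(run_1[0] - run_2[0]))
print("cosine similarity:", cosine_similarity(run_1, run_2))
print("treated as the same embedding:", same_embedding(run_1, run_2))
```

Comparing with a tolerance like this, rather than with ==, is the usual way to check whether two runs agree. A nearest-neighbour search ranks by the same kind of similarity score, so a difference this small would only change the ranking if two candidates were already almost tied.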


Archived:	2 years, 7 months ago
Views:	2671
Last recorded:	2 years, 4 months ago