Python spaCy beginner: similarity function

Question


aie*_*edu 3 python nlp spacy

When I run the example from the spaCy tutorial for Python, the result of apples.similarity(oranges) is 0.39289959293092641 rather than 0.7857989796519943.

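Is there a reason for this? The original tutorial documentation is at https://spacy.io/docs/ and a tutorial that shows a different answer from the one I get is at http://textminingonline.com/getting-started-with-spacy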

Thanks.

Answer 1

Eth*_*han 9

This appears to be a bug in spaCy.

For some reason, vector_norm is computed incorrectly.

import spacy
import numpy as np

nlp = spacy.load("en")
# using u"apples" just as an example
apples = nlp.vocab[u"apples"]
# norm as stored by spaCy
print(apples.vector_norm)
# prints 1.4142135381698608, or sqrt(2)
# norm recomputed directly from the vector
print(np.sqrt(np.dot(apples.vector, apples.vector)))
# prints 1.0


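vector_norm is then used in similarity. If each norm is too large by a factor of sqrt(2), as in the output above, the denominator is doubled, so similarity always returns half of the correct value.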

def similarity(self, other):
    if self.vector_norm == 0 or other.vector_norm == 0:
        return 0.0
    return numpy.dot(self.vector, other.vector) / (self.vector_norm * other.vector_norm)


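If you are only ranking similarity scores, for example to order synonyms, this may be fine. But if you need the correct cosine similarity score, the result is wrong.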
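Until the bug is fixed, one possible workaround is to compute the cosine similarity yourself from the raw vectors, without relying on vector_norm. The following is only a sketch: the function name cosine_similarity and the small 3-d vectors are made up for illustration and are not part of spaCy. It also reproduces the halving effect by inflating both norms by sqrt(2).

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity with norms recomputed from the vectors themselves."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    norm_a = np.sqrt(np.dot(a, a))
    norm_b = np.sqrt(np.dot(b, b))
    # Mirror the zero-norm guard used in the similarity method quoted above
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))


# Hypothetical stand-ins for apples.vector and oranges.vector
apples_vec = np.array([1.0, 2.0, 2.0])
oranges_vec = np.array([2.0, 1.0, 2.0])

correct = cosine_similarity(apples_vec, oranges_vec)

# Reproduce the reported symptom: each norm too large by sqrt(2)
bad_norm_a = np.sqrt(2.0) * np.linalg.norm(apples_vec)
bad_norm_b = np.sqrt(2.0) * np.linalg.norm(oranges_vec)
halved = float(np.dot(apples_vec, oranges_vec) / (bad_norm_a * bad_norm_b))

print(correct, halved)
```

With spaCy loaded, the same function could presumably be called as cosine_similarity(apples.vector, oranges.vector), using the .vector attribute shown in the snippet above.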

I have filed an issue about this. Hopefully it will be fixed soon.
