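We often see "related items". For example, blogs show related posts, book sites show related books, and so on. My question is: how are these relationships computed? If it is based only on tags, I often see related items that share no tags at all. For example, when searching for "pink", a related item might carry the tag "purple".
Does anyone have any ideas?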
小智 31
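There are many ways to compute the similarity of two items, but for a simple approach, take a look at the Jaccard coefficient.
http://en.wikipedia.org/wiki/Jaccard_index
It is defined as: J(a,b) = |intersection(a,b)| / |union(a,b)|, that is, the number of tags the two items share divided by the number of distinct tags across both.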
So let's say you want to compute the coefficient of two items:
Item A, which has the tags "books, school, pencil, textbook, reading"
Item B, which has the tags "books, reading, autobiography"
intersection(A,B) = books, reading
union(A,B) = books, school, pencil, textbook, reading, autobiography
so J(a,b) = 2/6 ≈ 0.333
So the item most related to A is the one that yields the highest Jaccard coefficient when paired with A.
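As a rough illustration, the calculation above could be sketched in Python like this. The function names `jaccard` and `most_related`, the extra item C, and the choice to return 0.0 when both items have no tags are assumptions made for this example, not part of any particular library.

```python
def jaccard(a, b):
    """Jaccard coefficient of two tag collections: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two items without any tags are treated as unrelated.
        return 0.0
    return len(a & b) / len(union)


def most_related(target, candidates):
    """Return candidate names ordered from most to least similar to target.

    `candidates` maps an item name to its collection of tags.
    """
    return sorted(
        candidates,
        key=lambda name: jaccard(target, candidates[name]),
        reverse=True,
    )


# Items A and B from the example above.
item_a = {"books", "school", "pencil", "textbook", "reading"}
item_b = {"books", "reading", "autobiography"}
# Hypothetical third item, added only to show the ranking step.
item_c = {"books", "school", "pencil"}

print(round(jaccard(item_a, item_b), 3))                   # 0.333
print(most_related(item_a, {"B": item_b, "C": item_c}))    # ['C', 'B']
```

Here C ranks above B because it shares 3 of 5 distinct tags with A (0.6), while B shares 2 of 6. Comparing A against every other item is fine for small collections; for large ones you would probably want to precompute or index the scores.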