How to compute the cosine similarity between two string vectors

Ozg*_*kın 3 r machine-learning cosine-similarity

I have 2 vectors of dimension 6, and I want a number between 0 and 1.

a=c("HDa","2Pb","2","BxU","BuQ","Bve")

b=c("HCK","2Pb","2","09","F","G")

Can anyone explain how I should do this?

use*_*782 5

Use the lsa package, following the package manual:

# create some files
library('lsa')
td = tempfile()
dir.create(td)
write( c("HDa","2Pb","2","BxU","BuQ","Bve"), file=paste(td, "D1", sep="/"))
write( c("HCK","2Pb","2","09","F","G"), file=paste(td, "D2", sep="/"))

# read files into a document-term matrix
myMatrix = textmatrix(td, minWordLength=1)

Edit: showing what the myMatrix object looks like

myMatrix
#myMatrix
#       docs
#  terms D1 D2
#    2    1  1
#    2pb  1  1
#    buq  1  0
#    bve  1  0
#    bxu  1  0
#    hda  1  0
#    09   0  1
#    f    0  1
#    g    0  1
#    hck  0  1

# Calculate cosine similarity
res <- lsa::cosine(myMatrix[,1], myMatrix[,2])
res
#0.3333
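The 0.3333 above follows from the matrix: the two documents share 2 terms and each has 6 terms with count 1, so the cosine is 2 / (sqrt(6) * sqrt(6)) = 1/3. If writing temporary files is not wanted, the same number can be reproduced in base R. This is only a minimal sketch, not part of the lsa package: `cosine_terms` is a hypothetical helper name, and it assumes each vector is treated as a bag of terms, lowercased to match the terms shown in myMatrix.

```r
a <- c("HDa", "2Pb", "2", "BxU", "BuQ", "Bve")
b <- c("HCK", "2Pb", "2", "09", "F", "G")

# Hypothetical helper (not from lsa): cosine similarity of two
# character vectors, each treated as a bag of terms.
cosine_terms <- function(x, y) {
  # Lowercase so the terms match those shown in myMatrix (assumption)
  x <- tolower(x)
  y <- tolower(y)
  # Shared vocabulary: every term seen in either vector
  vocab <- union(x, y)
  # Term counts for each vector, in the same vocabulary order
  vx <- as.numeric(table(factor(x, levels = vocab)))
  vy <- as.numeric(table(factor(y, levels = vocab)))
  # Dot product divided by the product of the Euclidean norms
  sum(vx * vy) / (sqrt(sum(vx^2)) * sqrt(sum(vy^2)))
}

res <- cosine_terms(a, b)
res
```

Because term counts are never negative, the result stays between 0 and 1, which is what the question asks for. An empty input vector would give NaN, so that case would need its own handling.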