Merge two data frames row-wise using common words

foc*_*foc 3 r quanteda

df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291,312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5))

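I would like to merge the two data frames using the freetext column as the key. However, the texts are not exactly identical: some words have been removed or changed.

Is there any option to find the maximum number of shared words between rows and merge them based on that?

Here is an example of the expected output: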

df3 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5), numid = c(291,312))

Ron*_*hah 6

Perhaps you could look at the stringdist joins in fuzzyjoin and choose a max_dist value that suits your data.
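Before settling on a max_dist value, it can help to look at the pairwise edit distances between the two text columns. The sketch below uses base R's adist (generalized Levenshtein distance), which is an assumption made for illustration: the distance method used by the join may differ, so treat these numbers as a rough guide only. The vectors a and b are hypothetical names that simply repeat the question's data.

```r
# Texts from the question (a = df1$freetext, b = df2$freetext)
a <- c("open until monday night", "one more time to insert your coin")
b <- c("open until night", "one time to insert your be")

# d[i, j] is the edit distance between a[i] and b[j]
d <- adist(a, b)
d

# Distances for the intended pairs (row 1 with row 1, row 2 with row 2)
diag(d)
```

A max_dist just above the largest distance among the intended pairs, and below the distances of the unwanted pairs, keeps only the intended matches.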

fuzzyjoin::stringdist_inner_join(df1, df2, by = 'freetext', max_dist = 10)

#  freetext.x                        numid freetext.y                   aid
#  <chr>                             <dbl> <chr>                      <dbl>
#1 open until monday night             291 open until night               3
#2 one more time to insert your coin   312 one time to insert your be     5
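If you want to match literally on the number of shared words, as the question asks, a minimal base R sketch could look like the following. It is not part of fuzzyjoin; the names words1, words2, overlap and best are hypothetical, and it assumes words are separated by whitespace and that ties go to the first candidate.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291, 312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3, 5))

# Split each text into its set of unique lowercase words
words1 <- lapply(strsplit(tolower(df1$freetext), "\\s+"), unique)
words2 <- lapply(strsplit(tolower(df2$freetext), "\\s+"), unique)

# overlap[i, j] = number of words shared by df2 row i and df1 row j
overlap <- matrix(0L, nrow = nrow(df2), ncol = nrow(df1))
for (i in seq_len(nrow(df2))) {
  for (j in seq_len(nrow(df1))) {
    overlap[i, j] <- length(intersect(words2[[i]], words1[[j]]))
  }
}

# For every df2 row, take the df1 row with the most shared words (first on ties)
best <- max.col(overlap, ties.method = "first")

# Attach the matched numid to df2
df3 <- data.frame(df2, numid = df1$numid[best])
df3
```

Note that this always assigns a match, even when the best overlap is zero, so in practice you may want to drop rows whose best overlap falls below some threshold.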