R: extract repeated words from strings

Question

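I have two string vectors, a and b, which I combine into a data frame called data. My goal is to create a new variable containing the words that appear in both strings of each row.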

    a = c("the red house av", "the blue sky", "the green grass")
    b = c("the house built", " the sky of the city", "the grass in the garden")

    data = data.frame(a, b)

Based on this answer, I can flag the repeated words as logicals with duplicated():

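    library(dplyr)

    data <- data %>%
      mutate(c = paste(a, b, sep = " "),
             # duplicated() returns TRUE/FALSE per word, so d holds strings like "FALSE ... TRUE"
             d = vapply(lapply(strsplit(c, " "), duplicated), paste, character(1L), collapse = " "))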
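To make the problem concrete, here is a base-R sketch (no dplyr) of the same steps for the first row only; the variable names combined, words, mask and d_row1 are illustrative and not part of the original code.

```r
# Base-R reproduction of the attempt above, first row only (illustrative names)
combined <- paste("the red house av", "the house built", sep = " ")

# Split the combined string on single spaces, as the attempt does
words <- strsplit(combined, " ")[[1]]

# duplicated() flags each word that already occurred earlier in the vector
mask <- duplicated(words)

# Pasting the logical mask gives the kind of string stored in column d
d_row1 <- paste(mask, collapse = " ")
print(d_row1)
# [1] "FALSE FALSE FALSE FALSE TRUE TRUE FALSE"
```

Only the positions of the repeated words survive in d; once the mask is pasted, the words themselves are gone.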

However, I can't get the words themselves. The data I want should look like this:

> data.1
                 a                       b         d
1 the red house av         the house built the house
2     the blue sky     the sky of the city   the sky
3  the green grass the grass in the garden the grass

Any help with this would be highly appreciated.

Answer 1

Chr*_*sss 5

    a = c("the red house av", "the blue sky", "the green grass")
    b = c("the house built", " the sky of the city", "the grass in the garden")

    data <- data.frame(a, b, stringsAsFactors = FALSE)

    # dta holds the values of one row; keep the words of a that also occur in b
    func <- function(dta) {
        words <- intersect(unlist(strsplit(dta$a, " ")), unlist(strsplit(dta$b, " ")))
        dta$c <- paste(words, collapse = " ")
        return(as.data.frame(dta, stringsAsFactors = FALSE))
    }

    library(dplyr)
    # rowwise() + do() calls func once per row and binds the one-row results together
    data %>% rowwise() %>% do(func(.))

Result:

#Source: local data frame [3 x 3]
#Groups: <by row>
#
## A tibble: 3 x 3
#                 a                       b         c
#*            <chr>                   <chr>     <chr>
#1 the red house av         the house built the house
#2     the blue sky     the sky of the city   the sky
#3  the green grass the grass in the garden the grass
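Since do() is marked as superseded in dplyr 1.0.0 and later, a base-R sketch of the same per-row intersect() idea may be useful. It assumes words are separated by whitespace; the helper name common_words is hypothetical. Splitting on "\\s+" after trimws() keeps the leading space in " the sky of the city" from producing an empty token. For comparison, the block also finishes the question's duplicated() attempt by subsetting the words with the mask instead of pasting the mask itself.

```r
a <- c("the red house av", "the blue sky", "the green grass")
b <- c("the house built", " the sky of the city", "the grass in the garden")
data <- data.frame(a, b, stringsAsFactors = FALSE)

# Hypothetical helper: words of x that also occur in y, in the order they appear in x
common_words <- function(x, y) {
  # trimws() + "\\s+" avoids empty tokens from leading or doubled spaces
  wx <- strsplit(trimws(x), "\\s+")[[1]]
  wy <- strsplit(trimws(y), "\\s+")[[1]]
  # intersect() also drops repeats, so each shared word appears once
  paste(intersect(wx, wy), collapse = " ")
}

# Apply the helper row by row over the two columns
data$c <- mapply(common_words, data$a, data$b, USE.NAMES = FALSE)

# Alternative: keep the words flagged by duplicated() in the combined string
data$d <- vapply(strsplit(trimws(paste(data$a, data$b)), "\\s+"),
                 function(w) paste(unique(w[duplicated(w)]), collapse = " "),
                 character(1L))

# Both columns should contain "the house", "the sky", "the grass"
data[, c("c", "d")]
```

The two columns agree on this data, but they are not equivalent in general: duplicated() also picks up a word repeated inside a single column (for example "the" twice in "the grass in the garden"), even if the other column does not contain it, whereas intersect() only keeps words present in both a and b.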
