Remove words in one column that are present in another column in R

Sri*_*yer 5 regex r dataframe

I have a data frame in this format:

A <- c("John Smith", "Red Shirt", "Family values are better")
B <- c("John is a very highly smart guy", "We tried the tea but didn't enjoy it at all", "Family is very important as it gives you values")

df <- data.frame(A, B, stringsAsFactors = FALSE)

My goal is to get this result:

ID   A                           B
1    John Smith                  is a very highly smart guy
2    Red Shirt                   We tried the tea but didn't enjoy it at all
3    Family values are better    is very important as it gives you

I tried:

test<-df %>% filter(sapply(1:nrow(.), function(i) grepl(A[i], B[i])))

But it doesn't give me what I want.

Any suggestions or help?

MKR*_*MKR 6

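One solution is to use mapply together with strsplit.

The trick is to split df$A into separate words, collapse those words into a single string separated by |, and then use that string as the pattern in gsub to replace every match with "".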

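# Split each entry of A into its individual words
lst <- strsplit(df$A, split = " ")

# For each row, join that row's words with "|" and delete every match from B
df$B <- mapply(function(x, y) gsub(paste0(x, collapse = "|"), "", df$B[y]), lst, seq_along(lst))
df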
# A                                           B
# 1               John Smith                  is a very highly smart guy
# 2                Red Shirt We tried the tea but didn't enjoy it at all
# 3 Family values are better          is very important as it gives you 
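To make the trick above concrete, here is a small sketch that walks through the first row only and exposes the intermediate pattern. The variable names words, pattern, and res are introduced here for illustration and are not part of the original answer.

```r
A <- c("John Smith", "Red Shirt", "Family values are better")
B <- c("John is a very highly smart guy",
       "We tried the tea but didn't enjoy it at all",
       "Family is very important as it gives you values")

# Words of the first entry of A
words <- strsplit(A[1], split = " ")[[1]]

# Collapse the words into a regex alternation: "John|Smith"
pattern <- paste0(words, collapse = "|")

# Delete every match of the pattern from the first entry of B
res <- gsub(pattern, "", B[1])
print(pattern)
print(res)
```

Note that the result keeps the space that followed the removed word, so it starts with a leading space.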

Another option is:

df$B <- mapply(function(x,y)gsub(x,"",y) ,gsub(" ", "|",df$A),df$B)
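Both patterns match substrings, so a word such as "are" in A would also remove the "are" inside "care" in B, and the results keep stray spaces where words were removed. If whole-word matching and tidy spacing are wanted, one possible variant is sketched below. The helper name remove_words is hypothetical, and the use of \\b word boundaries, perl = TRUE, and trimws is an assumption about what is desired, not part of the original answer.

```r
# Hypothetical helper: remove the words of a[i] from b[i], whole words only
remove_words <- function(a, b) {
  # One pattern per row, e.g. "\\b(John|Smith)\\b"
  patterns <- paste0("\\b(", gsub(" ", "|", a), ")\\b")
  # Apply each pattern to the matching element of b
  out <- mapply(function(p, s) gsub(p, "", s, perl = TRUE),
                patterns, b, USE.NAMES = FALSE)
  # Collapse repeated spaces and trim both ends
  trimws(gsub(" +", " ", out))
}

A <- c("John Smith", "Red Shirt", "Family values are better")
B <- c("John is a very highly smart guy",
       "We tried the tea but didn't enjoy it at all",
       "Family is very important as it gives you values")

cleaned <- remove_words(A, B)
print(cleaned)
```

Words in A that contain regex metacharacters (for example "." or "(") would still need escaping before being used as a pattern.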

Data:

A <- c("John Smith", "Red Shirt", "Family values are better")
B <- c("John is a very highly smart guy", "We tried the tea but didn't enjoy it at all", "Family is very important as it gives you values")

df <- data.frame(A, B, stringsAsFactors = FALSE)