How to keep only the unique words in each string of a vector

she*_*ode 5 string r vector duplicates

My data looks like this:

vector = c("hello I like to code hello","Coding is fun", "fun fun fun")

I want to remove duplicate words (space-separated) within each string, i.e. the output should look like this:

vector_cleaned

[1] "hello I like to code"
[2] "Coding is fun"
[3] "fun"

A5C*_*2T1 11

Split each string (strsplit on spaces), apply unique (within lapply), and then paste the words back together:

vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello I like to code" "Coding is fun"        "fun"

## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
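To make the three stages easier to follow, here is a minimal sketch that runs them one at a time on the vector from the question. The intermediate names words, uniq and cleaned are only illustrative, and fixed = TRUE is an optional addition that treats the separator as a literal space rather than a regular expression.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces; the result is a list
# with one character vector of words per input string
words <- strsplit(vector, " ", fixed = TRUE)

# Step 2: drop repeated words within each element; unique() keeps the
# first occurrence of each word and preserves the original order
uniq <- lapply(words, unique)

# Step 3: glue each element back into a single string; vapply() checks
# that every result is a length-one character vector
cleaned <- vapply(uniq, paste, character(1L), collapse = " ")

cleaned
# [1] "hello I like to code" "Coding is fun"        "fun"
```

Note that unique compares words exactly, so "Fun" and "fun" would count as different words.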

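Update based on the comments

You can always write a custom function to use with vapply. For example, here is a function that takes a split string, keeps only the words longer than minLen characters, and lets the user choose whether to deduplicate (onlyUnique).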

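myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  # Optionally drop repeated words, keeping the first occurrence of each
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  # Keep only words with more than minLen characters, then rejoin with spaces
  paste(a[nchar(a) > minLen], collapse = " ")
}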
Run Code Online (Sandbox Code Playgroud)

Compare the output of the following calls to see how it works.

vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
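For reference, the sketch below repeats myFun and the three calls in one self-contained block, using the vector from the question, and shows what each call should return. The result names res_default, res_dups and res_all are only illustrative.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

words <- strsplit(vector, " ")

# Defaults: deduplicate, then keep words with more than 3 characters
res_default <- vapply(words, myFun, character(1L))
res_default
# [1] "hello like code" "Coding"          ""

# Keep duplicates, still dropping words of 3 characters or fewer
res_dups <- vapply(words, myFun, character(1L), onlyUnique = FALSE)
res_dups
# [1] "hello like code hello" "Coding"                ""

# minLen = 0 keeps every word, so only deduplication applies
res_all <- vapply(words, myFun, character(1L), minLen = 0)
res_all
# [1] "hello I like to code" "Coding is fun"        "fun"
```

The empty string in the third position appears because "fun" has exactly three characters and the comparison is strict (nchar(a) > minLen); pasting zero words with collapse yields "".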