Is there a better way to achieve this? I want to remove from this vector every string that is a substring of another element.
words = c("please can you",
"please can",
"can you",
"how did you",
"did you",
"have you")
> words
[1] "please can you" "please can" "can you" "how did you" "did you" "have you"
library(data.table)
library(stringr)
dt = setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE))
dt[, found := str_detect(word1, word2)]
setdiff(words, dt[found == TRUE & word1 != word2, word2])
[1] "please can you" "how did you" "have you"
This works, but it seems like overkill, and I would be interested to know a more elegant way.
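Search for each component of words within words, keeping only those that occur exactly once (that is, those found only in themselves):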
words[colSums(sapply(words, grepl, words, fixed = TRUE)) == 1]
giving:
[1] "please can you" "how did you" "have you"
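To see why the one-liner works, it may help to look at the intermediate values. The sketch below splits the same expression into steps; the names m, hits and kept are only illustrative and are not part of the original answer.

```r
words <- c("please can you", "please can", "can you",
           "how did you", "did you", "have you")

# Column j is grepl(words[j], words, fixed = TRUE), so entry [i, j]
# is TRUE when words[j] occurs literally inside words[i].
m <- sapply(words, grepl, words, fixed = TRUE)

# Every string contains itself, so each column has at least one TRUE.
# A count above 1 means the string also occurs inside another element.
hits <- colSums(m)

# Keep the strings that are found only in themselves.
kept <- words[hits == 1]
print(kept)
```

One caveat, assuming the input might contain exact duplicates: two identical strings match each other, so both copies would get a count of 2 and be dropped. Calling unique(words) first is one way to avoid that.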