Get the number of unique strings from a text string

Asked by use*_*187 (7 votes) · Tags: r, stringr, tm, dplyr

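I'd like to know how to get the number of unique strings (here, words) in a text string. Say I'm looking for repeated occurrences of the words apples, bananas, pineapples and grapes.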

 A<- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')

 df<- data.frame(A) 

Suppose I want the count of distinct fruits that appear in the text.

  library(stringr)
  df$fruituniquecount<- str_count(df$A, "apples|pineapples|grapes|bananas")

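I tried this, but it returns the total number of matches (every occurrence), not the number of distinct fruits. I want the answer to be 3. Any ideas?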
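A small sketch of why str_count returns the total here: with a single alternation pattern it counts every match, so each repeated fruit is counted again. Vectorising str_count over a character vector of patterns instead gives one count per fruit, and the number of non-zero counts is the number of distinct fruits. The names fruit_words and per_fruit are illustrative only, and the \\b word boundaries are an added assumption so that "apples" is not also found inside "pineapples".

```r
library(stringr)

A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')

# One alternation pattern: every occurrence is counted
str_count(A, "apples|pineapples|grapes|bananas")
# [1] 6

# Hypothetical helper vector: one pattern per fruit, wrapped in word
# boundaries so "apples" does not match inside "pineapples"
fruit_words <- c("apples", "pineapples", "grapes", "bananas")
per_fruit <- str_count(A, str_c("\\b", fruit_words, "\\b"))
per_fruit
# [1] 2 2 2 0

# Distinct fruits = number of fruits with at least one match
sum(per_fruit > 0)
# [1] 3
```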

Answer by mar*_*kus (7 votes)

You can use str_extract_all to pull out every match and then take the length of the unique elements.

Input:

A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"

Result:

length(unique(c(stringr::str_extract_all(A, fruits, simplify = TRUE))))
# [1] 3
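Since the question stores the text in a data frame column (df$fruituniquecount), here is a sketch of a per-row version of the same idea. It assumes you want one distinct count per row: str_extract_all without simplify returns a list with one vector of matches per string, so unique() and lengths() can be applied row by row. With simplify = TRUE and several strings, the result is a matrix whose shorter rows are padded with empty strings, and flattening it with c() would mix all rows together. The extra example rows are made up for illustration.

```r
library(stringr)

A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')

# Hypothetical extra rows to show the per-row behaviour
texts <- c(A, "bananas, apples and more bananas", "no fruit at all")
df <- data.frame(A = texts, stringsAsFactors = FALSE)

fruits <- "apples|pineapples|grapes|bananas"

# List of match vectors (one per row) -> drop repeats -> count per row
df$fruituniquecount <- lengths(lapply(str_extract_all(df$A, fruits), unique))
df$fruituniquecount
# [1] 3 2 0
```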