Get the number of unique strings from a text string

Question

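I'd like to know how to get the number of unique strings from a text string. Suppose I'm looking for repeated occurrences of the words apples, bananas, pineapples and grapes.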

 A<- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')

 df<- data.frame(A)

Suppose I want the number of distinct fruits mentioned in the text.

  library(stringr)
  df$fruituniquecount<- str_count(df$A, "apples|pineapples|grapes|bananas")

I tried this, but it gives me the total number of matches. I want the answer to be 3. Any ideas?
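To illustrate the difference: str_count() counts every match, so each fruit is counted once per occurrence, while the question asks how many distinct fruits appear. The sketch below reproduces both numbers with base R's gregexpr() and regmatches(), swapped in for stringr only so it runs without extra packages; the variable names total_count and unique_count are illustrative.

```r
A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"

# All matches in order of appearance; this is what str_count() tallies
matches <- regmatches(A, gregexpr(fruits, A))[[1]]

total_count  <- length(matches)          # every occurrence
unique_count <- length(unique(matches))  # distinct fruits only

total_count
unique_count
```

Here total_count should be 6 (pineapples, apples and grapes each appear twice), while unique_count is the 3 the question is after.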

Answer 1

mar*_*kus 7

You can use str_extract_all and then take the length of the unique elements.

Input:

A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"

Result:

length(unique(c(stringr::str_extract_all(A, fruits, simplify = TRUE))))
# [1] 3

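The question stores the result in a data frame column (df$fruituniquecount). A minimal per-row sketch, assuming the column may hold several strings, is below. It uses base R's gregexpr()/regmatches() in place of str_extract_all, and the second and third rows are hypothetical data added only to show the zero-match and repeated-match cases.

```r
# Hypothetical data: the first row is from the question, the others are made up
df <- data.frame(
  A = c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes',
        'No fruit mentioned here',
        'bananas and more bananas'),
  stringsAsFactors = FALSE
)
fruits <- "apples|pineapples|grapes|bananas"

# One character vector of matches per row (character(0) when nothing matches)
matches_per_row <- regmatches(df$A, gregexpr(fruits, df$A))

# Count the distinct matches within each row separately
df$fruituniquecount <- vapply(matches_per_row,
                              function(m) length(unique(m)),
                              integer(1))

df$fruituniquecount
```

Keeping the matches as a list matters once there is more than one row: with simplify = TRUE, str_extract_all returns a matrix in which shorter rows are padded with empty strings, so wrapping it in c() both pools all rows together and can count "" as an extra unique value.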
