How to summarize the count of unique values of a categorical variable in R

Lea*_*r27 4 r unique summarization distinct-values

Suppose I have a dataset, data:

x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1,x2)

x1 x2
a  a1
a  a1 
a  a2
a  a1
a  a2
a  a3
b  b1
b  b1
b  b2 
b  b2

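For each value of x1, I want to find the number of unique values of x2.

For example, a has 3 unique values (a1, a2, a3) and b has 2 (b1, b2).

I tried aggregate(x1~.,data,sum), but it did not work because these are factors, not integers.

Please help.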

akr*_*run 8

Try

 aggregate(x2~x1, data, FUN=function(x) length(unique(x)))
 #  x1 x2
 #1  a  3
 #2  b  2
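To check the aggregate() call end to end, here is a minimal self-contained sketch. It assumes the sample data matches the table printed in the question, and the column name n_unique is a hypothetical label chosen for readability, not something from the original answer.

```r
# Sample data rebuilt from the table printed in the question (assumption)
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

# For each x1 group, count the distinct x2 values
res <- aggregate(x2 ~ x1, data, FUN = function(x) length(unique(x)))

# Hypothetical, more descriptive name for the count column
names(res)[2] <- "n_unique"
print(res)
```

With this data the result should have one row per x1 group, with counts 3 for a and 2 for b.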

Or

 rowSums(table(unique(data)))
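The one-liner above packs three steps together. As a rough sketch, assuming the same sample data as the question's printed table, it can be unpacked like this (the intermediate names u, tab and counts are only illustrative):

```r
# Sample data rebuilt from the table printed in the question (assumption)
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

u <- unique(data)       # keep each distinct (x1, x2) pair once
tab <- table(u)         # cross-tabulate x1 by x2; each cell is 0 or 1
counts <- rowSums(tab)  # per x1, add up the cells: distinct x2 values
print(counts)
```

Because duplicated rows are dropped first, every cell of the table is 0 or 1, so each row sum equals the number of distinct x2 values for that x1.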

Or

library(dplyr)
data %>% 
     group_by(x1) %>%
     summarise(n=n_distinct(x2))

Or another dplyr option, suggested by @Eric

count(distinct(data), x1)

Or

library(data.table)
setDT(data)[, uniqueN(x2) , x1]

Update

If you need the unique values of 'x2' as well as the count

setDT(data)[, list(n=uniqueN(x2), x2=unique(x2)) , x1]

Or just the unique values

setDT(data)[, list(x2=unique(x2)) , x1]

Or using dplyr

 unique(data) %>% 
                   group_by(x1) %>%
                   mutate(n=n_distinct(x2))

For the unique values only

unique(data)
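If neither data.table nor dplyr is available, the unique values per group can also be listed in base R. This is only a sketch under the assumption that the data matches the question's printed table; split() plus lapply() is swapped in here as an alternative and is not part of the original answer, and the name vals is illustrative.

```r
# Sample data rebuilt from the table printed in the question (assumption)
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2, stringsAsFactors = FALSE)

# Split x2 by x1, then keep the distinct values within each group
vals <- lapply(split(data$x2, data$x1), unique)
print(vals)

# The counts follow from the length of each group's unique values
print(lengths(vals))
```

This returns a named list with one character vector per x1 group, which is convenient when the groups have different numbers of unique values.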