Lea*_*r27 4 r unique summarization distinct-values
Suppose I have a dataset, data:
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1,x2)
x1 x2
a a1
a a1
a a2
a a1
a a2
a a3
b b1
b b1
b b2
b b2
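I want to find the number of unique values of x2 for each value of x1.
For example, a has only 3 unique values (a1, a2 and a3) and b has 2 (b1 and b2).
I tried aggregate(x1~.,data,sum), but it did not work because these columns are factors, not integers.
Please help.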
Try
aggregate(x2~x1, data, FUN=function(x) length(unique(x)))
# x1 x2
#1 a 3
#2 b 2
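To see why the sum attempt fails while counting unique values works, here is a minimal sketch. It assumes the x2 values shown in the printed table, and count_unique is a name introduced here for illustration only.

```r
# Data as in the question (x2 values follow the printed table).
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

# sum() is not defined for character or factor input, so it raises an error.
sum_fails <- inherits(try(sum(data$x2), silent = TRUE), "try-error")

# Hypothetical helper: counts distinct values instead of adding them up.
count_unique <- function(x) length(unique(x))

# x2 ~ x1 reads "summarise x2 within each group of x1".
res <- aggregate(x2 ~ x1, data, FUN = count_unique)
print(res)
```

Note that the formula direction matters: the column being summarised goes on the left of the tilde and the grouping column on the right.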
Or
rowSums(table(unique(data)))
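The one-liner above packs three steps together. As a rough sketch (the intermediate names pairs, tab and counts are introduced here for illustration, and the x2 values are assumed to match the printed table), it can be unrolled like this:

```r
# Data as in the question (x2 values follow the printed table).
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

# Step 1: drop duplicated rows, leaving one row per distinct (x1, x2) pair.
pairs <- unique(data)

# Step 2: cross-tabulate x1 against x2; a cell is 1 if the pair occurs, else 0.
tab <- table(pairs)

# Step 3: sum each row to get the number of distinct x2 values per x1.
counts <- rowSums(tab)
print(counts)
```

The result is a named vector rather than a data frame, which is convenient for lookups such as counts["a"].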
Or
library(dplyr)
data %>%
group_by(x1) %>%
summarise(n=n_distinct(x2))
Or another dplyr option, suggested by @Eric
count(distinct(data), x1)
Or
library(data.table)
setDT(data)[, uniqueN(x2) , x1]
If you need both the unique values of 'x2' and their count
setDT(data)[, list(n=uniqueN(x2), x2=unique(x2)) , x1]
Or just the unique values
setDT(data)[, list(x2=unique(x2)) , x1]
Or using dplyr
# unique(data) keeps one row per distinct (x1, x2) pair
unique(data) %>%
group_by(x1) %>%
mutate(n=n_distinct(x2))
For just the unique rows
unique(data)