I have two factor variables, in a data.frame called "data", that look like this:
brand Country
"A" "ITA"
"A" "ITA"
"C" "SPA"
"B" "POR"
"C" "SPA"
"B" "POR"
"A" "ITA"
"D" "ITA"
"E" "SPA"
"D" "ITA"
I would like to obtain a table listing the number of unique brands by Country. For the example above, it should be:
# of unique brands Country
2 "ITA"
2 "SPA"
1 "POR"
First, I tried:
data$var <- with(data, ave(brand, Country, FUN = function(x){length(unique(x))}))
But it doesn't work with factors, so I converted my factors to character:
data$brand_t<-as.character(data$brand)
data$Country_t<-as.character(data$Country)
And then tried again:
data$var <- with(data, ave(brand_t, Country_t, FUN = function(x){length(unique(x))}))
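Now, if I call unique(data$var) I get "2", "2", "1", which is correct, but I can't get the table I want. It's probably silly, but I can't work it out.
I'd also like to know whether there is a smarter way to do this that works with factors directly.
Thanks again.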
Here are two quick approaches, using data.table v >= 1.9.5 or dplyr (the data frame is called df here):
library(data.table)
setDT(df)[, uniqueN(brand), by = Country]
Or
library(dplyr)
df %>%
group_by(Country) %>%
summarise(n = n_distinct(brand))
Or with base R:
aggregate(brand ~ Country, df, function(x) length(unique(x)))
Or
tapply(df$brand, df$Country, function(x) length(unique(x)))
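The two base R calls return different shapes: aggregate gives a data.frame, while tapply gives a named array. Below is a minimal sketch that rebuilds the question's sample as df so both can be tried; the helper count_unique and the column name n_brands are made up for illustration.

```r
# Sample data from the question, rebuilt as `df` (the name used in this answer)
df <- data.frame(
  brand   = c("A", "A", "C", "B", "C", "B", "A", "D", "E", "D"),
  Country = c("ITA", "ITA", "SPA", "POR", "SPA", "POR", "ITA", "ITA", "SPA", "ITA"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: number of distinct values in a vector
count_unique <- function(x) length(unique(x))

# aggregate() returns a data.frame with one row per Country
res_agg <- aggregate(brand ~ Country, df, count_unique)

# tapply() returns a named array; convert it if a data.frame is needed
res_tap <- tapply(df$brand, df$Country, count_unique)
res_df  <- data.frame(Country = names(res_tap), n_brands = as.vector(res_tap))
```

Both work on factor columns as they are, so no as.character conversion should be needed.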
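Or, if you like base R's simple syntax and your data set isn't too large, you can combine the approaches (with data.table or dplyr loaded):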
aggregate(brand ~ Country, df, uniqueN)
Or
aggregate(brand ~ Country, df, n_distinct)
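As for why the ave attempt failed on factors: as far as I can tell, ave writes the result of FUN back into a copy of its first argument, so with a factor the counts are not valid levels and come back as NA with a warning. Here is a rough sketch of one workaround that stays close to the original attempt, assuming the question's data frame data with factor columns (the column name n_brands is made up for illustration):

```r
# Sample data from the question, with both columns as factors
data <- data.frame(
  brand   = c("A", "A", "C", "B", "C", "B", "A", "D", "E", "D"),
  Country = c("ITA", "ITA", "SPA", "POR", "SPA", "POR", "ITA", "ITA", "SPA", "ITA"),
  stringsAsFactors = TRUE
)

# Pass the integer codes of the factor so ave() can store numeric counts
data$var <- with(data, ave(as.integer(brand), Country,
                           FUN = function(x) length(unique(x))))

# Keep one row per Country to get the summary table
res <- unique(data[c("Country", "var")])
names(res)[2] <- "n_brands"
```

This keeps var as a per-row column and then collapses it, which is more roundabout than the aggregate or tapply calls above, but it shows how to get from the ave result to the table.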