DK2 | 7 | r, subset, difference
I have data with a grouping variable ("from") and a value ("number"):
from number
1 1
1 1
2 1
2 2
3 2
3 2
I want to subset the data and keep only the groups that have two or more unique values. In my data, only group 2 has more than one distinct "number", so this is the desired result:
from number
2 1
2 2
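For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data and counts the distinct values per group. The name `df` matches the first answer below; `n_unique` is just an illustrative name, not something from the original post.

```r
# Rebuild the example data from the question.
df <- data.frame(
  from   = c(1, 1, 2, 2, 3, 3),
  number = c(1, 1, 1, 2, 2, 2)
)

# Count the distinct "number" values within each "from" group.
n_unique <- tapply(df$number, df$from, function(v) length(unique(v)))

# Names of the groups that have two or more distinct values.
groups_to_keep <- names(n_unique)[n_unique > 1]
```

With this data, `groups_to_keep` should contain only group "2", which is the group shown in the desired result.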
There are several possibilities; here is my favorite:
library(data.table)
setDT(df)[, if(+var(number)) .SD, by = from]
# from number
# 1: 2 1
# 2: 2 2
Basically, for each group we check whether there is any variance; if the check is TRUE, we return that group's rows.
With base R, I would go with
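To make the variance check concrete, here is a small base R sketch of what happens per group. The names `group_var` and `keep` are illustrative only. One assumption worth flagging: `var()` of a single value is `NA`, and `if (NA)` raises an error, so a group with only one row would probably need the explicit `NA` guard shown here.

```r
df <- data.frame(
  from   = c(1, 1, 2, 2, 3, 3),
  number = c(1, 1, 1, 2, 2, 2)
)

# Variance of "number" within each group: zero means all values are equal.
group_var <- tapply(df$number, df$from, var)

# if() treats 0 as FALSE and any other number as TRUE, so a non-zero
# variance selects the group. Guard against NA for single-row groups.
keep <- !is.na(group_var) & group_var > 0

result <- df[df$from %in% names(group_var)[keep], ]
```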
df[as.logical(with(df, ave(number, from, FUN = var))), ]
# from number
# 3 2 1
# 4 2 2
Edit: for non-numeric data, you could try the new uniqueN function from the development version of data.table (or use length(unique(number)) > 1 instead):
setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]
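As a rough illustration of the `length(unique(number)) > 1` alternative on non-numeric data, here is a base R sketch. The character data and the names `df_chr` and `n_unique` are made up for this example and are not part of the original post.

```r
# Hypothetical non-numeric data with the same shape as the question's.
df_chr <- data.frame(
  from   = c("a", "a", "b", "b", "c", "c"),
  number = c("x", "x", "x", "y", "y", "y"),
  stringsAsFactors = FALSE
)

# Number of distinct values per group; this works for any vector type.
n_unique <- tapply(df_chr$number, df_chr$from, function(v) length(unique(v)))

# Keep the rows belonging to groups with more than one distinct value.
result <- df_chr[df_chr$from %in% names(n_unique)[n_unique > 1], ]
```

Here only group "b" has two distinct values, so `result` should hold just its two rows.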
You could try
library(dplyr)
df1 %>%
group_by(from) %>%
filter(n_distinct(number)>1)
# from number
#1 2 1
#2 2 2
Or using base R
indx <- rowSums(!!table(df1))>1
subset(df1, from %in% names(indx)[indx])
# from number
#3 2 1
#4 2 2
Or
df1[with(df1, !ave(number, from, FUN=anyDuplicated)),]
# from number
#3 2 1
#4 2 2
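One thing to watch with the `anyDuplicated` version: it keeps groups in which no value repeats, which gives the same answer as "two or more unique values" for the example data but not in general. A small sketch with made-up data (`df2` is hypothetical, not from the post) shows the difference:

```r
# Hypothetical data: group 1 has two distinct values but also a repeat.
df2 <- data.frame(
  from   = c(1, 1, 1, 2, 2),
  number = c(1, 1, 2, 3, 4)
)

# anyDuplicated() returns 0 only when no value in the group repeats,
# so this flags rows of groups whose values are all different.
no_repeats <- !ave(df2$number, df2$from, FUN = anyDuplicated)

# Counting distinct values flags every group with two or more of them.
multi_valued <- ave(df2$number, df2$from,
                    FUN = function(v) length(unique(v))) > 1
```

Group 1 is dropped by `no_repeats` but kept by `multi_valued`, so which one to use depends on which rule you actually want.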