Subset a data frame by number of repetitions

Mir*_*zig 6 r

If I have a data frame like this:

neu <- data.frame(test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14), 
                  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f"))
neu
   test1 test2
1      1     a
2      2     b
3      3     a
4      4     b
5      5     c
6      6     c
7      7     a
8      8     c
9      9     c
10    10     d
11    11     d
12    12     f
13    13     f
14    14     f

And I want to select only the rows whose factor level in test2 occurs more than three times. What is the fastest way to do that?

Thanks a lot. I could not find the right answer in earlier questions.

Tho*_*mas 7

Find the qualifying levels using:

z <- table(neu$test2)[table(neu$test2) >= 3] # repeats greater than or equal to 3 times

Or:

z <- which(table(neu$test2) >= 3) # named vector, so names(z) below returns the levels

Then subset:

subset(neu, test2 %in% names(z))

Or:

neu[neu$test2 %in% names(z),]
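Putting the steps above together, a minimal end-to-end sketch might look like the following. The names `counts`, `keep`, and `result` are illustrative and not from the original answer, and `test1` is built with `1:14` for brevity. The threshold is `>= 3` as in the answer; use `> 3` if "more than three times" is meant strictly.

```r
# Sample data from the question
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

counts <- table(neu$test2)            # occurrences of each level
keep <- names(counts)[counts >= 3]    # levels seen at least 3 times
result <- neu[neu$test2 %in% keep, ]  # rows belonging to those levels
```

With this data, `keep` holds the levels a, c, and f, so `result` drops the rows for b and d.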


Mat*_*rde 5

Here is another way:

with(neu, neu[ave(seq_along(test2), test2, FUN = length) > 3, ])

#   test1 test2
# 5     5     c
# 6     6     c
# 8     8     c
# 9     9     c
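To see why this works, it may help to look at the intermediate vector: `ave()` applies `FUN` within each group and returns one value per row, so every row gets the size of its own group. A small sketch, where `n` and `strict` are illustrative names that do not appear in the answer:

```r
# Sample data from the question
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# One group size per row, aligned with the rows of neu
n <- ave(seq_along(neu$test2), neu$test2, FUN = length)

# Keep rows whose level occurs strictly more than three times
strict <- neu[n > 3, ]
```

Because `n` is already aligned with the rows, no lookup with `%in%` is needed; the comparison `n > 3` can index the data frame directly.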