Count the number of values between a value and that value minus x, by variable

Eri*_*rey 5 r count data.table

I have some data:

library(data.table)
set.seed(1)
df1 <- data.frame(let=sample(sample(letters,2),5, replace=TRUE),
                  num=sample(1:10,5))
setDT(df1)
   let num
1:   j   7
2:   j   6
3:   g   1
4:   j   2
5:   j  10

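For each row, I want to count, within each let group, the number of num values that are less than or equal to that row's num AND greater than or equal to that row's num - 4. A data.table solution is preferred, but anything using dplyr or base R is fine too. The output would look like this: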

   let num countNumByLet
1:   j   7             2
2:   j   6             2
3:   g   1             1
4:   j   2             1
5:   j  10             3

Col*_*ole 5

This can also be solved with a non-equi join:

# Work on a copy of the question's 5-row table; the output below is for that data.
dt <- copy(df1)

dt[, .(let, high = num + 4L, num)
   ][dt,
     on = .(let,
            num <= num,
            high >= num),
     .(countNumByLet = .N),
     by = .EACHI
     ][, high:= NULL][]

   let num countNumByLet
1:   j   7             2
2:   j   6             2
3:   g   1             1
4:   j   2             1
5:   j  10             3
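The join above matches every row against all rows in the same let group whose num lies in the window [num - 4, num], then counts the matches per row with by = .EACHI. A minimal sketch of the same idea that adds the count to the table by reference is shown below. It rebuilds the question's five rows by hand, and the helper column low is a hypothetical name introduced only for this illustration.

```r
library(data.table)

# The question's five rows, typed in by hand so the result does not depend on set.seed().
dt <- data.table(let = c("j", "j", "g", "j", "j"),
                 num = c(7L, 6L, 1L, 2L, 10L))

# Hypothetical helper column: lower bound of each row's window.
dt[, low := num - 4L]

# Self-join: for each row i, count rows x in the same group
# with i.low <= x.num <= i.num. by = .EACHI yields one count per row of i.
counts <- dt[dt, on = .(let, num >= low, num <= num), .N, by = .EACHI]$N

# Attach the counts in the original row order and drop the helper column.
dt[, countNumByLet := counts][, low := NULL]
print(dt)
# countNumByLet is 2, 2, 1, 1, 3
```

Because the counts come back in the order of the rows passed as i, they line up with the original table without any re-sorting.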

For a dataset of 5 rows, the method doesn't matter. But when scaling up, the non-equi join really helps:

n_let <- 26
n_per_grp <- 1E1

dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20, n_let * n_per_grp, replace = TRUE))

# 260 observations; 26 groups
# A tibble: 2 x 13
  expression     min median `itr/sec` mem_alloc
  <bch:expr>  <bch:> <bch:>     <dbl> <bch:byt>
1 dt_sapply   2.41ms 2.67ms      364.    53.9KB
2 dt_non_equi 5.08ms 5.66ms      170.   223.7KB

#2,600 observations; 26 groups
# A tibble: 2 x 13
  expression      min  median `itr/sec` mem_alloc
  <bch:expr>  <bch:t> <bch:t>     <dbl> <bch:byt>
1 dt_sapply   11.49ms 12.15ms      80.3    4.67MB
2 dt_non_equi  6.39ms  7.25ms     117.    398.8KB

#26,000 observations; 26 groups
# A tibble: 2 x 13
  expression      min median `itr/sec` mem_alloc
  <bch:expr>  <bch:t> <bch:>     <dbl> <bch:byt>
1 dt_sapply   404.1ms  404ms      2.47  403.46MB
2 dt_non_equi  24.2ms   25ms     39.8     2.09MB

#260,000 observations; 26 groups
# A tibble: 2 x 13
  expression      min  median `itr/sec` mem_alloc
  <bch:expr>  <bch:t> <bch:t>     <dbl> <bch:byt>
1 dt_sapply     38.6s   38.6s    0.0259    38.8GB
2 dt_non_equi 524.2ms 524.2ms    1.91      19.1MB
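The timings above do not include the benchmarked expressions themselves. The sketch below is one way the comparison might be set up, not the original benchmark code: the function names count_sapply and count_non_equi are hypothetical, the data size is fixed at 2,600 rows, and the bench package is only used if it happens to be installed. It first checks that both approaches return identical counts, then times them.

```r
library(data.table)
set.seed(1)

n_let <- 26
n_per_grp <- 100   # 2,600 observations; 26 groups

dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20, n_let * n_per_grp, replace = TRUE))

# Hypothetical wrapper: per-group sapply, computed on a copy so dt is not modified.
count_sapply <- function(d) {
  copy(d)[, cnt := sapply(num, function(i) sum(num >= i - 4L & num <= i)), by = let]$cnt
}

# Hypothetical wrapper: non-equi self-join, one count per row of d.
count_non_equi <- function(d) {
  d[, .(let, high = num + 4L, num)
    ][d, on = .(let, num <= num, high >= num), .N, by = .EACHI]$N
}

# Both approaches should agree row for row.
stopifnot(identical(count_sapply(dt), count_non_equi(dt)))

# Rough timings with base R.
print(system.time(count_sapply(dt)))
print(system.time(count_non_equi(dt)))

# Optional: tabular output similar to the tables above, if 'bench' is installed.
if (requireNamespace("bench", quietly = TRUE)) {
  print(bench::mark(dt_sapply = count_sapply(dt), dt_non_equi = count_non_equi(dt)))
}
```

Increasing n_per_grp makes the per-group sapply grow roughly quadratically in the group size, which is consistent with the memory and time blow-up reported for the larger tables.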



Stu*_*olf 4

There may be a better way to find the number of elements within 4 of each element of a vector, but here is what I can come up with for now:

df1<-
structure(list(let = structure(c(2L, 2L, 1L, 2L, 2L), .Label = c("g", 
"j"), class = "factor"), num = c(7, 6, 1, 2, 10), res = c(2L, 
2L, 1L, 2L, 3L)), row.names = c(NA, -5L), class = "data.frame")

# Within each let group, count the values in [i - 4, i] for every element i of num.
setDT(df1)[,
  .(countNumByLet = sapply(num, function(i) sum(num - i >= -4 & num <= i)), num = num),
  by = let]
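Note that grouping with by = let returns the rows arranged by group rather than in their original order. If the original row order matters, one option (a sketch, not part of the answer as posted) is to assign the count by reference with :=, which writes each group's result back to the matching rows. The table below is rebuilt by hand from the question's values.

```r
library(data.table)

# The question's five rows, typed in by hand.
df1 <- data.table(let = c("j", "j", "g", "j", "j"),
                  num = c(7, 6, 1, 2, 10))

# := with by computes per group but keeps the table's row order intact.
df1[, countNumByLet := sapply(num, function(i) sum(num >= i - 4 & num <= i)),
    by = let]
print(df1)
# countNumByLet is 2, 2, 1, 1, 3, in the original row order
```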