Eri*_*rey 5 r count data.table
I have some data:
library(data.table)
set.seed(1)
df1 <- data.frame(let=sample(sample(letters,2),5, replace=TRUE),
num=sample(1:10,5))
setDT(df1)
let num
1: j 7
2: j 6
3: g 1
4: j 2
5: j 10
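For each row, I would like to count, within its `let` group, how many `num` values are less than or equal to that row's `num` AND greater than or equal to `num - 4`. A data.table solution is preferred, but a solution using dplyr or base R is fine too. The output should look like this: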
let num countNumByLet
1: j 7 2
2: j 6 2
3: g 1 1
4: j 2 1
5: j 10 3
This can also be solved with a non-equi join:
# use the question's example data here; n_let and n_per_grp are only defined
# further down, for the larger benchmark data
dt <- as.data.table(df1)
dt[, .(let, high = num + 4L, num)
][dt,
on = .(let,
num <= num,
high >= num),
.(countNumByLet = .N),
by = .EACHI
][, high:= NULL][]
let num countNumByLet
1: j 7 2
2: j 6 2
3: g 1 1
4: j 2 1
5: j 10 3
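To see why these join conditions produce the count, here is a minimal self-contained sketch. It types in the five example rows by hand from the printed table (the result of `set.seed(1)` followed by `sample` can differ between R versions), runs the same non-equi join, and compares it with a brute-force count of the definition. The names `res` and `brute` are introduced here only for illustration.

```r
library(data.table)

# The five example rows, typed in from the printed table above
dt <- data.table(let = c("j", "j", "g", "j", "j"),
                 num = c(7L, 6L, 1L, 2L, 10L))

# Left side: every row with its window upper bound, high = num + 4.
# A left row x matches a row i of dt when
#   x.let == i.let, x.num <= i.num and x.num + 4 >= i.num,
# that is, when x.num lies in [i.num - 4, i.num].
res <- dt[, .(let, high = num + 4L, num)
          ][dt,
            on = .(let, num <= num, high >= num),
            .(countNumByLet = .N),   # number of matching left rows per row of dt
            by = .EACHI
          ][, high := NULL][]

# Brute-force check of the same definition, one row at a time
brute <- sapply(seq_len(nrow(dt)), function(k)
  sum(dt$let == dt$let[k] & dt$num <= dt$num[k] & dt$num >= dt$num[k] - 4L))

print(res)
stopifnot(identical(res$countNumByLet, brute))
```

Because every row matches at least itself, `.N` is never 0 here, and `by = .EACHI` returns one result row per row of `dt`, in the original row order.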
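With only 5 rows the choice of method makes little difference (and at 260 rows the `sapply` approach is still the faster one in the timings below). As the data grows, though, the non-equi join scales far better: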
n_let <- 26
n_per_grp <- 1E1
dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20, n_let * n_per_grp, replace = TRUE))
# 260 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:> <bch:> <dbl> <bch:byt>
1 dt_sapply 2.41ms 2.67ms 364. 53.9KB
2 dt_non_equi 5.08ms 5.66ms 170. 223.7KB
#2,600 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt>
1 dt_sapply 11.49ms 12.15ms 80.3 4.67MB
2 dt_non_equi 6.39ms 7.25ms 117. 398.8KB
#26,000 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:t> <bch:> <dbl> <bch:byt>
1 dt_sapply 404.1ms 404ms 2.47 403.46MB
2 dt_non_equi 24.2ms 25ms 39.8 2.09MB
#260,000 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt>
1 dt_sapply 38.6s 38.6s 0.0259 38.8GB
2 dt_non_equi 524.2ms 524.2ms 1.91 19.1MB
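The benchmark code itself is not shown above. The following is a rough sketch of how such a comparison could be run, not the original benchmark. It assumes that the labels `dt_sapply` and `dt_non_equi` in the tables refer to the `sapply`-based approach and the non-equi join from this thread. It uses `bench::mark` if the bench package happens to be installed and falls back to `system.time` otherwise. The wrapper functions `f_sapply` and `f_non_equi` are hypothetical names introduced here.

```r
library(data.table)

set.seed(1)
n_let <- 26
n_per_grp <- 1e2   # 2,600 observations; raise with care, sapply memory grows quickly
dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20L, n_let * n_per_grp, replace = TRUE))

# Hypothetical wrappers around the two approaches discussed in this thread
f_sapply <- function(d) {
  d[, .(countNumByLet = sapply(num, function(i) sum(num - i >= -4 & num <= i)),
        num = num),
    by = let]
}
f_non_equi <- function(d) {
  d[, .(let, high = num + 4L, num)
    ][d, on = .(let, num <= num, high >= num),
      .(countNumByLet = .N), by = .EACHI
    ][, high := NULL][]
}

# The two results differ in row and column order, so compare the counts
# after sorting both by (let, num); tied rows carry identical counts
a <- f_sapply(dt)[order(let, num), countNumByLet]
b <- f_non_equi(dt)[order(let, num), countNumByLet]
stopifnot(identical(a, b))

if (requireNamespace("bench", quietly = TRUE)) {
  # check = FALSE because the outputs are ordered differently
  print(bench::mark(dt_sapply = f_sapply(dt),
                    dt_non_equi = f_non_equi(dt),
                    check = FALSE))
} else {
  print(system.time(f_sapply(dt)))
  print(system.time(f_non_equi(dt)))
}
```

Timings and memory figures from a run like this depend on the machine and package versions, so they will not necessarily match the tables above.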
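There may be a better way to count, for each element of a vector, how many elements lie within 4 of it, but this is what I can think of right now: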
df1<-
structure(list(let = structure(c(2L, 2L, 1L, 2L, 2L), .Label = c("g",
"j"), class = "factor"), num = c(7, 6, 1, 2, 10), res = c(2L,
2L, 1L, 2L, 3L)), row.names = c(NA, -5L), class = "data.frame")
# for each value i within a `let` group, count that group's values in [i - 4, i]
setDT(df1)[,
  .(countNumByLet = sapply(num, function(i) sum(num - i >= -4 & num <= i)), num = num),
  by = let]
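Since the question also allows base R, the same per-group window count can be sketched with `ave`, which returns the counts in the original row order. This is an illustrative alternative rather than part of the answer above. `count_window` is a hypothetical helper name, and the data is re-entered by hand from the question's table.

```r
# The five example rows from the question, as a plain data.frame
df <- data.frame(let = c("j", "j", "g", "j", "j"),
                 num = c(7, 6, 1, 2, 10))

# Hypothetical helper: for each value i of x, count the values of x in [i - 4, i]
count_window <- function(x) sapply(x, function(i) sum(x >= i - 4 & x <= i))

# ave() applies the helper within each `let` group and puts the
# results back in the original row order
df$countNumByLet <- ave(df$num, df$let, FUN = count_window)
print(df)
```

Like the `sapply` version, this compares every value with every other value in its group, so it is a reasonable choice only while the groups stay small.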