Ben*_*n Z 4 combinations r vector multiplication
假设我有一个c(1, 2, 3, 4)没有重复值的向量.我需要一个向量c(1 * 2, 1 * 3, 1 * 4, 2 * 3, 2 * 4, 3 * 4),所以乘法是在这个向量值的所有可能组合中完成的.有没有办法做到这一点?提前致谢!
这很有趣.我认为combn(1:4, 2, "*")这将是最简单的解决方案,但实际上并不起作用.我们必须使用combn(1:4, 2, prod) Onyambu评论.问题是:在"N选择K"设置中,FUN必须能够将长度为K的向量作为输入."*"不是正确的.
## K = 2 case
"*"(c(1, 2)) ## this is different from: "*"(1, 2)
#Error in *c(1, 2) : invalid unary operator
prod(c(1, 2))
#[1] 2
Run Code Online (Sandbox Code Playgroud)
感谢Maurits Evers的详细阐述outer/ lower.tri/ upper.tri.下面是一个适于方式,以避免从产生那些临时矩阵outer和*****.tri:
tri_ind <- function (n, lower= TRUE, diag = FALSE) {
if (diag) {
tmp <- n:1
j <- rep.int(1:n, tmp)
i <- sequence(tmp) - 1L + j
} else {
tmp <- (n-1):1
j <- rep.int(1:(n-1), tmp)
i <- sequence(tmp) + j
}
if (lower) list(i = i, j = j)
else list(i = j, j = i)
}
vec <- 1:4
ind <- tri_ind(length(vec), FALSE, FALSE)
#$i
#[1] 1 1 1 2 2 3
#
#$j
#[1] 2 3 4 3 4 4
vec[ind[[1]]] * vec[ind[[2]]]
#[1] 2 3 4 6 8 12
Run Code Online (Sandbox Code Playgroud)
该tri_ind函数是我的答案的包装.它可以用作快速和节省内存的替代品combn(length(vec), 2)或其outer等效性.
最初我链接了一个finv函数,但它不适合基准测试,因为它设计用于从"dist"对象(折叠的下三角矩阵)中提取一些元素.如果引用三角矩阵的所有元素,则其索引计算实际上会产生不必要的开销.tri_ind是一个更好的选择.
library(bench)
Run Code Online (Sandbox Code Playgroud)
基准指数生成
bench1 <- function (n) {
bench::mark("combn" = combn(n, 2),
"tri_ind" = tri_ind(n, TRUE, FALSE),
"upper.tri" = which(upper.tri(matrix(0, n, n)), arr.ind = TRUE),
check = FALSE)
}
## for small problem, `tri_ind` is already the fastest
bench1(100)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl> <int>
#1 combn 11.6ms 11.9ms 11.9ms 12.59ms 83.7 39.1KB 9 32
#2 tri_ind 189.3µs 205.9µs 194.6µs 4.82ms 4856. 60.4KB 21 1888
#3 upper.tri 618.4µs 635.8µs 624.1µs 968.36µs 1573. 411.7KB 57 584
## `tri_ind` is 10x faster than `upper.tri`, and 100x faster than `combn`
bench1(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#1 combn 30.6s 30.6s 30.6s 30.6s 0.0327 95.4MB 242
#2 tri_ind 231.98ms 259.31ms 259.31ms 286.63ms 3.86 143.3MB 0
#3 upper.tri 3.02s 3.02s 3.02s 3.02s 0.332 953.6MB 4
Run Code Online (Sandbox Code Playgroud)
OP问题的基准
bench2 <- function (n) {
vec <- numeric(n)
bench::mark("combn" = combn(vec, 2, prod),
"tri_ind" = {ind <- tri_ind(n, FALSE, FALSE);
vec[ind[[1]]] * vec[ind[[2]]]},
"upper.tri" = {m <- outer(vec, vec);
c(m[upper.tri(m)])},
check = FALSE)
}
bench2(100)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl> <int>
#1 combn 18.6ms 19.2ms 19.1ms 20.55ms 52.2 38.7KB 4 22
#2 tri_ind 386.9µs 432.3µs 395.6µs 7.58ms 2313. 176.6KB 1 1135
#3 upper.tri 326.9µs 488.5µs 517.6µs 699.07µs 2047. 336KB 0 1024
bench2(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 combn 48.13s 48.13s 48.13s 48.13s 0.0208 95.3MB 204 1
#2 tri_ind 861.7ms 861.7ms 861.7ms 861.7ms 1.16 429.3MB 0 1
#3 upper.tri 1.95s 1.95s 1.95s 1.95s 0.514 810.6MB 3 1
Run Code Online (Sandbox Code Playgroud)
对我来说,有趣的是要知道它combn不是用编译的代码编写的.它内部实际上有一个R级别的循环.各种替代方案只是试图在"N选2"的情况下加速它而不编写编译代码.
更好的选择?
功能combinations从gtools包使用递归算法,这是很大的问题大小问题.功能combn从combinat包不使用编译后的代码,因此不优于combn来自R核心.该RcppAlgos包装由约瑟夫·伍德有一个comboGenearl函数,它是最快的一个我看的更远.
library(RcppAlgos)
## index generation
bench3 <- function (n) {
bench::mark("tri_ind" = tri_ind(n, FALSE, FALSE),
"Joseph" = comboGeneral(n, 2), check = FALSE)
}
bench3(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 tri_ind 290ms 297ms 297ms 303ms 3.37 143.4MB 4 2
#2 Joseph 134ms 155ms 136ms 212ms 6.46 95.4MB 2 4
## on OP's problem
bench4 <- function (n) {
vec <- numeric(n)
bench::mark("tri_ind" = {ind <- tri_ind(n, FALSE, FALSE);
vec[ind[[1]]] * vec[ind[[2]]]},
"Joseph" = comboGeneral(vec, 2, constraintFun = "prod", keepResults = TRUE),
check = FALSE)
}
bench4(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 tri_ind 956ms 956ms 956ms 956ms 1.05 429MB 3 1
#2 Joseph 361ms 362ms 362ms 363ms 2.76 286MB 1 2
Run Code Online (Sandbox Code Playgroud)
Joseph Wood对组合/排列有各种答案.例如:更快的组合版本.
| 归档时间: |
|
| 查看次数: |
162 次 |
| 最近记录: |