Ran*_*Lai 21 combinations r permutation multiset r-faq
在这个主题中,我试图在这里包括所有常见问题及其答案.我希望这对某人有用.
一般问题:如何r从n对象生成对象序列?
总共存在这类2^3=8问题.
[更新]
Josh O'Brien认为这8个问题与十二种方式有关.实际上,"不同"的问题以十二种方式包含在内,而"非独特"的问题则不包括在内.无论如何,将这里的8个问题与12倍的方式进行比较是很有趣的.请参阅注释以获取进一步的读数.
Ran*_*Lai 23
编辑:我已经更新了答案,使用更高效的包arrangements
arrangement安排包含一些有效的发生器和迭代器用于排列和组合.已经证明,它arrangements优于大多数类似的现有包装.可以在这里找到一些基准.
以下是上述问题的答案
# 1) combinations: without replacement: distinct items
combinations(5, 2)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 3
[6,] 2 4
[7,] 2 5
[8,] 3 4
[9,] 3 5
[10,] 4 5
# 2) combinations: with replacement: distinct items
combinations(5, 2, replace=TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
[6,] 2 2
[7,] 2 3
[8,] 2 4
[9,] 2 5
[10,] 3 3
[11,] 3 4
[12,] 3 5
[13,] 4 4
[14,] 4 5
[15,] 5 5
# 3) combinations: without replacement: non distinct items
combinations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "c"
# 4) combinations: with replacement: non distinct items
combinations(x = c("a", "b", "c"), k = 2, replace = TRUE) # as `freq` does not matter
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "b"
[5,] "b" "c"
[6,] "c" "c"
# 5) permutations: without replacement: distinct items
permutations(5, 2)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 1
[6,] 2 3
[7,] 2 4
[8,] 2 5
[9,] 3 1
[10,] 3 2
[11,] 3 4
[12,] 3 5
[13,] 4 1
[14,] 4 2
[15,] 4 3
[16,] 4 5
[17,] 5 1
[18,] 5 2
[19,] 5 3
[20,] 5 4
# 6) permutations: with replacement: distinct items
permutations(5, 2, replace = TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
[6,] 2 1
[7,] 2 2
[8,] 2 3
[9,] 2 4
[10,] 2 5
[11,] 3 1
[12,] 3 2
[13,] 3 3
[14,] 3 4
[15,] 3 5
[16,] 4 1
[17,] 4 2
[18,] 4 3
[19,] 4 4
[20,] 4 5
[21,] 5 1
[22,] 5 2
[23,] 5 3
[24,] 5 4
[25,] 5 5
# 7) permutations: without replacement: non distinct items
permutations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "c"
[6,] "c" "a"
[7,] "c" "b"
# 8) permutations: with replacement: non distinct items
permutations(x = c("a", "b", "c"), k = 2, replace = TRUE) # as `freq` doesn't matter
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "b"
[6,] "b" "c"
[7,] "c" "a"
[8,] "c" "b"
[9,] "c" "c"
Run Code Online (Sandbox Code Playgroud)
使用arrangements现有包的优点很少.
整体框架:您不必为不同的方法使用不同的包.
它非常有效.有关基准测试,请参阅https://randy3k.github.io/arrangements/articles/benchmark.html.
它具有内存效率,可以生成全部13个!由于矩阵大小的限制,现有的包将无法进行1到13的排列.getnext()迭代器的方法允许用户逐个进行安排.
生成的排列是字典顺序,对于一些用户可能是期望的.
Jos*_*ood 19
下面,我们将研究具有生成组合和排列功能的软件包.如果我遗漏了任何套餐,请原谅我,请发表评论或更好,编辑这篇文章.
分析大纲:
在我们开始之前,我们注意到,组合/排列与置换的不同选择与非distint项米,在一个时间是等价的.就是这样,因为当我们有替换时,它并不具体.因此,无论特定元素最初出现多少次,输出将具有重复1到m次的该元素的实例.
gtools v 3.8.1combinat v 0.0-8multicool v 0.1-10partitions v 1.9-19RcppAlgos v 2.0.1(我是作者)arrangements v 1.1.0gRbase v 1.8-3我没有包括permute,permutations或者gRbase::aperm/ar_perm因为它们并不是真正意图攻击这些类型的问题.
| --------------------------------------- 概述 --------- ------------------------------- |
|_______________| gtools | combinat | multicool | partitions |
| comb rep | Yes | | | |
| comb NO rep | Yes | Yes | | |
| perm rep | Yes | | | |
| perm NO rep | Yes | Yes | Yes | Yes |
| perm multiset | | | Yes | |
| comb multiset | | | | |
|accepts factors| | Yes | | |
| m at a time | Yes | Yes/No | | |
|general vector | Yes | Yes | Yes | |
| iterable | | | Yes | |
|parallelizable | | | | |
| big integer | | | | |
|_______________| iterpc | arrangements | RcppAlgos | gRbase |
| comb rep | Yes | Yes | Yes | |
| comb NO rep | Yes | Yes | Yes | Yes |
| perm rep | Yes | Yes | Yes | |
| perm NO rep | Yes | Yes | Yes | * |
| perm multiset | Yes | Yes | Yes | |
| comb multiset | Yes | Yes | Yes | |
|accepts factors| | Yes | Yes | |
| m at a time | Yes | Yes | Yes | Yes |
|general vector | Yes | Yes | Yes | Yes |
| iterable | | Yes | Partially | |
|parallelizable | | Yes | Yes | |
| big integer | | Yes | | |
Run Code Online (Sandbox Code Playgroud)
的任务,m at a time并且general vector,请参阅"产生结果的能力米在时间"(当米小于矢量的长度),并重新排列"一般矢量"相对于1:n.在实践中,我们通常关注的是找到一般向量的重新排列,因此下面的所有检查都将反映这一点(如果可能的话).
所有基准都是在3种不同的设置上运行的.
列出的结果是从设置#1(即MBPro)获得的.其他两个系统的结果相似.此外,gc()会定期调用以确保所有内存都可用(请参阅参考资料?gc).
首先,我们检查组合,而不是一次选择m.
RcppAlgoscombinat(或utils)gtoolsarrangementsgRbase如何:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(13)
testVector1 <- sort(sample(100, 17))
m <- 9
t1 <- comboGeneral(testVector1, m) ## returns matrix with m columns
t3 <- combinat::combn(testVector1, m) ## returns matrix with m rows
t4 <- gtools::combinations(17, m, testVector1) ## returns matrix with m columns
identical(t(t3), t4) ## must transpose to compare
#> [1] TRUE
t5 <- combinations(testVector1, m)
identical(t1, t5)
#> [1] TRUE
t6 <- gRbase::combnPrim(testVector1, m)
identical(t(t6)[do.call(order, as.data.frame(t(t6))),], t1)
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
基准测试:
microbenchmark(cbRcppAlgos = comboGeneral(testVector1, m),
cbGRbase = gRbase::combnPrim(testVector1, m),
cbGtools = gtools::combinations(17, m, testVector1),
cbCombinat = combinat::combn(testVector1, m),
cbArrangements = combinations(17, m, testVector1),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.064 1.079 1.160 1.012 1.086 2.318 100
#> cbGRbase 7.335 7.509 5.728 6.807 5.390 1.608 100
#> cbGtools 426.536 408.807 240.101 310.848 187.034 63.663 100
#> cbCombinat 97.756 97.586 60.406 75.415 46.391 41.089 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.000 100
Run Code Online (Sandbox Code Playgroud)
现在,我们一次检查选择m的替换组合.
RcppAlgosgtoolsarrangements如何:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(97)
testVector2 <- sort(rnorm(10))
m <- 8
t1 <- comboGeneral(testVector2, m, repetition = TRUE)
t3 <- gtools::combinations(10, m, testVector2, repeats.allowed = TRUE)
identical(t1, t3)
#> [1] TRUE
## arrangements
t4 <- combinations(testVector2, m, replace = TRUE)
identical(t1, t4)
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
基准测试:
microbenchmark(cbRcppAlgos = comboGeneral(testVector2, m, TRUE),
cbGtools = gtools::combinations(10, m, testVector2, repeats.allowed = TRUE),
cbArrangements = combinations(testVector2, m, replace = TRUE),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.000 1.000 1.000 1.000 1.000 1.00000 100
#> cbGtools 384.990 269.683 80.027 112.170 102.432 3.67517 100
#> cbArrangements 1.057 1.116 0.618 1.052 1.002 0.03638 100
Run Code Online (Sandbox Code Playgroud)
首先,我们检查排列,而不是一次选择m.
RcppAlgosgtoolsarrangements如何:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(101)
testVector3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
## RcppAlgos... permuteGeneral same as comboGeneral above
t1 <- permuteGeneral(testVector3, 6)
## gtools... permutations same as combinations above
t3 <- gtools::permutations(10, 6, testVector3)
identical(t1, t3)
#> [1] TRUE
## arrangements
t4 <- permutations(testVector3, 6)
identical(t1, t4)
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
基准测试:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector3, 6),
cbGtools = gtools::permutations(10, 6, testVector3),
cbArrangements = permutations(testVector3, 6),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.079 1.027 1.106 1.037 1.003 5.37 100
#> cbGtools 158.720 92.261 85.160 91.856 80.872 45.39 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.00 100
Run Code Online (Sandbox Code Playgroud)
接下来,我们检查排列而不用一般向量替换(返回所有排列).
RcppAlgosgtoolscombinatmulticoolarrangements如何:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(89)
testVector3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
testVector3Prime <- testVector3[1:7]
## For RcppAlgos, & gtools (see above)
## combinat
t4 <- combinat::permn(testVector3Prime) ## returns a list of vectors
## convert to a matrix
t4 <- do.call(rbind, t4)
## multicool.. we must first call initMC
t5 <- multicool::allPerm(multicool::initMC(testVector3Prime)) ## returns a matrix with n columns
all.equal(t4[do.call(order,as.data.frame(t4)),],
t5[do.call(order,as.data.frame(t5)),])
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
基准测试:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector3Prime, 7),
cbGtools = gtools::permutations(7, 7, testVector3Prime),
cbCombinat = combinat::permn(testVector3Prime),
cbMulticool = multicool::allPerm(multicool::initMC(testVector3Prime)),
cbArrangements = permutations(x = testVector3Prime, k = 7),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.152 1.275 0.7508 1.348 1.342 0.3159 100
#> cbGtools 965.465 817.645 340.4159 818.137 661.068 12.7042 100
#> cbCombinat 280.207 236.853 104.4777 238.228 208.467 9.6550 100
#> cbMulticool 2573.001 2109.246 851.3575 2039.531 1638.500 28.3597 100
#> cbArrangements 1.000 1.000 1.0000 1.000 1.000 1.0000 100
Run Code Online (Sandbox Code Playgroud)
现在,我们检查排列而不替换1:n(返回所有排列).
RcppAlgosgtoolscombinatmulticoolpartitionsarrangements如何:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(89)
t1 <- partitions::perms(7) ## returns an object of type 'partition' with n rows
identical(t(as.matrix(t1)), permutations(7,7))
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
基准测试:
microbenchmark(cbRcppAlgos = permuteGeneral(7, 7),
cbGtools = gtools::permutations(7, 7),
cbCombinat = combinat::permn(7),
cbMulticool = multicool::allPerm(multicool::initMC(1:7)),
cbPartitions = partitions::perms(7),
cbArrangements = permutations(7, 7),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max
#> cbRcppAlgos 1.235 1.429 1.412 1.503 1.484 1.720
#> cbGtools 1152.826 1000.736 812.620 939.565 793.373 499.029
#> cbCombinat 347.446 304.866 260.294 296.521 248.343 284.001
#> cbMulticool 3001.517 2416.716 1903.903 2237.362 1811.006 1311.219
#> cbPartitions 2.469 2.536 2.801 2.692 2.999 2.472
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.000
#> neval
#> 100
#> 100
#> 100
#> 100
#> 100
#> 100
Run Code Online (Sandbox Code Playgroud)
最后,我们检查替换的排列.
RcppAlgositerpcgtoolsarrangements如何:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(34)
testVector3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
t1 <- permuteGeneral(testVector3, 5, repetition = TRUE)
t3 <- gtools::permutations(10, 5, testVector3, repeats.allowed = TRUE)
t4 <- permutations(x = testVector3, k = 5, replace = TRUE)
Run Code Online (Sandbox Code Playgroud)
This next benchmark is a little surprising given the results up until now.
microbenchmark(cbRcppAlgos = permuteGeneral(testVector3, 5, TRUE),
cbGtools = gtools::permutations(10, 5, testVector3, repeats.allowed = TRUE),
cbArrangements = permutations(x = testVector3, k = 5, replace = TRUE),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.106 0.9183 1.200 1.030 1.063 1.701 100
#> cbGtools 2.426 2.1815 2.068 1.996 2.127 1.367 100
#> cbArrangements 1.000 1.0000 1.000 1.000 1.000 1.000 100
Run Code Online (Sandbox Code Playgroud)
That is not a typo... gtools::permutations is almost as fast as the other compiled functions. I encourage the reader to go check out the source code of gtools::permutations as it is one of the most elegant displays of programming out there (R or otherwise).
First, we examine combinations of multisets.
RcppAlgosarrangementsTo find combinations/permutations of multisets, with RcppAlgos use the freqs arguments to specify how many times each element of the source vector, v, is repeated.
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(496)
myFreqs <- sample(1:5, 10, replace = TRUE)
## This is how many times each element will be repeated
myFreqs
#> [1] 2 4 4 5 3 2 2 2 3 4
testVector4 <- as.integer(c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89))
t1 <- comboGeneral(testVector4, 12, freqs = myFreqs)
t3 <- combinations(freq = myFreqs, k = 12, x = testVector4)
identical(t1, t3)
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
Benchmark:
microbenchmark(cbRcppAlgos = comboGeneral(testVector4, 12, freqs = myFreqs),
cbArrangements = combinations(freq = myFreqs, k = 12, x = testVector4),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.000 1.000 1.000 1.000 1.000 1.000 100
#> cbArrangements 1.254 1.221 1.287 1.259 1.413 1.173 100
Run Code Online (Sandbox Code Playgroud)
For permutations of multisets chosen m at a time, we have:
RcppAlgosarrangementsHow to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(8128)
myFreqs <- sample(1:3, 5, replace = TRUE)
testVector5 <- sort(runif(5))
myFreqs
#> [1] 2 2 2 1 3
t1 <- permuteGeneral(testVector5, 7, freqs = myFreqs)
t3 <- permutations(freq = myFreqs, k = 7, x = testVector5)
identical(t1, t3)
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector5, 7, freqs = myFreqs),
cbArrangements = permutations(freq = myFreqs, k = 7, x = testVector5),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.461 1.327 1.282 1.177 1.176 1.101 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.000 100
Run Code Online (Sandbox Code Playgroud)
For permutations of multisets returning all permutations, we have:
RcppAlgosmulticoolarrangementsHow to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(8128)
myFreqs2 <- c(2,1,2,1,2)
testVector6 <- (1:5)^3
## For multicool, you must have the elements explicitly repeated
testVector6Prime <- rep(testVector6, times = myFreqs2)
t3 <- multicool::allPerm(multicool::initMC(testVector6Prime))
## for comparison
t1 <- permuteGeneral(testVector6, freqs = myFreqs2)
identical(t1[do.call(order,as.data.frame(t1)),],
t3[do.call(order,as.data.frame(t3)),])
#> [1] TRUE
Run Code Online (Sandbox Code Playgroud)
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector6, freqs = myFreqs2),
cbMulticool = multicool::allPerm(multicool::initMC(testVector6Prime)),
cbArrangements = permutations(freq = myFreqs2, x = testVector6),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.276 1.374 1.119 1.461 1.39 0.8856 100
#> cbMulticool 2434.652 2135.862 855.946 2026.256 1521.74 31.0651 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.00 1.0000 100
Run Code Online (Sandbox Code Playgroud)
Both gtools and combinat are well established packages for rearranging elements of a vector. With gtools there are a few more options (see the overview above) and with combinat, you can rearrange factors. With multicool, one is able to rearrange multisets. Although partitions and gRbase are limited for the purposes of this question, they are powerhouses packed with highly efficient functions for dealing with partitions and array objects respectively.
arrangementslayout argument (r = row-major, c = column-major, and l = list).collect & getnext when working with iterators.2^31 - 1 combinations/permutations via getnext. N.B. RcppAlgos (via lower/upper see below) and multicool (via nextPerm) are also capable of doing this.getnext, this function, allows for a specific number of results by utilizing the d argument.Observe:
library(arrangements)
icomb <- icombinations(1000, 7)
icomb$getnext(d = 5)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 7
#> [2,] 1 2 3 4 5 6 8
#> [3,] 1 2 3 4 5 6 9
#> [4,] 1 2 3 4 5 6 10
#> [5,] 1 2 3 4 5 6 11
Run Code Online (Sandbox Code Playgroud)
This feature is really nice when you only want a few combinations/permutations. With traditional methods, you would have to generate all combinations/permutations and then subset. This would render the previous example impossible as there are more than 10^17 results (i.e. ncombinations(1000, 7, bigz = TRUE) = 194280608456793000).
This feature along with the improvements to the generators in arrangements, allow it to be very efficient with respect to memory.
RcppAlgosupper (formally rowCap) that is analogous to the d argument of getnext.Observe:
library(RcppAlgos)
comboGeneral(1000, 7, upper = 5)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 7
#> [2,] 1 2 3 4 5 6 8
#> [3,] 1 2 3 4 5 6 9
#> [4,] 1 2 3 4 5 6 10
#> [5,] 1 2 3 4 5 6 11
Run Code Online (Sandbox Code Playgroud)
2.0.0, there is an argument called lower that allows one to start generation at a specific combination/permutation. This sets up nicely for parallelization and allows for fast generation beyond 2^31 - 1 as chunks are generated independently.Parallel example with more than 6 billion combinations:
system.time(parallel::mclapply(seq(1,6397478649,4390857), function(x) {
a <- comboGeneral(25, 15, freqs = c(rep(1:5, 5)), lower = x, upper = x + 4390856)
## do something
x
}, mc.cores = 7))
#> user system elapsed
#> 510.623 140.970 109.496
Run Code Online (Sandbox Code Playgroud)
In case you were wondering how each package scales, I will leave you with this final example that measures how fast each package can generate over 100 million results (N.B. gtools::combinations is left out here as it will throw the error: evaluation nested too deeply...). Also, we are explicitly calling combn from the utils package because I was unable to get a successful run from combinat::combn. The differences in memory usage between these two is quite bizarre given that they are only marginally different (see ?utils::combn under the "Authors" section).
Observe:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(2187)
testVector7 <- sort(sample(10^7, 10^3))
system.time(utils::combn(testVector7, 3))
#> user system elapsed
#> 179.956 5.687 187.159
system.time(RcppAlgos::comboGeneral(testVector7, 3))
#> user system elapsed
#> 1.136 0.758 1.937
system.time(arrangements::combinations(x = testVector7, k = 3))
#> user system elapsed
#> 1.963 0.930 2.910
system.time(RcppAlgos::permuteGeneral(testVector7[1:500], 3))
#> user system elapsed
#> 1.095 0.631 1.738
system.time(arrangements::permutations(x = testVector7[1:500], k = 3))
#> user system elapsed
#> 1.399 0.584 1.993
Run Code Online (Sandbox Code Playgroud)
When executing comboGeneral as well as arrangements::combinations, the memory will jump almost 2 Gbs before calling gc. This seems about right as #rows * #nols * bytesPerCell / 2^30 bytes = choose(1000,3) * 3 * 4 / 2^30 bytes = (166167000 * 3 * 4)/2^30 = 1.857 Gbs). However, when executing combn, the memory behavior was eratic (e.g. sometimes it would use all 16 Gb of memory and other times it would only spike a couple of Gbs). When I tested this on the Windows set-up, it would often crash.
We can confirm this using Rprof along with summaryRporf. Observe:
Rprof("RcppAlgos.out", memory.profiling = TRUE)
t1 <- RcppAlgos::comboGeneral(testVector7, 3)
Rprof(NULL)
summaryRprof("RcppAlgos.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"CombinatoricsRcpp" 1.2 100 1901.6 1.2 100
"RcppAlgos::comboGeneral" 1.2 100 1901.6 0.0 0
Rprof("arrangements.out", memory.profiling = TRUE)
t3 <- arrangements::combinations(10^3, 3, testVector7)
Rprof(NULL)
summaryRprof("arrangements.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
".Call" 2.08 99.05 1901.6 2.08 99.05
Run Code Online (Sandbox Code Playgroud)
With RcppAlgos & arrangements, mem.total registers just over 1900 Mb.
And here is the memory profile on a smaller vector comparing gtools, utils, and combinat.
testVector7Prime <- testVector7[1:300]
Rprof("combinat.out", memory.profiling = TRUE)
t3 <- combinat::combn(testVector7Prime, 3)
Rprof(NULL)
summaryRprof("combinat.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"combinat::combn" 3.98 100.00 1226.9 3.72 93.47
Rprof("utils.out", memory.profiling = TRUE)
t4 <- utils::combn(testVector7Prime, 3)
Rprof(NULL)
summaryRprof("utils.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"utils::combn" 2.52 100.00 1952.7 2.50 99.21
Rprof("gtools.out", memory.profiling = TRUE)
t5 <- gtools::combinations(300, 3, testVector7Prime)
Rprof(NULL)
summaryRprof("gtools.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"rbind" 4.94 95.00 6741.6 4.40 84.62
Run Code Online (Sandbox Code Playgroud)
Interestingly, utils::combn and combinat::combn use different amounts of memory and take different amounts of time to execute. This does not hold up with smaller vectors:
microbenchmark(combinat::combn(2:13, 6), utils::combn(2:13, 6))
Unit: microseconds
expr min lq mean median uq max neval
combinat::combn(2:13, 6) 527.378 567.946 629.1268 577.163 604.3270 1816.744 100
utils::combn(2:13, 6) 663.150 712.872 750.8008 725.716 771.1345 1205.697 100
Run Code Online (Sandbox Code Playgroud)
And with gtools the total memory used is a little over 3x as much as utils. It should be noted that for these 3 packages, I obtained different results every-time I ran them (e.g. for combinat::combn sometimes I would get 9000 Mb and then I would get 13000 Mb).
Still, none can match RcppAlgos OR arrangements. Both only use 51 Mb when ran on the example above.
benchmark script: https://gist.github.com/randy3k/bd5730a6d70101c7471f4ae6f453862e (rendered by https://github.com/tidyverse/reprex)
*: An homage to A Walk through Combinatorics by Miklós Bóna