sca*_*der 0 r permutation combinatorics
我有以下生成唯一排列的代码:
library(magrittr)
library(tictoc)
count_unique_perm <- function(l = NULL) {
lo <- combinat::permn(l)
do.call(rbind, lapply(lo, paste0, collapse = ""))[, 1] %>%
unique() %>%
length()
}
Run Code Online (Sandbox Code Playgroud)
它已经给出了正确的结果。通过此输入:
l1 <- c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P") # 720
l2 <- c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S") # 30,240
Run Code Online (Sandbox Code Playgroud)
但它运行速度非常慢。
tic()
count_unique_perm(l = l1)
toc()
#118.155 sec elapsed
#107.793 sec elapsed for l2
Run Code Online (Sandbox Code Playgroud)
我怎样才能加快速度?
您不需要生成排列,有封闭公式。您可以使用包iterpc:
iterpc::multichoose(table(l1))
Run Code Online (Sandbox Code Playgroud)
或在基地:
factorial(length(l2)) / prod(factorial(table(l2)))
Run Code Online (Sandbox Code Playgroud)
尝试该RcppAlgos包,它将使用freqs参数返回多重集的排列。
library(RcppAlgos)
library(microbenchmark)
# get a matrix of unique permutations
x <- table(c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P"))
y <- table(c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S"))
microbenchmark(permx = permuteGeneral(names(x), freqs = x),
permy = permuteGeneral(names(y), freqs = y))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> permx 32.3 38.0 44.018 42.05 47.95 64.8 100
#> permy 1538.8 1567.7 1751.259 1606.60 1649.35 5082.5 100
dim(permuteGeneral(names(x), freqs = x))
#> [1] 720 10
dim(permuteGeneral(names(y), freqs = y))
#> [1] 30240 10
Run Code Online (Sandbox Code Playgroud)
要仅获取唯一排列的数量,请使用permuteCount。
microbenchmark(permx = permuteCount(names(x), freqs = x),
permy = permuteCount(names(y), freqs = y))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> permx 1.5 1.6 1.791 1.6 1.8 6.6 100
#> permy 1.5 1.6 2.260 1.7 1.8 46.2 100
permuteCount(names(x), freqs = x)
#> [1] 720
permuteCount(names(y), freqs = y)
#> [1] 30240
Run Code Online (Sandbox Code Playgroud)