Rah*_*ora 9 combinations r vector
我想生成具有向量元素的所有可能组合的向量,其中元素的连续多次出现被视为该元素的单次出现。
简单案例
对于 n = 2,
original <- c("a","a","a","b","b","b")
v1 <- c("b","b","b","a","a","a")
Run Code Online (Sandbox Code Playgroud)
因此,所有唯一出现的 a 都会与 b 交换。
对于 n = 3,我们得到
original<-c("a","a","a","b","b","b","c","c","c")
ver1<-c("a","a","a","c","c","c","b","b","b")
ver2<-c("b","b","b","a","a","a","c","c","c")
ver3<-c("b","b","b","c","c","c","a","a","a")
ver4<-c("c","c","c","b","b","b","a","a","a")
ver5<-c("c","c","c","a","a","a","b","b","b")
Run Code Online (Sandbox Code Playgroud)
因此,所有唯一出现的a交换与b和c,所有唯一出现的b交换与a和cAND 所有唯一出现的c交换与b和a。
案例最多为 n = 10。(我相信不同组合的可能向量为 10!)
此外,可以有多个 a、b、c 块......
复杂情况
对于 n = 2;
original<-c("a","a","a","b","b","b","a","a","b","b")
ver1<-c("b","b","b","a","a","a","b","b","a","a")
Run Code Online (Sandbox Code Playgroud)
但如果我们正确地交换元素,复杂情况和简单情况应该不重要。
我正在尝试:(n=2)
original<-c("a","a","a","b","b","b","a","a","b","b")
ver1<-replace(original,which(original=='a'),'b')
ver1<-replace(ver1,which(original=='b'),'a')
gives ver1<-c("b","b","b","a","a","a","b","b","a","a")
Run Code Online (Sandbox Code Playgroud)
但不确定如何自动化此操作。
Gre*_*gor 12
这是一种使用非常快速的排列包的方法arrangements。我们计算与输入的唯一元素相对应的整数排列,然后进行一些巧妙的索引来输出相应的交换。这在小型示例上速度非常快,并且在较大示例上表现得非常好 - 在我的计算机上,用了不到 7 秒的时间在大小10! = 3628800为 30 的输入上生成具有 10 个唯一元素的交换。结果可以方便地以list.
library(arrangements)\n\nall_swaps = function(x) {\n ux = unique(x)\n xi = as.integer(factor(x))\n perm = permutations(seq_along(ux))\n apply(perm, MARGIN = 1, FUN = \\(p) ux[p][xi], simplify = FALSE)\n}\nRun Code Online (Sandbox Code Playgroud)\n问题中的测试用例:
\n# n = 2\nall_swaps(c("a","a","a","b","b","b","a","a","b","b"))\n# [[1]]\n# [1] "a" "a" "a" "b" "b" "b" "a" "a" "b" "b"\n# \n# [[2]]\n# [1] "b" "b" "b" "a" "a" "a" "b" "b" "a" "a"\n\n## n = 3\nall_swaps(c("a","a","a","b","b","b","c","c","c"))\n# [[1]]\n# [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"\n# \n# [[2]]\n# [1] "a" "a" "a" "c" "c" "c" "b" "b" "b"\n# \n# [[3]]\n# [1] "b" "b" "b" "a" "a" "a" "c" "c" "c"\n# \n# [[4]]\n# [1] "b" "b" "b" "c" "c" "c" "a" "a" "a"\n# \n# [[5]]\n# [1] "c" "c" "c" "a" "a" "a" "b" "b" "b"\n# \n# [[6]]\n# [1] "c" "c" "c" "b" "b" "b" "a" "a" "a"\nRun Code Online (Sandbox Code Playgroud)\n一个较短的演示,在“复杂”情况下包含 3 个唯一元素,其中元素并非全部连续:
\nall_swaps(c("a", "b", "b", "c", "b"))\n# [[1]]\n# [1] "a" "b" "b" "c" "b"\n# \n# [[2]]\n# [1] "a" "c" "c" "b" "c"\n# \n# [[3]]\n# [1] "b" "a" "a" "c" "a"\n# \n# [[4]]\n# [1] "b" "c" "c" "a" "c"\n# \n# [[5]]\n# [1] "c" "a" "a" "b" "a"\n# \n# [[6]]\n# [1] "c" "b" "b" "a" "b"\nRun Code Online (Sandbox Code Playgroud)\n一个更大的案例:
\n# n = 10\nset.seed(47)\nstart_t = Sys.time()\nn10 = all_swaps(sample(letters[1:10], size = 30, replace = TRUE))\nend_t = Sys.time()\nend_t - start_t\n# Time difference of 6.711215 secs\nlength(n10)\n# [1] 3628800\nRun Code Online (Sandbox Code Playgroud)\n将我的答案与 Ma\xc3\xabl 和 ThomasIsCoding 进行基准测试,我依赖该arrangements包的方法快速且内存高效。pracma::permsThomasIsCoding 的答案可以通过从 改为来改进arrangements::permutations——内存使用量特别改善——但我的版本仍然表现更好。Ma\xc3\xabl 使用大量时间和内存。我将带出结果,重现的代码如下。
## 5 Unique Elements\narrange(b5, desc(`itr/sec`))\n# # A tibble: 4 \xc3\x97 13\n# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time\n# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>\n# 1 GregorThomas 2.31ms 12.6ms 77.5 5.77KB 0 40 0 516ms\n# 2 ThomasIsCodingArr(in5) 9.3ms 20.5ms 47.4 19.55KB 0 24 0 506ms\n# 3 ThomasIsCoding(in5) 12.57ms 22.7ms 41.2 45.41KB 0 22 0 534ms\n# 4 Mael 963.64ms 963.6ms 1.04 1.24MB 0 1 0 964ms\n# # \xe2\x80\xa6 with 4 more variables: result <list>, memory <list>, time <list>, gc <list>\n\n## 9 Unique Elements - memory allocation is important\narrange(b9, desc(`itr/sec`))\n# # A tibble: 2 \xc3\x97 13\n# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result\n# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list>\n# 1 GregorThomas 1.8s 1.8s 0.556 27.7MB 0 1 0 1.8s <NULL>\n# 2 ThomasIsCoding(in9) 2.5s 2.5s 0.400 230.8MB 0.400 1 1 2.5s <NULL>\n# # \xe2\x80\xa6 with 3 more variables: memory <list>, time <list>, gc <list>\nRun Code Online (Sandbox Code Playgroud)\n基准测试代码:
\n## Functions\nlibrary(arrangements)\nlibrary(pracma)\nThomasIsCoding <- function(x) {\n idx <- match(x, unique(x))\n m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)\n Map(`[`, m, list(idx))\n}\nThomasIsCodingArr <- function(x) {\n idx <- match(x, unique(x))\n m <- asplit(matrix(unique(x)[permutations(1:max(idx))], ncol = max(idx)), 1)\n Map(`[`, m, list(idx))\n}\nMael <- function(vec){\n uni <- unique(vec)\n size <- length(uni)\n pVec <- paste(uni, collapse = "")\n grid <- expand.grid(rep(list(uni), size))\n expanded <- grid[apply(grid, 1, function(x) length(unique(x))) == size,]\n p <- unname(apply(expanded, 1, paste0, collapse = ""))\n \n lapply(p, function(x) chartr(pVec, x, vec))\n}\nall_swaps = function(x) {\n ux = unique(x)\n xi = as.integer(factor(x))\n perm = permutations(seq_along(ux))\n apply(perm, MARGIN = 1, FUN = \\(p) ux[p][xi], simplify = FALSE)\n}\n\nset.seed(47)\nin5 = c(sample(letters[1:5], 5), sample(letters[1:5], 5, replace = TRUE))\n\nb5 = bench::mark(\n GregorThomas = all_swaps(in5),\n Mael = Mael(in5),\n ThomasIsCoding(in5),\n ThomasIsCodingArr(in5),\n check = FALSE\n)\nRun Code Online (Sandbox Code Playgroud)\n
在这里,我们对之前的答案做了一些改进,其中结果存储在matrix(而不是list)中,并arrangement::permuations应用(而不是(感谢@Gregor Thomaspracma::perms的推荐)
f_TIC2 <- function(x) {\n u <- unique(x)\n idx <- match(x, u)\n n <- max(idx)\n m <- matrix(u[perms(1:n)], ncol = n)\n matrix(t(m)[c(outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`))], nrow = nrow(m), byrow = TRUE)\n}\n\nf_TIC2Arr <- function(x) {\n u <- unique(x)\n idx <- match(x, u)\n n <- max(idx)\n m <- matrix(u[permutations(1:n)], ncol = n)\n matrix(t(m)[c(outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`))], nrow = nrow(m), byrow = TRUE)\n}\nRun Code Online (Sandbox Code Playgroud)\n输出看起来像
\n> f_TIC2(c("a", "b", "b", "c", "b"))\n [,1] [,2] [,3] [,4] [,5]\n[1,] "c" "b" "b" "a" "b"\n[2,] "c" "a" "a" "b" "a"\n[3,] "b" "c" "c" "a" "c"\n[4,] "b" "a" "a" "c" "a"\n[5,] "a" "b" "b" "c" "b"\n[6,] "a" "c" "c" "b" "c"\n\n> f_TIC2Arr(c("a", "b", "b", "c", "b"))\n [,1] [,2] [,3] [,4] [,5]\n[1,] "a" "b" "b" "c" "b"\n[2,] "a" "c" "c" "b" "c"\n[3,] "b" "a" "a" "c" "a"\n[4,] "b" "c" "c" "a" "c"\n[5,] "c" "a" "a" "b" "a"\n[6,] "c" "b" "b" "a" "b"\nRun Code Online (Sandbox Code Playgroud)\n这是一些现有答案的基准(Ma\xc3\xabl\ 的解决方案计算量很大,因此被跳过。)
\n注意:这个基准并不是100 % 公平,因为我改进的解决方案产生矩阵而不是列表,这节省了大量时间。因此,比较并不是说我的速度最快,而是指出了提高性能的可能方法。
\nlibrary(RcppAlgos)\nlibrary(arrangements)\nlibrary(pracma)\nf_TIC1 <- function(x) {\n idx <- match(x, unique(x))\n m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)\n Map(`[`, m, list(idx))\n}\nf_TIC1Arr <- function(x) {\n idx <- match(x, unique(x))\n m <- asplit(matrix(unique(x)[permutations(1:max(idx))], ncol = max(idx)), 1)\n Map(`[`, m, list(idx))\n}\nf_TIC2 <- function(x) {\n u <- unique(x)\n idx <- match(x, u)\n n <- max(idx)\n m <- matrix(u[perms(1:n)], ncol = n)\n matrix(t(m)[outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`)], nrow = nrow(m), byrow = TRUE)\n}\n\nf_TIC2Arr <- function(x) {\n u <- unique(x)\n idx <- match(x, u)\n n <- max(idx)\n m <- matrix(u[permutations(1:n)], ncol = n)\n matrix(t(m)[outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`)], nrow = nrow(m), byrow = TRUE)\n}\n\nf_GT <- function(x) {\n ux <- unique(x)\n xi <- as.integer(factor(x))\n perm <- permutations(seq_along(ux))\n apply(perm, MARGIN = 1, FUN = \\(p) ux[p][xi], simplify = FALSE)\n}\n\nf_RS <- function(x) {\n permuteGeneral(uv <- unique(x), length(uv), FUN = \\(m) uv[match(x, m)])\n}\n\nset.seed(1)\nx <- sample(letters[1:10], 10, replace = TRUE)\n\nbm <- bench::mark(\n f_GT = f_GT(x),\n f_TIC1 = f_TIC1(x),\n f_TIC1Arr = f_TIC1Arr(x),\n f_TIC2 = f_TIC2(x),\n f_TIC2Arr = f_TIC2Arr(x),\n f_RS = f_RS(x),\n check = FALSE\n)\nautoplot(bm)\nRun Code Online (Sandbox Code Playgroud)\n你会看到
\n> bm\n# A tibble: 6 x 13\n expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time\n <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>\n1 f_GT 11.55ms 15.57ms 58.9 315.14KB 7.06 25 3 425ms\n2 f_TIC1 17.05ms 20.8ms 45.5 2.58MB 10.1 18 4 396ms\n3 f_TIC1Arr 16.45ms 19.62ms 48.9 1.06MB 13.6 18 5 368ms\n4 f_TIC2 2.47ms 3.31ms 259. 3.84MB 28.5 91 10 351ms\n5 f_TIC2Arr 1.54ms 1.7ms 469. 2.35MB 26.2 197 11 420ms\n6 f_RS 5.66ms 7.46ms 93.9 72.75KB 9.63 39 4 415ms\n# ... with 4 more variables: result <list>, memory <list>, time <list>,\n# gc <list>\nRun Code Online (Sandbox Code Playgroud)\n和
\n\npracma::perms你可以像下面这样尝试
library(pracma)\nf <- function(x) {\n idx <- match(x, unique(x))\n m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)\n Map(`[`, m, list(idx))\n}\nRun Code Online (Sandbox Code Playgroud)\n你会看到
\n> f(c("a", "a", "a", "b", "b", "b", "a", "a", "b", "b"))\n[[1]]\n [1] "b" "b" "b" "a" "a" "a" "b" "b" "a" "a"\n\n[[2]]\n [1] "a" "a" "a" "b" "b" "b" "a" "a" "b" "b"\n\n\n> f(c("a", "b", "b", "c", "b"))\n[[1]]\n[1] "c" "b" "b" "a" "b"\n\n[[2]]\n[1] "c" "a" "a" "b" "a"\n\n[[3]]\n[1] "b" "c" "c" "a" "c"\n\n[[4]]\n[1] "b" "a" "a" "c" "a"\n\n[[5]]\n[1] "a" "b" "b" "c" "b"\n\n[[6]]\n[1] "a" "c" "c" "b" "c"\nRun Code Online (Sandbox Code Playgroud)\n
使用chartr,您可以执行以下操作(尽管这对于较大的向量可能会崩溃):
f <- function(vec){
uni <- unique(vec)
size <- length(uni)
pVec <- paste(uni, collapse = "")
grid <- expand.grid(rep(list(uni), size))
expanded <- grid[apply(grid, 1, function(x) length(unique(x))) == size,]
p <- unname(apply(ex, 1, paste0, collapse = ""))
lapply(p, function(x) chartr(pVec, x, vec))
}
Run Code Online (Sandbox Code Playgroud)
输出:
original<-c("a","a","a","b","b","b","c","c","c")
f(original)
# [[1]]
# [1] "c" "c" "c" "b" "b" "b" "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b" "c" "c" "c" "a" "a" "a"
#
# [[3]]
# [1] "c" "c" "c" "a" "a" "a" "b" "b" "b"
#
# [[4]]
# [1] "a" "a" "a" "c" "c" "c" "b" "b" "b"
#
# [[5]]
# [1] "b" "b" "b" "a" "a" "a" "c" "c" "c"
#
# [[6]]
# [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"
Run Code Online (Sandbox Code Playgroud)
先前的答案(不适用于 n > 2)。
使用gtools::permutations。结果是矩阵的每一列。这个想法是从唯一值中获取排列,并重复这些值以匹配所需的组长度。
f <- function(x){
r <- rle(x)
l <- length(r$values)
apply(gtools::permutations(n=l, r=l, v=r$values), 1, function(x) rep(x, each = unique(r$l)))
}
Run Code Online (Sandbox Code Playgroud)
该答案采用与已发布的相同的通用方法,但使用的RcppAlgos::permuteGeneral()方法不仅非常快,而且还允许将函数应用于排列。
library(RcppAlgos)
f <- function(x) permuteGeneral(uv <- unique(x), length(uv), FUN = \(m) uv[match(x, m)])
f(original)
[[1]]
[1] "a" "a" "a" "b" "b" "b" "c" "c" "c"
[[2]]
[1] "a" "a" "a" "c" "c" "c" "b" "b" "b"
[[3]]
[1] "b" "b" "b" "a" "a" "a" "c" "c" "c"
[[4]]
[1] "c" "c" "c" "a" "a" "a" "b" "b" "b"
[[5]]
[1] "b" "b" "b" "c" "c" "c" "a" "a" "a"
[[6]]
[1] "c" "c" "c" "b" "b" "b" "a" "a" "a"
Run Code Online (Sandbox Code Playgroud)