生成连续出现的向量的所有组合被视为单次出现

Rah*_*ora 9 combinations r vector

我想生成具有向量元素的所有可能组合的向量,其中元素的连续多次出现被视为该元素的单次出现。

简单案例

对于 n = 2,

original <- c("a","a","a","b","b","b")
      v1 <- c("b","b","b","a","a","a")
Run Code Online (Sandbox Code Playgroud)

因此,所有唯一出现的 a 都会与 b 交换。

对于 n = 3,我们得到

original<-c("a","a","a","b","b","b","c","c","c")
    ver1<-c("a","a","a","c","c","c","b","b","b")
    ver2<-c("b","b","b","a","a","a","c","c","c")
    ver3<-c("b","b","b","c","c","c","a","a","a")
    ver4<-c("c","c","c","b","b","b","a","a","a")
    ver5<-c("c","c","c","a","a","a","b","b","b")
Run Code Online (Sandbox Code Playgroud)

因此,所有唯一出现的a交换与bc,所有唯一出现的b交换与acAND 所有唯一出现的c交换与ba

案例最多为 n = 10。(我相信不同组合的可能向量为 10!)

此外,可以有多个 a、b、c 块......

复杂情况

对于 n = 2;

original<-c("a","a","a","b","b","b","a","a","b","b")
    ver1<-c("b","b","b","a","a","a","b","b","a","a")
Run Code Online (Sandbox Code Playgroud)

但如果我们正确地交换元素,复杂情况和简单情况应该不重要。

我正在尝试:(n=2)

original<-c("a","a","a","b","b","b","a","a","b","b")
ver1<-replace(original,which(original=='a'),'b')
ver1<-replace(ver1,which(original=='b'),'a')
gives ver1<-c("b","b","b","a","a","a","b","b","a","a")
Run Code Online (Sandbox Code Playgroud)

但不确定如何自动化此操作。

Gre*_*gor 12

这是一种使用非常快速的排列包的方法arrangements。我们计算与输入的唯一元素相对应的整数排列,然后进行一些巧妙的索引来输出相应的交换。这在小型示例上速度非常快,并且在较大示例上表现得非常好 - 在我的计算机上,用了不到 7 秒的时间在大小10! = 3628800为 30 的输入上生成具有 10 个唯一元素的交换。结果可以方便地以list.

\n
library(arrangements)\n\nall_swaps = function(x) {\n  ux = unique(x)\n  xi = as.integer(factor(x))\n  perm = permutations(seq_along(ux))\n  apply(perm, MARGIN = 1, FUN = \\(p) ux[p][xi], simplify = FALSE)\n}\n
Run Code Online (Sandbox Code Playgroud)\n

问题中的测试用例:

\n
# n = 2\nall_swaps(c("a","a","a","b","b","b","a","a","b","b"))\n# [[1]]\n#  [1] "a" "a" "a" "b" "b" "b" "a" "a" "b" "b"\n# \n# [[2]]\n#  [1] "b" "b" "b" "a" "a" "a" "b" "b" "a" "a"\n\n## n = 3\nall_swaps(c("a","a","a","b","b","b","c","c","c"))\n# [[1]]\n# [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"\n# \n# [[2]]\n# [1] "a" "a" "a" "c" "c" "c" "b" "b" "b"\n# \n# [[3]]\n# [1] "b" "b" "b" "a" "a" "a" "c" "c" "c"\n# \n# [[4]]\n# [1] "b" "b" "b" "c" "c" "c" "a" "a" "a"\n# \n# [[5]]\n# [1] "c" "c" "c" "a" "a" "a" "b" "b" "b"\n# \n# [[6]]\n# [1] "c" "c" "c" "b" "b" "b" "a" "a" "a"\n
Run Code Online (Sandbox Code Playgroud)\n

一个较短的演示,在“复杂”情况下包含 3 个唯一元素,其中元素并非全部连续:

\n
all_swaps(c("a", "b", "b", "c", "b"))\n# [[1]]\n# [1] "a" "b" "b" "c" "b"\n# \n# [[2]]\n# [1] "a" "c" "c" "b" "c"\n# \n# [[3]]\n# [1] "b" "a" "a" "c" "a"\n# \n# [[4]]\n# [1] "b" "c" "c" "a" "c"\n# \n# [[5]]\n# [1] "c" "a" "a" "b" "a"\n# \n# [[6]]\n# [1] "c" "b" "b" "a" "b"\n
Run Code Online (Sandbox Code Playgroud)\n

一个更大的案例:

\n
# n = 10\nset.seed(47)\nstart_t = Sys.time()\nn10 = all_swaps(sample(letters[1:10], size = 30, replace = TRUE))\nend_t = Sys.time()\nend_t - start_t\n# Time difference of 6.711215 secs\nlength(n10)\n# [1] 3628800\n
Run Code Online (Sandbox Code Playgroud)\n
\n

标杆管理

\n

将我的答案与 Ma\xc3\xabl 和 ThomasIsCoding 进行基准测试,我依赖该arrangements包的方法快速且内存高效。pracma::permsThomasIsCoding 的答案可以通过从 改为来改进arrangements::permutations——内存使用量特别改善——但我的版本仍然表现更好。Ma\xc3\xabl 使用大量时间和内存。我将带出结果,重现的代码如下。

\n
## 5 Unique Elements\narrange(b5, desc(`itr/sec`))\n# # A tibble: 4 \xc3\x97 13\n#   expression                  min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time\n#   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>\n# 1 GregorThomas             2.31ms   12.6ms     77.5     5.77KB        0    40     0      516ms\n# 2 ThomasIsCodingArr(in5)    9.3ms   20.5ms     47.4    19.55KB        0    24     0      506ms\n# 3 ThomasIsCoding(in5)     12.57ms   22.7ms     41.2    45.41KB        0    22     0      534ms\n# 4 Mael                   963.64ms  963.6ms      1.04    1.24MB        0     1     0      964ms\n# # \xe2\x80\xa6 with 4 more variables: result <list>, memory <list>, time <list>, gc <list>\n\n## 9 Unique Elements - memory allocation is important\narrange(b9, desc(`itr/sec`))\n# # A tibble: 2 \xc3\x97 13\n#   expression               min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result\n#   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>\n# 1 GregorThomas            1.8s     1.8s     0.556    27.7MB    0         1     0       1.8s <NULL>\n# 2 ThomasIsCoding(in9)     2.5s     2.5s     0.400   230.8MB    0.400     1     1       2.5s <NULL>\n# # \xe2\x80\xa6 with 3 more variables: memory <list>, time <list>, gc <list>\n
Run Code Online (Sandbox Code Playgroud)\n

基准测试代码:

\n
## Functions\nlibrary(arrangements)\nlibrary(pracma)\nThomasIsCoding <- function(x) {\n  idx <- match(x, unique(x))\n  m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)\n  Map(`[`, m, list(idx))\n}\nThomasIsCodingArr <- function(x) {\n  idx <- match(x, unique(x))\n  m <- asplit(matrix(unique(x)[permutations(1:max(idx))], ncol = max(idx)), 1)\n  Map(`[`, m, list(idx))\n}\nMael <- function(vec){\n  uni <- unique(vec)\n  size <- length(uni)\n  pVec <- paste(uni, collapse = "")\n  grid <- expand.grid(rep(list(uni), size))\n  expanded <- grid[apply(grid, 1, function(x) length(unique(x))) == size,]\n  p <- unname(apply(expanded, 1, paste0, collapse = ""))\n  \n  lapply(p, function(x) chartr(pVec, x, vec))\n}\nall_swaps = function(x) {\n  ux = unique(x)\n  xi = as.integer(factor(x))\n  perm = permutations(seq_along(ux))\n  apply(perm, MARGIN = 1, FUN = \\(p) ux[p][xi], simplify = FALSE)\n}\n\nset.seed(47)\nin5 = c(sample(letters[1:5], 5), sample(letters[1:5], 5, replace = TRUE))\n\nb5 = bench::mark(\n  GregorThomas = all_swaps(in5),\n  Mael = Mael(in5),\n  ThomasIsCoding(in5),\n  ThomasIsCodingArr(in5),\n  check = FALSE\n)\n
Run Code Online (Sandbox Code Playgroud)\n

  • (这里严格进入代码高尔夫领域,但函数体可以归结为 `f &lt;- Factor(x); l &lt;-levels(f); p &lt;- permutations(l); apply(p, 1L, "[" , f)`——我主要分享这个是因为我认为这种形式有一些很好的对称性,如果你碰巧关心代码中的美观:D) (5认同)
  • 谢谢!我认为这样的东西会起作用,我顺着线索走过去,它非常优雅地组合在一起:) (2认同)

Tho*_*ing 6

更新

\n

在这里,我们对之前的答案做了一些改进,其中结果存储在matrix(而不是list)中,并arrangement::permuations应用(而不是(感谢@Gregor Thomaspracma::perms的推荐)

\n
f_TIC2 <- function(x) {\n  u <- unique(x)\n  idx <- match(x, u)\n  n <- max(idx)\n  m <- matrix(u[perms(1:n)], ncol = n)\n  matrix(t(m)[c(outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`))], nrow = nrow(m), byrow = TRUE)\n}\n\nf_TIC2Arr <- function(x) {\n  u <- unique(x)\n  idx <- match(x, u)\n  n <- max(idx)\n  m <- matrix(u[permutations(1:n)], ncol = n)\n  matrix(t(m)[c(outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`))], nrow = nrow(m), byrow = TRUE)\n}\n
Run Code Online (Sandbox Code Playgroud)\n

输出看起来像

\n
> f_TIC2(c("a", "b", "b", "c", "b"))\n     [,1] [,2] [,3] [,4] [,5]\n[1,] "c"  "b"  "b"  "a"  "b"\n[2,] "c"  "a"  "a"  "b"  "a"\n[3,] "b"  "c"  "c"  "a"  "c"\n[4,] "b"  "a"  "a"  "c"  "a"\n[5,] "a"  "b"  "b"  "c"  "b"\n[6,] "a"  "c"  "c"  "b"  "c"\n\n> f_TIC2Arr(c("a", "b", "b", "c", "b"))\n     [,1] [,2] [,3] [,4] [,5]\n[1,] "a"  "b"  "b"  "c"  "b"\n[2,] "a"  "c"  "c"  "b"  "c"\n[3,] "b"  "a"  "a"  "c"  "a"\n[4,] "b"  "c"  "c"  "a"  "c"\n[5,] "c"  "a"  "a"  "b"  "a"\n[6,] "c"  "b"  "b"  "a"  "b"\n
Run Code Online (Sandbox Code Playgroud)\n

标杆管理

\n

这是一些现有答案的基准(Ma\xc3\xabl\ 的解决方案计算量很大,因此被跳过。)

\n

注意:这个基准并不是100 % 公平,因为我改进的解决方案产生矩阵而不是列表,这节省了大量时间。因此,比较并不是说我的速度最快,而是指出了提高性能的可能方法

\n
library(RcppAlgos)\nlibrary(arrangements)\nlibrary(pracma)\nf_TIC1 <- function(x) {\n  idx <- match(x, unique(x))\n  m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)\n  Map(`[`, m, list(idx))\n}\nf_TIC1Arr <- function(x) {\n  idx <- match(x, unique(x))\n  m <- asplit(matrix(unique(x)[permutations(1:max(idx))], ncol = max(idx)), 1)\n  Map(`[`, m, list(idx))\n}\nf_TIC2 <- function(x) {\n  u <- unique(x)\n  idx <- match(x, u)\n  n <- max(idx)\n  m <- matrix(u[perms(1:n)], ncol = n)\n  matrix(t(m)[outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`)], nrow = nrow(m), byrow = TRUE)\n}\n\nf_TIC2Arr <- function(x) {\n  u <- unique(x)\n  idx <- match(x, u)\n  n <- max(idx)\n  m <- matrix(u[permutations(1:n)], ncol = n)\n  matrix(t(m)[outer(idx, (0:(nrow(m) - 1)) * ncol(m), `+`)], nrow = nrow(m), byrow = TRUE)\n}\n\nf_GT <- function(x) {\n  ux <- unique(x)\n  xi <- as.integer(factor(x))\n  perm <- permutations(seq_along(ux))\n  apply(perm, MARGIN = 1, FUN = \\(p) ux[p][xi], simplify = FALSE)\n}\n\nf_RS <- function(x) {\n  permuteGeneral(uv <- unique(x), length(uv), FUN = \\(m) uv[match(x, m)])\n}\n\nset.seed(1)\nx <- sample(letters[1:10], 10, replace = TRUE)\n\nbm <- bench::mark(\n  f_GT = f_GT(x),\n  f_TIC1 = f_TIC1(x),\n  f_TIC1Arr = f_TIC1Arr(x),\n  f_TIC2 = f_TIC2(x),\n  f_TIC2Arr = f_TIC2Arr(x),\n  f_RS = f_RS(x),\n  check = FALSE\n)\nautoplot(bm)\n
Run Code Online (Sandbox Code Playgroud)\n

你会看到

\n
> bm\n# A tibble: 6 x 13\n  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time\n  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>\n1 f_GT       11.55ms 15.57ms      58.9  315.14KB     7.06    25     3      425ms\n2 f_TIC1     17.05ms  20.8ms      45.5    2.58MB    10.1     18     4      396ms\n3 f_TIC1Arr  16.45ms 19.62ms      48.9    1.06MB    13.6     18     5      368ms\n4 f_TIC2      2.47ms  3.31ms     259.     3.84MB    28.5     91    10      351ms\n5 f_TIC2Arr   1.54ms   1.7ms     469.     2.35MB    26.2    197    11      420ms\n6 f_RS        5.66ms  7.46ms      93.9   72.75KB     9.63    39     4      415ms\n# ... with 4 more variables: result <list>, memory <list>, time <list>,\n#   gc <list>\n
Run Code Online (Sandbox Code Playgroud)\n

\n

在此输入图像描述

\n
\n

上一个答案

\n

pracma::perms你可以像下面这样尝试

\n
library(pracma)\nf <- function(x) {\n  idx <- match(x, unique(x))\n  m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)\n  Map(`[`, m, list(idx))\n}\n
Run Code Online (Sandbox Code Playgroud)\n

你会看到

\n
> f(c("a", "a", "a", "b", "b", "b", "a", "a", "b", "b"))\n[[1]]\n [1] "b" "b" "b" "a" "a" "a" "b" "b" "a" "a"\n\n[[2]]\n [1] "a" "a" "a" "b" "b" "b" "a" "a" "b" "b"\n\n\n> f(c("a", "b", "b", "c", "b"))\n[[1]]\n[1] "c" "b" "b" "a" "b"\n\n[[2]]\n[1] "c" "a" "a" "b" "a"\n\n[[3]]\n[1] "b" "c" "c" "a" "c"\n\n[[4]]\n[1] "b" "a" "a" "c" "a"\n\n[[5]]\n[1] "a" "b" "b" "c" "b"\n\n[[6]]\n[1] "a" "c" "c" "b" "c"\n
Run Code Online (Sandbox Code Playgroud)\n


Maë*_*aël 5

使用chartr,您可以执行以下操作(尽管这对于较大的向量可能会崩溃):

f <- function(vec){
  uni <- unique(vec)
  size <- length(uni)
  pVec <- paste(uni, collapse = "")
  grid <- expand.grid(rep(list(uni), size))
  expanded <- grid[apply(grid, 1, function(x) length(unique(x))) == size,]
  p <- unname(apply(ex, 1, paste0, collapse = ""))
  
  lapply(p, function(x) chartr(pVec, x, vec))
}
Run Code Online (Sandbox Code Playgroud)

输出:

original<-c("a","a","a","b","b","b","c","c","c")
f(original)

# [[1]]
# [1] "c" "c" "c" "b" "b" "b" "a" "a" "a"
# 
# [[2]]
# [1] "b" "b" "b" "c" "c" "c" "a" "a" "a"
# 
# [[3]]
# [1] "c" "c" "c" "a" "a" "a" "b" "b" "b"
# 
# [[4]]
# [1] "a" "a" "a" "c" "c" "c" "b" "b" "b"
# 
# [[5]]
# [1] "b" "b" "b" "a" "a" "a" "c" "c" "c"
# 
# [[6]]
# [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"
Run Code Online (Sandbox Code Playgroud)

先前的答案(不适用于 n > 2)。

使用gtools::permutations。结果是矩阵的每一列。这个想法是从唯一值中获取排列,并重复这些值以匹配所需的组长度。

f <- function(x){
  r <- rle(x)
  l <- length(r$values)
  apply(gtools::permutations(n=l, r=l, v=r$values), 1, function(x) rep(x, each = unique(r$l)))
}
Run Code Online (Sandbox Code Playgroud)

  • 这不适用于复杂的情况,例如上面提到的 n=2 Original&lt;-c("a","a","a","b","b","b","a"," a","b","b"),或不等长原始&lt;-c("a","a","a","b","b","b","c","c ”) (2认同)

H 1*_*H 1 5

该答案采用与已发布的相同的通用方法,但使用的RcppAlgos::permuteGeneral()方法不仅非常快,而且还允许将函数应用于排列。

library(RcppAlgos)

f <- function(x) permuteGeneral(uv <- unique(x), length(uv), FUN = \(m) uv[match(x, m)])

f(original)
[[1]]
[1] "a" "a" "a" "b" "b" "b" "c" "c" "c"

[[2]]
[1] "a" "a" "a" "c" "c" "c" "b" "b" "b"

[[3]]
[1] "b" "b" "b" "a" "a" "a" "c" "c" "c"

[[4]]
[1] "c" "c" "c" "a" "a" "a" "b" "b" "b"

[[5]]
[1] "b" "b" "b" "c" "c" "c" "a" "a" "a"

[[6]]
[1] "c" "c" "c" "b" "b" "b" "a" "a" "a"
Run Code Online (Sandbox Code Playgroud)

  • 感人的!多么强大的`permuteGeneral`!已投票! (2认同)