我想置换一个向量,这样一个元素在排列之后就不能像在原始元素中那样在同一个地方.假设我有一个像这样的元素列表:AABBCCADEF
有效的洗牌是:BBAADEFCCA
但这些都是无效的:B A ACFEDCAB或BCA B FEDCAB
我能找到的最接近的答案是:python shuffle这样的位置永远不会重复.但这不是我想要的,因为在这个例子中没有重复的元素.
我想要一个快速算法,在重复的情况下推广该答案.
MWE:
library(microbenchmark)
set.seed(1)
x <- sample(letters, size=295, replace=T)
terrible_implementation <- function(x) {
xnew <- sample(x)
while(any(x == xnew)) {
xnew <- sample(x)
}
return(xnew)
}
microbenchmark(terrible_implementation(x), times=10)
Unit: milliseconds
expr min lq mean median uq max neval
terrible_implementation(x) 479.5338 2346.002 4738.49 2993.29 4858.254 17005.05 10
Run Code Online (Sandbox Code Playgroud)
另外,如何确定是否可以以这种方式置换序列?
编辑:为了清楚地说明我想要的东西,新的载体应满足以下条件:
1)all(table(newx) == table(x))
2)all(x != newx)
例如:
newx <- terrible_implementation(x)
all(table(newx) == table(x))
[1] TRUE
all(x != newx)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)
我想这满足了你所有的条件。这个想法是按频率排序,从最常见的元素开始,然后按照最常见元素出现的次数将该值移动到频率表中的下一个值。这将保证所有元素都会被遗漏。
我用 编写data.table,因为它在调试过程中帮助了我,而不会损失太多性能。从性能角度来看,这是一个适度的改进。
library(data.table)
library(magrittr)
library(microbenchmark)
permute_avoid_same_position <- function(y) {
DT <- data.table(orig = y)
DT[, orig_order := .I]
count_by_letter <-
DT[, .N, keyby = orig] %>%
.[order(N)] %>%
.[, stable_order := .I] %>%
.[order(-stable_order)] %>%
.[]
out <- copy(DT)[count_by_letter, .(orig, orig_order, N), on = "orig"]
# Dummy element
out[, new := first(y)]
origs <- out[["orig"]]
nrow_out <- nrow(out)
maxN <- count_by_letter[["N"]][1]
out[seq_len(nrow_out) > maxN, new := head(origs, nrow_out - maxN)]
out[seq_len(nrow_out) <= maxN, new := tail(origs, maxN)]
DT[out, j = .(orig_order, orig, new), on = "orig_order"] %>%
.[order(orig_order)] %>%
.[["new"]]
}
set.seed(1)
x <- sample(letters, size=295, replace=T)
testthat::expect_true(all(table(permute_avoid_same_position(x)) == table(x)))
testthat::expect_true(all(x != permute_avoid_same_position(x)))
microbenchmark(permute_avoid_same_position(x), times = 5)
# Unit: milliseconds
# expr min lq mean median uq max
# permute_avoid_same_position(x) 5.650378 5.771753 5.875116 5.788618 5.938604 6.226228
x <- sample(1:1000, replace = TRUE, size = 1e6)
testthat::expect_true(all(table(permute_avoid_same_position(x)) == table(x)))
testthat::expect_true(all(x != permute_avoid_same_position(x)))
microbenchmark(permute_avoid_same_position(x), times = 5)
# Unit: milliseconds
# expr min lq mean median uq max
# permute_avoid_same_position(x) 239.7744 385.4686 401.521 438.2999 440.9746 503.0875
Run Code Online (Sandbox Code Playgroud)