The R function expand.grid returns all possible combinations of the elements of the supplied arguments. For example:
> expand.grid(c("aa", "ab", "cc"), c("aa", "ab", "cc"))
Var1 Var2
1 aa aa
2 ab aa
3 cc aa
4 aa ab
5 ab ab
6 cc ab
7 aa cc
8 ab cc
9 cc cc
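Do you know an efficient way to obtain directly (i.e. without any row comparison after expand.grid) only the "unique" combinations between the supplied vectors? The output would be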
Var1 Var2
1 aa aa
2 ab aa
3 cc aa
5 ab ab
6 cc ab
9 cc cc
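Edit: the combination of each element with itself could eventually be dropped from the answer. I don't actually need it in my program, even though (mathematically) aa aa would be one (regular) unique combination between one element of Var1 and another of var2.
The solution needs to generate pairs of elements from both vectors (i.e. one from each input vector, so that it can be applied to more than 2 inputs).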
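A minimal base-R sketch of a direct approach, assuming every input is the same vector (as in the example). The helper name `unique_grid` is hypothetical, not from any package: it builds each unordered k-tuple exactly once by recursion instead of filtering expand.grid afterwards. Note that it lists pairs as aa ab rather than ab aa, i.e. the mirror image of the desired output above.

```r
# Hypothetical helper (not from any package): all unordered k-tuples with
# repetition drawn from v, generated directly without filtering a full grid.
unique_grid <- function(v, k) {
  v <- unique(v)
  if (k == 1) return(matrix(v, ncol = 1))
  # Pair v[i] only with tuples built from v[i], v[i+1], ..., so each
  # unordered tuple appears exactly once
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], unique_grid(v[i:length(v)], k - 1), deparse.level = 0)
  }))
}

unique_grid(c("aa", "ab", "cc"), 2)  # 6 rows: aa aa, aa ab, aa cc, ab ab, ab cc, cc cc
unique_grid(c("aa", "ab", "cc"), 3)  # 10 rows for three inputs
```

The recursion depth equals the number of inputs, so this stays cheap for the small k typical of this use case.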
Sim*_*lon 28
How about using outer? Note, though, that this particular function concatenates each pair into a single string.
outer( c("aa", "ab", "cc"), c("aa", "ab", "cc") , "paste" )
# [,1] [,2] [,3]
#[1,] "aa aa" "aa ab" "aa cc"
#[2,] "ab aa" "ab ab" "ab cc"
#[3,] "cc aa" "cc ab" "cc cc"
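One way to get only the unique pairs out of the outer matrix is to keep its upper triangle, including the diagonal for the self-pairs. A short sketch:

```r
v <- c("aa", "ab", "cc")
m <- outer(v, v, "paste")
# upper.tri() marks cells with row <= column (diag = TRUE keeps "aa aa" etc.);
# logical indexing returns them in column-major order
pairs <- m[upper.tri(m, diag = TRUE)]
pairs
# [1] "aa aa" "aa ab" "ab ab" "aa cc" "ab cc" "cc cc"
```

Use `diag = FALSE` instead if the self-pairs should be dropped.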
If you don't want repeated elements (e.g. aa aa), you can also use combn on the unique elements of the two vectors:
vals <- c( c("aa", "ab", "cc"), c("aa", "ab", "cc") )
vals <- unique( vals )
combn( vals , 2 )
# [,1] [,2] [,3]
#[1,] "aa" "aa" "ab"
#[2,] "ab" "cc" "cc"
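combn also takes a FUN argument, and further arguments are passed on to FUN, so a sketch that returns each pair as one string (like the outer output) looks like this:

```r
vals <- unique(c(c("aa", "ab", "cc"), c("aa", "ab", "cc")))
# FUN is applied to each 2-element combination; collapse is passed on to paste
res <- combn(vals, 2, FUN = paste, collapse = " ")
res
# [1] "aa ab" "aa cc" "ab cc"
```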
Fer*_*aft 13
In base R, you can use:
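expand.grid.unique <- function(x, y, include.equals=FALSE)
{
    x <- unique(x)
    y <- unique(y)
    # Pair x[i] with every element of y except x[1..i]
    # (or x[1..i-1] when include.equals=TRUE, which keeps x[i] with itself)
    g <- function(i)
    {
        z <- setdiff(y, x[seq_len(i-include.equals)])
        if(length(z)) cbind(x[i], z, deparse.level=0)
    }
    do.call(rbind, lapply(seq_along(x), g))
}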
Result:
> x <- c("aa", "ab", "cc")
> y <- c("aa", "ab", "cc")
> expand.grid.unique(x, y)
[,1] [,2]
[1,] "aa" "ab"
[2,] "aa" "cc"
[3,] "ab" "cc"
> expand.grid.unique(x, y, include.equals=TRUE)
[,1] [,2]
[1,] "aa" "aa"
[2,] "aa" "ab"
[3,] "aa" "cc"
[4,] "ab" "ab"
[5,] "ab" "cc"
[6,] "cc" "cc"
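Because x and y are deduplicated separately, the function also accepts two different vectors. A quick check with two partly overlapping numeric vectors (the inputs are purely illustrative):

```r
expand.grid.unique <- function(x, y, include.equals = FALSE) {
  x <- unique(x)
  y <- unique(y)
  g <- function(i) {
    z <- setdiff(y, x[seq_len(i - include.equals)])
    if (length(z)) cbind(x[i], z, deparse.level = 0)
  }
  do.call(rbind, lapply(seq_along(x), g))
}

# Illustrative inputs that overlap only partly
a <- expand.grid.unique(c(1, 2, 3), c(2, 3, 4))
b <- expand.grid.unique(c(1, 2, 3), c(2, 3, 4), include.equals = TRUE)
nrow(a)  # 6: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
nrow(b)  # 8: adds (2,2) and (3,3)
```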
Ben*_*nes 12
If the two vectors are the same, there is the combinations function in the gtools package:
library(gtools)
combinations(n = 3, r = 2, v = c("aa", "ab", "cc"), repeats.allowed = TRUE)
# [,1] [,2]
# [1,] "aa" "aa"
# [2,] "aa" "ab"
# [3,] "aa" "cc"
# [4,] "ab" "ab"
# [5,] "ab" "cc"
# [6,] "cc" "cc"
Without "aa" "aa" and the like:
combinations(n = 3, r = 2, v = c("aa", "ab", "cc"), repeats.allowed = FALSE)
Anonymous 8
Try:
factors <- c("a", "b", "c")
all.combos <- t(combn(factors,2))
all.combos
# [,1] [,2]
# [1,] "a" "b"
# [2,] "a" "c"
# [3,] "b" "c"
This will not include each factor paired with itself (e.g. "a", "a"), but you can easily add those if needed:
dup.combos <- cbind(factors,factors)
dup.combos
# factors factors
# [1,] "a" "a"
# [2,] "b" "b"
# [3,] "c" "c"
all.combos <- rbind(all.combos,dup.combos)
all.combos
# factors factors
# [1,] "a" "b"
# [2,] "a" "c"
# [3,] "b" "c"
# [4,] "a" "a"
# [5,] "b" "b"
# [6,] "c" "c"
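The combined matrix lists the self-pairs last. If you would rather have each element's pairs grouped together, one option is to sort the rows afterwards with order():

```r
factors <- c("a", "b", "c")
all.combos <- rbind(t(combn(factors, 2)), cbind(factors, factors))
# order() sorts by the first column, breaking ties with the second
all.combos <- all.combos[order(all.combos[, 1], all.combos[, 2]), ]
all.combos  # rows: a-a, a-b, a-c, b-b, b-c, c-c
```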
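The previous answers lack a way to get the specific result asked for, i.e. keeping self-pairs but removing pairs that differ only in order. The gtools package has two functions for these purposes, combinations and permutations. According to this website:
In both cases we get to decide whether repeats are allowed, and correspondingly both functions have a repeats.allowed argument, giving 4 combinations (delicious meta!). It's worth going through each of them. For ease of understanding, I've simplified the vectors to single letters.
The broadest option is to allow both self-relations and differently ordered options: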
> permutations(n = 3, r = 2, repeats.allowed = T, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "b"
[6,] "b" "c"
[7,] "c" "a"
[8,] "c" "b"
[9,] "c" "c"
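This gives us 9 options. The value follows from the simple formula n^r, i.e. 3^2 = 9. For users familiar with SQL, this is the Cartesian product / cross join.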
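Without gtools, the same count can be checked in base R: expand.grid over r copies of the vector produces exactly these ordered selections with repetition (only the row order differs from permutations()).

```r
v <- c("a", "b", "c")
n <- length(v); r <- 2
# r copies of v give every ordered selection with repetition
full <- expand.grid(rep(list(v), r), stringsAsFactors = FALSE)
nrow(full)  # 9
n^r         # 9
```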
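There are two ways to restrict this: 1) remove self-relations (disallow repeats), or 2) remove differently ordered options (i.e. combinations).
If we want to remove differently ordered options, we use: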
> combinations(n = 3, r = 2, repeats.allowed = T, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "b"
[5,] "b" "c"
[6,] "c" "c"
This gives us 6 options. The formula for this value is (r+n-1)!/(r!*(n-1)!), i.e. (2+3-1)!/(2!*(3-1)!) = 4!/(2*2!) = 24/4 = 6.
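Since (r+n-1)!/(r!(n-1)!) is the binomial coefficient "n+r-1 choose r", base R's choose() evaluates it directly. For r = 2 the same count falls out of keeping only the non-decreasing rows of the full grid:

```r
v <- c("a", "b", "c")
n <- length(v); r <- 2
full <- expand.grid(rep(list(v), r), stringsAsFactors = FALSE)
# Keep one representative per unordered pair: Var1 <= Var2
sorted_rows <- full[full$Var1 <= full$Var2, ]
nrow(sorted_rows)     # 6
choose(n + r - 1, r)  # 6
```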
If we want to disallow repeats, we use:
> permutations(n = 3, r = 2, repeats.allowed = F, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "b"
[2,] "a" "c"
[3,] "b" "a"
[4,] "b" "c"
[5,] "c" "a"
[6,] "c" "b"
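This also gives us 6 options, but different ones! The number of options is the same as above, but that is a coincidence. The value follows from the formula n!/(n-r)!, i.e. (3*2*1)/(3-2)! = 6/1! = 6.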
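The base-R counterpart for ordered selections without repetition is to drop the grid rows whose entries are equal; factorial() gives n!/(n-r)! directly:

```r
v <- c("a", "b", "c")
n <- length(v); r <- 2
full <- expand.grid(rep(list(v), r), stringsAsFactors = FALSE)
# Remove self-relations such as "a" "a"
distinct_rows <- full[full$Var1 != full$Var2, ]
nrow(distinct_rows)              # 6
factorial(n) / factorial(n - r)  # 6
```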
The strictest restriction is when we want neither self-relations/repeats nor differently ordered options, in which case we use:
> combinations(n = 3, r = 2, repeats.allowed = F, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "b"
[2,] "a" "c"
[3,] "b" "c"
This gives us 3 options. The number of options can be computed from the rather more complex formula n!/(r!(n-r)!), i.e. 3*2*1/(2*1*(3-2)!) = 6/(2*1!) = 6/2 = 3.
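This last case is what base R's combn() enumerates (one combination per column), and choose() computes n!/(r!(n-r)!):

```r
v <- c("a", "b", "c")
n <- length(v); r <- 2
# combn() returns one combination per column
nc <- ncol(combn(v, r))
nc            # 3
choose(n, r)  # 3
```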
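You can use a "greater than or equal to" (>=) comparison to filter out redundant combinations. This works for both numeric and character vectors.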
> grid <- expand.grid(c("aa", "ab", "cc"), c("aa", "ab", "cc"), stringsAsFactors = F)
> grid[grid$Var1 >= grid$Var2, ]
Var1 Var2
1 aa aa
2 ab aa
3 cc aa
5 ab ab
6 cc ab
9 cc cc
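The same filter extends to more than two inputs, as the question requires: chain the comparisons so that each surviving row is non-increasing from left to right. A sketch for three copies of the vector (the full grid still has n^3 rows before filtering):

```r
v <- c("aa", "ab", "cc")
grid3 <- expand.grid(v, v, v, stringsAsFactors = FALSE)
# Each unordered triple survives exactly once, as Var1 >= Var2 >= Var3
uniq3 <- grid3[grid3$Var1 >= grid3$Var2 & grid3$Var2 >= grid3$Var3, ]
nrow(uniq3)  # 10
```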
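This won't slow your code down much. If you are expanding vectors with large elements (e.g. two lists of data frames), I recommend using numeric indices that refer back to the original vectors.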
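A sketch of that index-based variant, with a made-up list of data frames standing in for the large elements: the grid and the filter operate on integer positions, and the objects themselves are looked up only for the rows that survive.

```r
# Illustrative data: a list of (potentially large) data frames
objs <- list(data.frame(x = 1:3), data.frame(x = 4:6), data.frame(x = 7:9))
idx <- expand.grid(i = seq_along(objs), j = seq_along(objs))
idx <- idx[idx$i >= idx$j, ]
# Fetch the actual objects only for the surviving index pairs
pairs <- Map(function(i, j) list(objs[[i]], objs[[j]]), idx$i, idx$j)
length(pairs)  # 6
```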