tidyr::expand() returns all possible combinations of values from multiple columns. I'm looking for a slightly different behavior, where all the values are in a single column and the combinations are to be taken across groups.
For example, let the data be defined as follows:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
X <- bind_rows(tibble(Group = "Group1", Value = LETTERS[1:3]),
               tibble(Group = "Group2", Value = letters[4:5]))
We want all combinations of values from Group1 with values from Group2. My current clunky solution is to first spread the values across one list column per group:
Y <- X %>% group_by(Group) %>% do(vals = .$Value) %>% spread(Group, vals)
# # A tibble: 1 x 2
# Group1 Group2
# <list> <list>
# 1 <chr [3]> <chr [2]>
followed by a double unnest operation:
Y %>% unnest( .preserve = Group2 ) %>% unnest
# # A tibble: 6 x 2
# Group1 Group2
# <chr> <chr>
# 1 A d
# 2 A e
# 3 B d
# 4 B e
# 5 C d
# 6 C e
This is the desired output, but as you can imagine, this solution doesn't generalize well: as the number of groups increases, so does the number of unnest operations that we have to perform.
Is there a more elegant solution?
Since the OP seems happy to use base R, I'm upgrading my comment to an answer:
expand.grid(split(X$Value, X$Group))
# Group1 Group2
# 1 A d
# 2 B d
# 3 C d
# 4 A e
# 5 B e
# 6 C e
As the OP points out, expand.grid converts character vectors to factors. To prevent this, use stringsAsFactors = FALSE.
The tidyverse equivalent is purrr::cross_df, which does not coerce strings to factors:
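For completeness, here is a minimal sketch of the stringsAsFactors = FALSE variant. It rebuilds the question's example data with base R only (the name res is just for illustration), and it should work unchanged for any number of groups because split() returns one vector per group:

```r
# Same example data as in the question, built with base R only.
X <- data.frame(
  Group = c(rep("Group1", 3), rep("Group2", 2)),
  Value = c(LETTERS[1:3], letters[4:5]),
  stringsAsFactors = FALSE
)

# split() yields one character vector per group; expand.grid() crosses them.
# stringsAsFactors = FALSE keeps the columns as character.
res <- expand.grid(split(X$Value, X$Group), stringsAsFactors = FALSE)

# Inspect the column types of the result.
str(res)
```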
cross_df(split(X$Value, X$Group))
# A tibble: 6 x 2
# Group1 Group2
# <chr> <chr>
# 1 A d
# 2 B d
# 3 C d
# 4 A e
# 5 B e
# 6 C e
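One caveat, as far as I know: cross_df was deprecated in purrr 1.0.0 in favour of tidyr::expand_grid. A rough sketch of the same idea with expand_grid follows. It assumes tidyr is installed, and it uses do.call only as one way of passing the named list as separate arguments. Unlike expand.grid, expand_grid varies the first column slowest, so the row order matches the desired output in the question:

```r
library(tidyr)

# Same example data as in the question.
X <- data.frame(
  Group = c(rep("Group1", 3), rep("Group2", 2)),
  Value = c(LETTERS[1:3], letters[4:5]),
  stringsAsFactors = FALSE
)

# Each element of the named list becomes one argument (and one column).
res <- do.call(expand_grid, split(X$Value, X$Group))

res
```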