I have data like this:
```r
example_df <- data.frame(
  col1type1  = c(110:106),
  col2type2  = c(-108:-104),
  col3type1  = c(-109:-105),
  col4type2  = c(110:106),
  col5type1  = c(107:103),
  col6type2  = c(-110:-106),
  col7type1  = c(109:113),
  col8type2  = c(-120:-116),
  col9type1  = c(-105:-101),
  col10type2 = c(105:101),
  col11type1 = c(-125:-121),
  col12type2 = c(-105:-101)
)
```
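I only want to return the combinations of a type1 column and a type2 column where type1 + type2 >= 0 within the same row, as a new data frame holding each such combination, its row, the sum, and the two numbers. (I know I could use for/foreach to compute each cell individually and write the results to a data.frame, but there must be a more efficient way.)
The desired output looks like this (incomplete):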
```r
# for all possible combinations, like the example rows below
example_first <- data.frame(column_combination = "col1type1_col2type2", row = 1, sum = 2, col1number = 110, col2number = -108)
example_mid   <- data.frame(column_combination = "col1type1_col12type2", row = 3, sum = 5, col1number = 108, col2number = -103)
example_last  <- data.frame(column_combination = "col9type1_col10type2", row = 5, sum = 0, col1number = -101, col2number = 101)
# would want like this for all possible combinations
desired_incomplete_output <- rbind(example_first, example_mid, example_last)
```
What is an efficient way to compute this in bulk rather than with a brute-force for/foreach loop? Thanks!
If the desired full output contains 79 results for the given example, you can do something like this:

```r
library(tidyverse)

example_df %>%
  mutate(row = row_number()) %>%
  split(.$row) %>%
  imap_dfr(\(.a, .b) .a %>%
    select(-row) %>%
    pivot_longer(everything()) %>%
    separate(name, into = c('col', 'type'), sep = '(?:type)') %>%
    {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"),
            paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>%
    map_dfr(~ set_names(.x, c('x', 'y'))) %>%
    separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>%
    separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>%
    filter(type1 + type2 >= 0) %>%
    mutate(col_comb = paste0(col1, 'type1_', col2, "type2"),
           sum = type1 + type2) %>%
    rename(col1number = type1,
           col2number = type2) %>%
    select(-col1, -col2) %>%
    mutate(row = .b))
#> # A tibble: 79 × 5
#>    col1number col2number col_comb               sum row
#>         <int>      <int> <chr>                <int> <chr>
#>  1        110       -108 col1type1_col2type2      2 1
#>  2        109       -108 col7type1_col2type2      1 1
#>  3        110        110 col1type1_col4type2    220 1
#>  4       -109        110 col3type1_col4type2      1 1
#>  5        107        110 col5type1_col4type2    217 1
#>  6        109        110 col7type1_col4type2    219 1
#>  7       -105        110 col9type1_col4type2      5 1
#>  8        110       -110 col1type1_col6type2      0 1
#>  9        110        105 col1type1_col10type2   215 1
#> 10        107        105 col5type1_col10type2   212 1
#> # … with 69 more rows
```

Explanation of the steps:

- `mutate()` adds a `row` column holding the row number, and `split()` splits the data by it, so each row becomes its own data frame in a list.
- `purrr::imap_dfr()` takes that list as input, applies the function to each element (`.a` is the one-row data frame, `.b` is its name, i.e. the row number as a string, which is why `row` is `<chr>` in the output), and row-binds all the results into one data frame. In each sub-step I have done the following:
  - `pivot_longer()` turns the row into `name`/`value` pairs, and `tidyr::separate()` splits the `name` column, which contains all the column names of the input data, into two separate columns, `col` and `type`.
  - `purrr::cross2()` creates the cross product of the type-1 and type-2 entries, each pasted as `column@value`. I used `@` as the separator because I assume it does not occur in the column names. (Note that `cross2()` is deprecated as of purrr 1.0.0 in favour of `tidyr::expand_grid()`.)
  - The rest is basic data wrangling/transformation with `dplyr` verbs.
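For intuition, here is what the per-row cross product and filter produce for row 1 alone. This is a minimal base-R sketch, not the answer's code: it swaps in `expand.grid()` for `purrr::cross2()` and hard-codes the row-1 values of `example_df`; the variable names `type1`, `type2`, `grid` and `row1` are illustrative only.

```r
# Row 1 of example_df, split into its type1 and type2 cells
type1 <- c(col1type1 = 110, col3type1 = -109, col5type1 = 107,
           col7type1 = 109, col9type1 = -105, col11type1 = -125)
type2 <- c(col2type2 = -108, col4type2 = 110, col6type2 = -110,
           col8type2 = -120, col10type2 = 105, col12type2 = -105)

# Cross product: every type1 cell paired with every type2 cell (6 x 6 = 36 pairs)
grid <- expand.grid(col1 = names(type1), col2 = names(type2),
                    stringsAsFactors = FALSE)
grid$col1number <- unname(type1[grid$col1])
grid$col2number <- unname(type2[grid$col2])
grid$sum <- grid$col1number + grid$col2number
grid$column_combination <- paste(grid$col1, grid$col2, sep = "_")

# Keep only the pairs whose sum is non-negative
row1 <- grid[grid$sum >= 0,
             c("column_combination", "sum", "col1number", "col2number")]
nrow(row1)  # 15
```

The 15 pairs kept here correspond to the row-1 rows of the 79-row result; the `imap_dfr()` pipeline repeats the same pairing for every row and binds the pieces together.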
If your columns are named `anum1`, `anum2`, `bnum1`, ..., we can modify it slightly (the changed lines are marked with comments):
```r
example_df %>%
  mutate(row = row_number()) %>%
  split(.$row) %>%
  imap_dfr(\(.a, .b) .a %>%
    select(-row) %>%
    pivot_longer(everything()) %>%
    separate(name, into = c('col', 'type'), sep = '(?:num)') %>%  # change sep
    {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"),
            paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>%
    map_dfr(~ set_names(.x, c('x', 'y'))) %>%
    separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>%
    separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>%
    filter(type1 + type2 >= 0) %>%
    mutate(col_comb = paste0(col1, 'num1_', col2, 'num2'),  # change labels
           sum = type1 + type2) %>%
    # unchanged: separate() above still names the value columns type1/type2
    rename(col1number = type1,
           col2number = type2) %>%
    select(-col1, -col2) %>%
    mutate(row = .b))
```
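Both variants above differ only in the marker word between the column prefix and the group digit. As an alternative that avoids the per-row `split()`, here is a minimal base-R sketch: reshape to long format once, then join the group-1 cells to the group-2 cells on the row number with `merge()`, which yields the within-row cross product for all rows in one step. `pair_sums()` and its `marker` argument are hypothetical names, and the sketch assumes every relevant column name ends in `<marker>1` or `<marker>2`.

```r
example_df <- data.frame(
  col1type1  = c(110:106),  col2type2  = c(-108:-104),
  col3type1  = c(-109:-105), col4type2  = c(110:106),
  col5type1  = c(107:103),  col6type2  = c(-110:-106),
  col7type1  = c(109:113),  col8type2  = c(-120:-116),
  col9type1  = c(-105:-101), col10type2 = c(105:101),
  col11type1 = c(-125:-121), col12type2 = c(-105:-101)
)

# Hypothetical helper: all same-row pairs of "<prefix><marker>1" and
# "<prefix><marker>2" columns whose sum is >= 0
pair_sums <- function(df, marker = "type") {
  # Long format: one record per (row, column) cell; unlist() goes column by column
  long <- data.frame(
    row   = rep(seq_len(nrow(df)), times = ncol(df)),
    col   = rep(names(df), each = nrow(df)),
    value = unlist(df, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  # Group digit after the marker word, e.g. "col1type1" -> "1", "anum2" -> "2"
  long$group <- sub(paste0("^.*", marker, "([0-9]+)$"), "\\1", long$col)

  # Cross product within each row: join group-1 cells to group-2 cells on `row`
  g1 <- long[long$group == "1", c("row", "col", "value")]
  g2 <- long[long$group == "2", c("row", "col", "value")]
  pairs <- merge(g1, g2, by = "row", suffixes = c("1", "2"))

  # Keep non-negative sums and shape the result like the desired output
  pairs$sum <- pairs$value1 + pairs$value2
  pairs <- pairs[pairs$sum >= 0, ]
  data.frame(
    column_combination = paste(pairs$col1, pairs$col2, sep = "_"),
    row        = pairs$row,
    sum        = pairs$sum,
    col1number = pairs$value1,
    col2number = pairs$value2
  )
}

res <- pair_sums(example_df)
nrow(res)  # 79
```

For `anum1`-style names the same sketch would be called as `pair_sums(df, marker = "num")`. Unlike the `imap_dfr()` version, `row` stays an integer here.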