Return all combinations of two column types whose sum is >= 0, along with summary metadata on which columns they came from, in R

Nea*_*sch 7 r dataframe purrr

I have data like this:

example_df <- data.frame(
  col1type1  = c(110:106),
  col2type2  = c(-108:-104),
  col3type1  = c(-109:-105),
  col4type2  = c(110:106),
  col5type1  = c(107:103),
  col6type2  = c(-110:-106),
  col7type1  = c(109:113),
  col8type2  = c(-120:-116),
  col9type1  = c(-105:-101),
  col10type2 = c(105:101),
  col11type1 = c(-125:-121),
  col12type2 = c(-105:-101)
)

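I want to return only the combinations where type1 + type2 >= 0 on the same row, as a new data frame holding the combination, the row, and the two numbers. (I know I could use for/foreach to compute each cell individually and write the results to a data.frame, but there must be a more efficient way.)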

The desired output looks like this (incomplete):

# for all possible combinations, like the example rows below
example_first <- data.frame(column_combination = "col1type1_col2type2", row = 1, sum = 2, col1number = 110, col2number = -108)
example_mid <- data.frame(column_combination = "col1type1_col12type2", row = 3, sum = 5, col1number = 108, col2number = -103)
example_last <- data.frame(column_combination = "col9type1_col10type2", row = 5, sum = 0, col1number = -101, col2number = 101)

# would want rows like these for all possible combinations
desired_incomplete_output <- rbind(example_first, example_mid, example_last)


What is an efficient way to compute this all at once, instead of a brute-force for/foreach loop? Thanks!

Ani*_*yal 3

If the complete desired output contains 79 results for the given example, you can do something like this.


Explanation of the steps:

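  1. With the first two lines, mutate and split, we split the data into separate rows, each in its own data frame, i.e. into a list.
  2. To work on this list, I used purrr::imap_dfr, which takes a list as input and returns a data.frame by row-binding all the results. In each iteration I:
    • first deselect the row column
    • pivot everything to long format
    • then split the "name" column, which holds all the column names of the input data, into two separate columns using tidyr::separate
    • then build the cross product of the num1 and num2 entries (here, the type 1 and type 2 entries) using purrr::cross2
    • then use map_dfr again to turn that cross product into a data frame
    • then split the column names from the values at the separator. I used @ as the separator, on the assumption that it does not appear in the column names
    • after that, filter the rows
    • finish with other basic data wrangling/transformation using dplyr verbs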
library(tidyverse)

example_df %>% 
  mutate(row = row_number()) %>% 
  split(.$row) %>% 
  imap_dfr(\(.a, .b) .a %>% 
        select(-row) %>% 
        pivot_longer(everything()) %>% 
        separate(name, into = c('col', 'type'), sep = '(?:type)') %>% 
        {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"), 
                paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>% 
        map_dfr(~ set_names(.x, c('x', 'y'))) %>% 
        separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>% 
        separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>% 
        filter(type1 + type2 >= 0) %>% 
        mutate(col_comb = paste0(col1, 'type1_', col2, "type2"),
               sum= type1 + type2) %>% 
        rename(col1number = type1,
               col2number = type2) %>% 
        select(-col1, -col2) %>% 
        mutate(row = .b))
#> # A tibble: 79 × 5
#>    col1number col2number col_comb               sum row  
#>         <int>      <int> <chr>                <int> <chr>
#>  1        110       -108 col1type1_col2type2      2 1    
#>  2        109       -108 col7type1_col2type2      1 1    
#>  3        110        110 col1type1_col4type2    220 1    
#>  4       -109        110 col3type1_col4type2      1 1    
#>  5        107        110 col5type1_col4type2    217 1    
#>  6        109        110 col7type1_col4type2    219 1    
#>  7       -105        110 col9type1_col4type2      5 1    
#>  8        110       -110 col1type1_col6type2      0 1    
#>  9        110        105 col1type1_col10type2   215 1    
#> 10        107        105 col5type1_col10type2   212 1    
#> # … with 69 more rows
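To make the sub-steps listed above more concrete, here is a minimal sketch that runs them on a single row outside of imap_dfr. The names one_row, long, t1, t2 and pairs are hypothetical and only used for illustration, the input is cut down to four columns taken from row 1 of example_df, and tidyr::expand_grid stands in for purrr::cross2 so that the cross product comes back directly as a data frame.

```r
library(dplyr)
library(tidyr)

# Hypothetical one-row input: two columns of each type, values from row 1
one_row <- data.frame(col1type1 = 110L, col2type2 = -108L,
                      col3type1 = -109L, col4type2 = 110L)

# Pivot to long format, then split each name into a column label and a type
long <- one_row %>%
  pivot_longer(everything()) %>%
  separate(name, into = c("col", "type"), sep = "type")

# Glue label and value together with "@" so they travel as one string
t1 <- paste(long$col[long$type == "1"], long$value[long$type == "1"], sep = "@")
t2 <- paste(long$col[long$type == "2"], long$value[long$type == "2"], sep = "@")

# Every type 1 / type 2 pairing, split back into label and value, then filtered
pairs <- expand_grid(x = t1, y = t2) %>%
  separate(x, into = c("col1", "type1"), sep = "@", convert = TRUE) %>%
  separate(y, into = c("col2", "type2"), sep = "@", convert = TRUE) %>%
  filter(type1 + type2 >= 0)

print(long)
print(pairs)
```

Printing long and pairs shows the shape of the data before and after the cross product, which is the part of the pipeline that is hardest to read from the one-liner.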

If your columns are named anum1, anum2, bnum1, ..., we can modify this slightly (only a few changes, all marked with comments):

example_df %>% 
  mutate(row = row_number()) %>% 
  split(.$row) %>% 
  imap_dfr(\(.a, .b) .a %>% 
             select(-row) %>% 
             pivot_longer(everything()) %>% 
             separate(name, into = c('col', 'type'), sep = '(?:num)') %>% # change sep
             {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"), 
                     paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>% 
             map_dfr(~ set_names(.x, c('x', 'y'))) %>% 
             separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>% 
             separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>% 
             filter(type1 + type2 >= 0) %>% 
             mutate(col_comb = paste0(col1, 'num1_', col2, "num2"), # change suffixes
                    sum = type1 + type2) %>% 
             rename(col1number = type1,      # intermediate columns are still
                    col2number = type2) %>%  # called type1/type2 at this point
             select(-col1, -col2) %>% 
             mutate(row = .b))
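As a possible alternative (not the approach used above), the same result can probably be reached without splitting by row: pivot the whole data frame once, then join the type 1 cells to the type 2 cells on the row index. This is a sketch under a few assumptions. Column names are assumed to end in "type1" or "type2", the names long, type1_long, type2_long and result are made up for illustration, and base merge is used for the many-to-many pairing. It also avoids purrr::cross2, which is deprecated as of purrr 1.0.0 in favour of tidyr::expand_grid.

```r
library(dplyr)
library(tidyr)

example_df <- data.frame(
  col1type1 = 110:106, col2type2 = -108:-104,
  col3type1 = -109:-105, col4type2 = 110:106,
  col5type1 = 107:103, col6type2 = -110:-106,
  col7type1 = 109:113, col8type2 = -120:-116,
  col9type1 = -105:-101, col10type2 = 105:101,
  col11type1 = -125:-121, col12type2 = -105:-101
)

# One long table: row index, original column name, cell value
long <- example_df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = "column", values_to = "number") %>%
  as.data.frame()

# Split by type using the name suffix (assumption: names end in type1/type2)
type1_long <- filter(long, grepl("type1$", column))
type2_long <- filter(long, grepl("type2$", column))

# Pair every type 1 cell with every type 2 cell of the same row, then filter
result <- merge(type1_long, type2_long, by = "row", suffixes = c("1", "2")) %>%
  filter(number1 + number2 >= 0) %>%
  mutate(column_combination = paste(column1, column2, sep = "_"),
         sum = number1 + number2) %>%
  select(column_combination, row, sum,
         col1number = number1, col2number = number2)

print(head(result))
```

Here row stays an integer rather than the character names that imap_dfr passes along, and the rows may come out in a different order than in the output above.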