Pivot from wide to long format, then nest columns

Emm*_*man 9 r tidyr tibble

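I have data in wide format. Each row relates to a variable that lives outside the current table, together with the possible values of that variable. I'm trying to (1) pivot to long format and (2) nest the pivoted values.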

Example

library(tibble)

df_1 <-
  tribble(~key, ~values.male, ~values.female, ~values.red, ~values.green, ~value,
        "gender", 0.5, 0.5, NA, NA, NA,
        "age", NA, NA, NA, NA, "50",
        "color", NA, NA, TRUE, FALSE, NA,
        "time_of_day", NA, NA, NA, NA, "noon")

## # A tibble: 4 x 6
##   key         values.male values.female values.red values.green value
##   <chr>             <dbl>         <dbl> <lgl>      <lgl>        <chr>
## 1 gender              0.5           0.5 NA         NA           NA   
## 2 age                NA            NA   NA         NA           50   
## 3 color              NA            NA   TRUE       FALSE        NA   
## 4 time_of_day        NA            NA   NA         NA           noon 

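In this example, we see that gender can take female = 0.5 and male = 0.5. On the other hand, age can only have a single value, 50. From row 3 we learn that color can take the values red = TRUE and green = FALSE, and from row 4 that time_of_day = noon.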

The pivoted table should therefore take the following nested form:

my_pivoted_df <-
  structure(
    list(
      var_name = c("gender", "age", "color", "time_of_day"),
      vals = list(
        structure(
          list(
            level = c("male", "female"),
            value = c(0.5,
                      0.5)
          ),
          row.names = c(NA, -2L),
          class = c("tbl_df", "tbl", "data.frame")
        ),
        "50",
        structure(
          list(
            level = c("red", "green"),
            value = c(TRUE,
                      FALSE)
          ),
          row.names = c(NA, -2L),
          class = c("tbl_df", "tbl", "data.frame")
        ),
        "noon"
      )
    ),
    row.names = c(NA, -4L),
    class = c("tbl_df", "tbl",
              "data.frame")
  )


## # A tibble: 4 x 2
##   var_name    vals            
##   <chr>       <list>          
## 1 gender      <tibble [2 x 2]>
## 2 age         <chr [1]>       
## 3 color       <tibble [2 x 2]>
## 4 time_of_day <chr [1]>

My attempt to solve this

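There are several problems with df_1. First, the current column naming is inconvenient. A header such as value is not ideal because it conflicts with pivot_longer()'s ".value" mechanism. Second, df_1 uses values (plural) when the key has several options (e.g., "red" and "green" for color), but value (singular) when the key has only one option (e.g., age). Below is my unsuccessful code, inspired by this answer.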

library(tidyr)
library(dplyr)

df_1 %>%
  rename_with( ~ paste(.x, "single", sep = "."), .cols = value) %>% ## changed the header because otherwise it breaks
  pivot_longer(cols = starts_with("val"),
               names_to = c("whatevs", ".value"), names_sep = "\\.")


## # A tibble: 8 x 7
##   key         whatevs  male female red   green single
##   <chr>       <chr>   <dbl>  <dbl> <lgl> <lgl> <chr> 
## 1 gender      values    0.5    0.5 NA    NA    NA    
## 2 gender      value    NA     NA   NA    NA    NA    
## 3 age         values   NA     NA   NA    NA    NA    
## 4 age         value    NA     NA   NA    NA    50    
## 5 color       values   NA     NA   TRUE  FALSE NA    
## 6 color       value    NA     NA   NA    NA    NA    
## 7 time_of_day values   NA     NA   NA    NA    NA    
## 8 time_of_day value    NA     NA   NA    NA    noon  

I lack the data-wrangling skills to get this sorted out.

ste*_*fan 4

A tidyverse approach to achieve your desired result could look like this:

library(tibble)

df_1 <-
  tribble(~key, ~values.male, ~values.female, ~values.red, ~values.green, ~value,
          "gender", 0.5, 0.5, NA, NA, NA,
          "age", NA, NA, NA, NA, "50",
          "color", NA, NA, TRUE, FALSE, NA,
          "time_of_day", NA, NA, NA, NA, "noon")

library(tidyr)
library(dplyr)
library(purrr)

df_pivoted <- df_1 %>% 
  mutate(across(everything(), as.character)) %>% 
  pivot_longer(-key, names_to = "level", names_prefix = "^values\\.", values_drop_na = TRUE) %>% 
  group_by(key) %>% 
  nest() %>% 
  mutate(data = map(data, ~ if (all(.x$level == "value")) deframe(.x) else .x))
df_pivoted
#> # A tibble: 4 x 2
#> # Groups:   key [4]
#>   key         data            
#>   <chr>       <list>          
#> 1 gender      <tibble [2 × 2]>
#> 2 age         <chr [1]>       
#> 3 color       <tibble [2 × 2]>
#> 4 time_of_day <chr [1]>
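To illustrate what the final map() step does for a key without levels, here is a minimal sketch on a hypothetical one-row tibble, one_row (an illustration only, not part of the original data). deframe() turns a two-column tibble into a named vector, taking the names from the first column and the values from the second:

```r
library(tibble)

# Hypothetical nested tibble for a key without levels (e.g. age)
one_row <- tibble(level = "value", value = "50")

# deframe() uses the first column as names and the second as values,
# so the result is a named character vector of length 1
vec <- deframe(one_row)
print(vec)
```

This is why the list column shows <chr [1]> for age and time_of_day, while keys with real levels keep their tibble.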

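EDIT: Following the clarification in your comment about the desired result, we can simply drop the map statement at the end (which was basically there to convert the tibbles for categories without levels into vectors) and instead add a mutate statement before nesting that replaces level with NA for categories without a level.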

pivot_nest <- function(x) {
  mutate(x, across(everything(), as.character)) %>% 
    pivot_longer(-key, names_to = "level", names_prefix = "^values\\.", values_drop_na = TRUE) %>% 
    group_by(key) %>% 
    # if/else (not ifelse) so the full level vector is kept when levels exist
    mutate(level = if (all(level == "value")) NA_character_ else level) %>% 
    nest() 
}

df_pivoted <- df_1 %>% 
  pivot_nest()
df_pivoted
#> # A tibble: 4 x 2
#> # Groups:   key [4]
#>   key         data            
#>   <chr>       <list>          
#> 1 gender      <tibble [2 × 2]>
#> 2 age         <tibble [1 × 2]>
#> 3 color       <tibble [2 × 2]>
#> 4 time_of_day <tibble [1 × 2]>
df_pivoted$data
#> [[1]]
#> # A tibble: 2 x 2
#>   level  value
#>   <chr>  <chr>
#> 1 male   0.5  
#> 2 female 0.5  
#> 
#> [[2]]
#> # A tibble: 1 x 2
#>   level value
#>   <chr> <chr>
#> 1 <NA>  50   
#> 
#> [[3]]
#> # A tibble: 2 x 2
#>   level value
#>   <chr> <chr>
#> 1 red   TRUE 
#> 2 green FALSE
#> 
#> [[4]]
#> # A tibble: 1 x 2
#>   level value
#>   <chr> <chr>
#> 1 <NA>  noon

df_2 <- tribble(~key, ~value, "age", "50", "income", "100000", "time_of_day", "noon")

df_pivoted2 <- df_2 %>% 
  pivot_nest()
df_pivoted2
#> # A tibble: 3 x 2
#> # Groups:   key [3]
#>   key         data            
#>   <chr>       <list>          
#> 1 age         <tibble [1 × 2]>
#> 2 income      <tibble [1 × 2]>
#> 3 time_of_day <tibble [1 × 2]>
df_pivoted2$data
#> [[1]]
#> # A tibble: 1 x 2
#>   level value
#>   <chr> <chr>
#> 1 <NA>  50   
#> 
#> [[2]]
#> # A tibble: 1 x 2
#>   level value 
#>   <chr> <chr> 
#> 1 <NA>  100000
#> 
#> [[3]]
#> # A tibble: 1 x 2
#>   level value
#>   <chr> <chr>
#> 1 <NA>  noon
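A note on the level replacement inside the grouped mutate(): ifelse() returns a result shaped like its test, so with a scalar test such as all(level == "value") it returns only the first element of level, which mutate() then recycles across the whole group (every row would show the first level, e.g. male twice). A plain if/else returns the full vector. A minimal base R sketch of the difference, using a hypothetical per-group vector named level:

```r
# Hypothetical per-group vector of levels, as it would look after pivot_longer()
level <- c("male", "female")

# ifelse() shapes its result like the test; the test has length 1 here,
# so only the first element of `level` is returned
from_ifelse <- ifelse(all(level == "value"), NA_character_, level)

# if/else returns the whole vector when the condition is FALSE
from_if_else <- if (all(level == "value")) NA_character_ else level

print(from_ifelse)
print(from_if_else)
```

When the condition is TRUE, if/else yields a single NA_character_, which mutate() recycles to the group size, so keys without levels still get NA in level.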