Convert a list to a data frame, with related data points on the same row

pgi*_*tti 1 r list dataframe dplyr tidyverse

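I have a list representing the fields of research for a single publication. I want to combine the list into a data.frame so that each 2-digit code is stored in a "division" column and each 4-digit code is stored in a "group" column. When the first two digits are shared, the division and group should be stored on the same row. I apologize for the poor title.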

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

Desired output:

data.frame(division = c("40 Engineering", "34 Chemical Sciences"), 
           group = c(NA, "3403 Macromolecular and Materials Chemistry"))

ben*_*n23 5

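First `unlist` `my_list` into a named vector, then `enframe` it into a two-column data frame. `filter` to keep only the rows whose `name` column equals "name", then use the digit pattern to assign `group` (the target column, "group" or "division") and `prefix` (the first two digits, used to place related entries on the same row). Finally, reshape the data from "long" to "wide".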

library(tidyverse)

unlist(my_list) %>% 
  enframe() %>% 
  filter(name == "name") %>% 
  mutate(group = ifelse(str_count(value, "\\d") == 4, "group", "division"), 
         prefix = str_extract(value, "^\\d{2}"), .keep = "used") %>% 
  pivot_wider(names_from = group, values_from = value)
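To see why the `filter(name == "name")` step is needed, it can help to inspect the intermediate result of the first two steps. This is a minimal sketch, assuming the tidyverse is installed; `long` is a hypothetical variable name used only for illustration.

```r
library(tidyverse)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

# `long` is a hypothetical name, not part of the original answer.
# unlist() flattens the nested list into a named character vector;
# enframe() turns the names and values into the columns `name` and `value`.
long <- unlist(my_list) %>%
  enframe()

# Rows alternate between "id" and "name" entries, so only the
# rows with name == "name" carry the research field labels.
long
```

Each list element contributes one "id" row and one "name" row, which is why the ids have to be filtered out before reshaping.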

Update: if we use `bind_rows` at the start (inspired by @akrun's answer), the code above can be simplified slightly:

bind_rows(my_list) %>% 
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"), 
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>% 
  pivot_wider(names_from = group, values_from = name)

Output

# A tibble: 2 × 3
  prefix group                                       division            
  <chr>  <chr>                                       <chr>               
1 34     3403 Macromolecular and Materials Chemistry 34 Chemical Sciences
2 40     NA                                          40 Engineering      
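The desired output in the question has only the `division` and `group` columns, in that order, so the helper `prefix` column can be dropped at the end. The following is a minimal sketch of that extra step, assuming the tidyverse is installed; `result` is a hypothetical variable name, and the trailing `select()` is an addition for illustration rather than part of the original answer.

```r
library(tidyverse)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

# `result` is a hypothetical name used for illustration only.
result <- bind_rows(my_list) %>%
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"),
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = name) %>%
  # Drop the helper `prefix` column and put `division` first,
  # matching the column layout requested in the question.
  select(division, group)

result
```

Note that the rows follow the order in which each prefix first appears in `my_list`, so the row order may differ from the data.frame shown in the question.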