pgi*_*tti 1 r list dataframe dplyr tidyverse
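I have a list that represents the fields of research of a single publication. I want to combine the list into a data.frame so that each 2-digit code is stored in a "division" column and each 4-digit code is stored in a "group" column. When the first two digits are shared, the division and the group should be stored in the same row. I apologize for the poor title.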
my_list <- list(
list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
list(id = "80011", name = "40 Engineering"),
list(id = "80005", name = "34 Chemical Sciences")
)
Expected output:
data.frame(division = c("40 Engineering", "34 Chemical Sciences"),
group = c(NA, "3403 Macromolecular and Materials Chemistry"))
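First `unlist` your `my_list` into a named vector, then `enframe` it into a two-column data frame. `filter` keeps only the rows that come from the `name` field. Next, assign `group` and `prefix` from the digit pattern: `group` labels each value as a group or a division, and `prefix` holds the first two digits, which is what puts a group and its division on the same row. Finally, reshape the structure from "long" to "wide".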
library(tidyverse)

unlist(my_list) %>% 
  enframe() %>% 
  filter(name == "name") %>% 
  mutate(group = ifelse(str_count(value, "\\d") == 4, "group", "division"), 
         prefix = str_extract(value, "^\\d{2}"), .keep = "used") %>% 
  pivot_wider(names_from = group, values_from = value)

Update: if we start with `bind_rows` instead (inspired by @akrun's answer), the code above can be simplified slightly:
bind_rows(my_list) %>% 
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"), 
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>% 
  pivot_wider(names_from = group, values_from = name)

# A tibble: 2 × 3
  prefix group                                       division            
  <chr>  <chr>                                       <chr>               
1 34     3403 Macromolecular and Materials Chemistry 34 Chemical Sciences
2 40     NA                                          40 Engineering      
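The question asks for a data frame with only `division` and `group` columns, so the helper column can be dropped at the end. The following is a minimal sketch of that variant, not part of the original answer. It assumes every `name` starts with either a two-digit or a four-digit code, and that at least one of each occurs. The column name `level` is introduced here for illustration only. It also classifies by the leading code rather than by counting all digits, so a digit elsewhere in a name would not change the result.

```r
library(dplyr)
library(tidyr)
library(stringr)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

result <- bind_rows(my_list) %>%
  transmute(
    # Assumption: a leading four-digit code marks a group, anything else a division
    level  = if_else(str_detect(name, "^\\d{4}\\b"), "group", "division"),
    # The first two digits tie a group to its division
    prefix = str_extract(name, "^\\d{2}"),
    name
  ) %>%
  pivot_wider(names_from = level, values_from = name) %>%
  # Drop the helper column and use the column order from the question
  select(division, group)

result
```

Rows come out in order of first appearance of each prefix, so the row order may differ from the expected output in the question; add `arrange()` if a specific order matters.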