Adr*_*ian 17 r data.table data-wrangling
library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
class = c("sports", "", "music, sports", ""),
hobby = c("knitting, music, sports", "", "", "music"))
> dat1
id class hobby
1 1 sports knitting, music, sports
2 2
3 34 music, sports
4 99 music
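I have the dataset dat1 above, where each row corresponds to a unique id. For each id, multiple entries for class or hobby are separated by commas.
I want to swap the rows and columns of this dataset so that I get the following: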
input class hobby
1 sports 1, 34 1
2 knitting 1
3 music 34 1, 99
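In this dataset, each row corresponds to a unique input from dat1. The class and hobby columns now store the corresponding ids from dat1, separated by commas.
Is there a quick way in R to swap rows and columns like this?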
Here is a data.table solution:
library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
class = c("sports", "", "music, sports", ""),
hobby = c("knitting, music, sports", "", "", "music"))
dat1
#> id class hobby
#> 1: 1 sports knitting, music, sports
#> 2: 2
#> 3: 34 music, sports
#> 4: 99 music
# in long format
dt_melted <- melt.data.table(dat1, id.vars = "id", variable.name = "type", value.name = "value")
dt_melted
#> id type value
#> 1: 1 class sports
#> 2: 2 class
#> 3: 34 class music, sports
#> 4: 99 class
#> 5: 1 hobby knitting, music, sports
#> 6: 2 hobby
#> 7: 34 hobby
#> 8: 99 hobby music
# split values on ", " (comma plus space) so the inputs carry no leading whitespace
dt_splitted <- dt_melted[, .(input = unlist(data.table::tstrsplit(value, ", ", fixed = TRUE))), by = .(id, type)]
dt_splitted
#> id type input
#> 1: 1 class sports
#> 2: 34 class music
#> 3: 34 class sports
#> 4: 1 hobby knitting
#> 5: 1 hobby music
#> 6: 1 hobby sports
#> 7: 99 hobby music
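The choice of separator matters in the split step. The small base R sketch below (variable names are illustrative, not from the answer) shows that splitting on a bare comma leaves a leading space on every piece after the first, whereas splitting on comma plus space, or trimming afterwards, yields clean values. It also shows that an empty cell splits into zero pieces, which is presumably why ids with empty cells contribute no rows to the long table.

```r
x <- "knitting, music, sports"

# Splitting on "," alone keeps the space that follows each comma
comma_only <- strsplit(x, ",")[[1]]

# Splitting on ", " as a fixed string removes comma and space together
comma_space <- strsplit(x, ", ", fixed = TRUE)[[1]]

# Trimming after a bare-comma split gives the same clean pieces
trimmed <- trimws(comma_only)

# An empty cell yields a zero-length character vector
empty_cell <- strsplit("", ", ", fixed = TRUE)[[1]]

print(comma_only)
print(comma_space)
print(length(empty_cell))
```

If untrimmed values reach a grouping or casting step, " music" and "music" are treated as two different inputs.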
# bring back to desired wide format
dt_casted <- dcast.data.table(dt_splitted,
formula = "input ~ type",
value.var = "id",
fun.aggregate = paste,
collapse = ", ")
dt_casted
#> input class hobby
#> 1: knitting 1
#> 2: music 34 1, 99
#> 3: sports 1, 34 1
# combine ids by class/hobby
dt_splitted[, .(class = paste(id[type == "class"], collapse = ", "),
hobby = paste(id[type == "hobby"], collapse = ", ")),
by = .(input = trimws(input))]
#> input class hobby
#> 1: sports 1, 34 1
#> 2: music 34 1, 99
#> 3: knitting 1
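As a cross-check on the logic (go long, then collapse ids per input), here is a minimal base R sketch that needs no packages. It is not part of the answer: the helper `ids_by_input` and the object names are hypothetical, and it assumes entries are separated by exactly ", " and that each column has at least one non-empty cell.

```r
dat1 <- data.frame(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: for one column, map each distinct input to its ids
ids_by_input <- function(values, ids) {
  pieces <- strsplit(values, ", ", fixed = TRUE)  # one vector of pieces per row
  long_id <- rep(ids, lengths(pieces))            # repeat each id once per piece
  long_input <- unlist(pieces)                    # empty cells contribute nothing
  # collapse the ids of each input into one comma-separated string
  vapply(split(long_id, long_input), toString, character(1))
}

class_ids <- ids_by_input(dat1$class, dat1$id)
hobby_ids <- ids_by_input(dat1$hobby, dat1$id)

# Align both lookups on the union of inputs; absent inputs become NA
inputs <- sort(union(names(class_ids), names(hobby_ids)))
result <- data.frame(input = inputs,
                     class = unname(class_ids[inputs]),
                     hobby = unname(hobby_ids[inputs]),
                     stringsAsFactors = FALSE)
result[is.na(result)] <- ""  # show absent combinations as empty strings

print(result)
```

For larger data the data.table versions are likely the better choice; this sketch is only meant to make the intermediate steps explicit.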
Another data.table option is to use dcast + melt:
dcast(
melt(dat1[, lapply(.SD, strsplit, ", "), id], "id")[
,
.(input = unlist(value)),
.(id, variable)
], input ~ variable,
value.var = "id",
fun = toString
)
This gives
input class hobby
1: knitting 1
2: music 34 1, 99
3: sports 1, 34 1
Here is a quick tidyverse approach:
library(dplyr)
library(tidyr)
dat1 %>%
pivot_longer(-id, values_to = "input") %>%
separate_rows(input) %>%
filter(input != "") %>%
pivot_wider(names_from = "name", values_from = "id", values_fn = toString)
This gives
input class hobby
1 sports 1, 34 1
2 knitting NA 1
3 music 34 1, 99