How to swap column and row entries in R

Adr*_*ian 17 r data.table data-wrangling

library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
           class = c("sports", "", "music, sports", ""),
           hobby = c("knitting, music, sports", "", "", "music"))
> dat1
  id         class                   hobby
1  1        sports knitting, music, sports
2  2                                      
3 34 music, sports                        
4 99                                 music

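I have the dataset above, dat1, in which each row corresponds to a unique id. For each id, multiple entries in class or hobby are separated by commas.

I would like to swap the rows and columns of this dataset so that I get the following: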

     input class hobby
1   sports 1, 34     1
2 knitting           1
3    music    34 1, 99

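In this dataset, each row corresponds to a unique input from dat1. The class and hobby columns now store the corresponding ids from dat1, each separated by a comma.

Is there a quick way to swap rows and columns like this in R?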

mni*_*ist 9

Here is a data.table solution.

Input

library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))
dat1
#>    id         class                   hobby
#> 1:  1        sports knitting, music, sports
#> 2:  2                                      
#> 3: 34 music, sports                        
#> 4: 99                                 music

Data preparation

# in long format
dt_melted <- melt.data.table(dat1, id.vars = "id", variable.name = "type", value.name = "value")
dt_melted
#>    id  type                   value
#> 1:  1 class                  sports
#> 2:  2 class                        
#> 3: 34 class           music, sports
#> 4: 99 class                        
#> 5:  1 hobby knitting, music, sports
#> 6:  2 hobby                        
#> 7: 34 hobby                        
#> 8: 99 hobby                   music

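# split values on a comma plus any following whitespace,
# so that " music" and "music" do not end up as separate inputs
dt_splitted <- dt_melted[, .(input = unlist(data.table::tstrsplit(value, ",\\s*"))), by = .(id, type)]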
dt_splitted
#>    id  type    input
#> 1:  1 class   sports
#> 2: 34 class    music
#> 3: 34 class   sports
#> 4:  1 hobby knitting
#> 5:  1 hobby    music
#> 6:  1 hobby   sports
#> 7: 99 hobby    music
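
One thing worth checking in the split step is whitespace: splitting on a bare comma keeps the space that followed it, so " music" and "music" would be treated as different inputs when reshaping. A minimal base R sketch of the two usual remedies (the variable names `raw`, `trimmed`, and `by_regex` are made up for illustration):

```r
# Splitting on a bare comma keeps the space that followed each comma
raw <- unlist(strsplit("knitting, music, sports", ",", fixed = TRUE))

# Remedy 1: trim the pieces afterwards
trimmed <- trimws(raw)

# Remedy 2: split on a comma plus optional whitespace in one go
by_regex <- unlist(strsplit("knitting, music, sports", ",\\s*"))

# Both remedies should give the same clean pieces
identical(trimmed, by_regex)
```

Either remedy should work here; trimming afterwards is a little more forgiving if the spacing in the data is inconsistent.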

Final step, option 1

# bring back to desired wide format
dt_casted <- dcast.data.table(dt_splitted, 
                              formula = "input ~ type",
                              value.var = "id",
                              fun.aggregate = paste, 
                              collapse = ", ")
dt_casted
#>       input class hobby
#> 1: knitting           1
#> 2:    music    34 1, 99
#> 3:   sports 1, 34     1

Final step, option 2 (more verbose)

# combine ids by class/hobby
dt_splitted[, .(class = paste(id[type == "class"], collapse = ", "),
                hobby = paste(id[type == "hobby"], collapse = ", ")),
            by = .(input = trimws(input))]
#>       input class hobby
#> 1:   sports 1, 34     1
#> 2:    music    34 1, 99
#> 3: knitting           1


Tho*_*ing 7

Another data.table option is to use dcast + melt:

dcast(
  melt(dat1[, lapply(.SD, strsplit, ", "), id], "id")[
    ,
    .(input = unlist(value)),
    .(id, variable)
  ], input ~ variable,
  value.var = "id",
  fun.aggregate = toString
)

This gives

      input class hobby
1: knitting           1
2:    music    34 1, 99
3:   sports 1, 34     1
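
For readers who find the nested call hard to follow, the same melt, split, dcast sequence can be unpacked into named steps. This is only a sketch: the intermediate names `long`, `pieces`, and `res` are made up for illustration, and the `value != ""` filter is an added assumption that skips empty cells explicitly rather than relying on empty groups being dropped.

```r
library(data.table)

dat1 <- data.table(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))

# 1. Long format: one row per (id, variable) pair
long <- melt(dat1, id.vars = "id",
             variable.name = "variable", value.name = "value")

# 2. Drop empty cells, then split each string into one row per entry
pieces <- long[value != "",
               .(input = unlist(strsplit(value, ", ", fixed = TRUE))),
               by = .(id, variable)]

# 3. Wide format: collapse the ids that share an (input, variable) cell
res <- dcast(pieces, input ~ variable,
             value.var = "id", fun.aggregate = toString)
res
```

Keeping the intermediate tables around makes it easier to inspect each stage if the result is not what you expect.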


Maë*_*aël 5

Here is a quick tidyverse approach:

library(dplyr)
library(tidyr)
dat1 %>% 
  pivot_longer(-id, values_to = "input") %>%
  separate_rows(input) %>% 
  filter(input != "") %>% 
  pivot_wider(names_from = "name", values_from = "id", values_fn = toString)
  input    class hobby
1 sports   1, 34 1    
2 knitting NA    1    
3 music    34    1, 99
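
If blank cells are preferred over NA, as in the desired output in the question, one possible variation is to pass `values_fill` to `pivot_wider`. This is a sketch, assuming a tidyr version whose `pivot_wider` accepts a plain function for `values_fn` (1.1.0 or later, as far as I know). The explicit `sep` and the `res` name are additions for illustration and not part of the answer above.

```r
library(dplyr)
library(tidyr)

dat1 <- data.frame(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"),
                   stringsAsFactors = FALSE)

res <- dat1 %>%
  pivot_longer(-id, values_to = "input") %>%
  # split on a comma plus optional whitespace (explicit, rather than the default)
  separate_rows(input, sep = ",\\s*") %>%
  filter(input != "") %>%
  # values_fn collapses ids to one string; values_fill replaces missing cells
  pivot_wider(names_from = name, values_from = id,
              values_fn = toString, values_fill = "")
res
```

Because `values_fn = toString` turns the ids into character strings, the fill value also has to be a character string.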