Reshape in the middle

Ari*_*man 5 r reshape

As part of a pilot survey, I showed each Turker several sets of choices among four options. The data look like this:

> so
  WorkerId pio_1_1 pio_1_2 pio_1_3 pio_1_4 pio_2_1 pio_2_2 pio_2_3 pio_2_4
1        1     Yes      No      No      No      No      No     Yes      No
2        2      No     Yes      No      No     Yes      No     Yes      No
3        3     Yes     Yes      No      No     Yes      No     Yes      No

I'd like it to look like this:

WorkerId set pio1 pio2 pio3 pio4
       1   1  Yes   No   No   No
       1   2   No   No  Yes   No
...

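I can think of a few ways to get there, none of which seems very elegant:

  • Use a regular expression with backreferences to swap the order of the two numbers, then use reshape()
  • Write my own small function to parse out the first number between the underscores, then reshape
  • Split the columns and then stack them (relying on the column order being correct)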
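For comparison, here is a minimal base-R sketch of the first approach. It assumes every choice column is named pio_<set>_<option>, and it retypes the data from the printout above as character columns rather than factors.

```r
# Data retyped from the printout above (character columns, not factors)
so <- data.frame(
  WorkerId = 1:3,
  pio_1_1 = c("Yes", "No",  "Yes"),
  pio_1_2 = c("No",  "Yes", "Yes"),
  pio_1_3 = c("No",  "No",  "No"),
  pio_1_4 = c("No",  "No",  "No"),
  pio_2_1 = c("No",  "Yes", "Yes"),
  pio_2_2 = c("No",  "No",  "No"),
  pio_2_3 = c("Yes", "Yes", "Yes"),
  pio_2_4 = c("No",  "No",  "No"),
  stringsAsFactors = FALSE
)

# Swap the two numbers with backreferences: "pio_1_3" -> "pio3.1",
# i.e. <stem><option>.<set>, which reshape() can split on its default sep "."
wide <- so
names(wide) <- sub("^(pio)_([0-9]+)_([0-9]+)$", "\\1\\3.\\2", names(wide))

# Long format: stems pio1..pio4 become columns, the set number becomes "set"
long <- reshape(wide,
                direction = "long",
                varying   = setdiff(names(wide), "WorkerId"),
                idvar     = "WorkerId",
                timevar   = "set")

# reshape() stacks set 1 for all workers, then set 2; sort per worker instead
long <- long[order(long$WorkerId, long$set), ]
rownames(long) <- NULL
long
```

reshape() splits each renamed column at the "." into a stem (pio1 to pio4) and a time value, so the swap is what turns the set number into the time variable. This still depends on the naming convention, which is the inelegance the question points out.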

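But it seems to me that all of these miss the point that data in what you might call "double wide" format have a structure of their own. I'd love to use the reshape2 package, but even though the data were produced with cast(), I don't see any option that would help me actually melt this data.frame.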

Suggestions welcome.

so <- structure(list(WorkerId = 1:3, pio_1_1 = structure(c(2L, 1L, 
2L), .Label = c("No", "Yes"), class = "factor"), pio_1_2 = structure(c(1L, 
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_1_3 = structure(c(1L, 
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_1_4 = structure(c(1L, 
1L, 1L), .Label = "No", class = "factor"), pio_2_1 = structure(c(1L, 
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_2 = structure(c(1L, 
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_2_3 = structure(c(2L, 
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_4 = structure(c(1L, 
1L, 1L), .Label = "No", class = "factor")), .Names = c("WorkerId", 
"pio_1_1", "pio_1_2", "pio_1_3", "pio_1_4", "pio_2_1", "pio_2_2", 
"pio_2_3", "pio_2_4"), row.names = c(NA, 3L), class = "data.frame")

gau*_*den 3

I'm not sure whether this is too obvious, but here goes. It should be self-explanatory: pass in your so data frame and it returns the reshaped data.

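library("reshape2")

reshape.middle <- function(dat) {
    # One row per WorkerId x original column; `variable` holds e.g. "pio_1_3"
    dat <- melt(dat, id.vars = "WorkerId")
    # 5th character is the set number: "pio_1_3" -> "1"
    dat$set <- substr(dat$variable, 5, 5)
    # First 3 characters plus the 7th give the option: "pio_1_3" -> "pio3"
    # (fixed positions: only works while both numbers are single digits)
    dat$name <- paste(substr(dat$variable, 1, 3),
                      substr(dat$variable, 7, 7),
                      sep = "")
    dat$variable <- NULL

    # Back to wide: one row per WorkerId x set, one column per option
    return(dcast(dat, WorkerId + set ~ name, value.var = "value"))
}

so # initial form
so <- reshape.middle(so)
so # as needed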
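The substr() calls above rely on fixed character positions, so they break once a set or option number has two digits. A regex-based version of that parsing step might look like this; parse_pio_name is a hypothetical helper (not part of reshape2) and assumes names of the form <prefix>_<set>_<option>.

```r
# Hypothetical helper: split names like "pio_12_3" with a regular expression
# instead of fixed character positions, so multi-digit numbers also work
parse_pio_name <- function(x) {
  x <- as.character(x)  # melt() returns `variable` as a factor by default
  pattern <- "^([A-Za-z]+)_([0-9]+)_([0-9]+)$"
  stopifnot(all(grepl(pattern, x)))  # fail loudly on unexpected names
  data.frame(
    set  = sub(pattern, "\\2", x),    # middle number
    name = sub(pattern, "\\1\\3", x), # prefix + last number, e.g. "pio3"
    stringsAsFactors = FALSE
  )
}

p <- parse_pio_name(c("pio_1_1", "pio_2_4", "pio_12_3"))
p
```

Inside reshape.middle, p <- parse_pio_name(dat$variable) followed by dat$set <- p$set and dat$name <- p$name could take the place of the two substr() assignments.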

Hope this helps.