我有一个包含以下示例结构的列表:
> dput(test)
structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(
var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2",
"var3")), section2 = structure(list(row = structure(list(var1 = 1,
var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")),
row = structure(list(var1 = 4, var2 = 5, var3 = 6), .Names = c("var1",
"var2", "var3")), row = structure(list(var1 = 7, var2 = 8,
var3 = 9), .Names = c("var1", "var2", "var3"))), .Names = c("row",
"row", "row"))), .Names = c("id", "var1", "var3", "section1",
"section2"))
> str(test)
List of 5
$ id : num 1
$ var1 : num 2
$ var3 : num 4
$ section1:List of 3
..$ var1: num 1
..$ var2: num 2
..$ var3: num 3
$ section2:List of 3
..$ row:List of 3
.. ..$ var1: num 1
.. ..$ var2: num 2
.. ..$ var3: num 3
..$ row:List of 3
.. ..$ var1: num 4
.. ..$ var2: num 5
.. ..$ var3: num 6
..$ row:List of 3
.. ..$ var1: num 7
.. ..$ var2: num 8
.. ..$ var3: num 9
Run Code Online (Sandbox Code Playgroud)
请注意,该section2
列表包含名为的元素rows
.这些代表多个记录.我所拥有的是嵌套列表,其中一些元素位于根级别,而其他元素是同一观察的多个嵌套记录.我希望以下data.frame
格式的输出:
> desired
id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
1 1 2 4 1 2 3 1 4 7
2 NA NA NA NA NA NA 2 5 8
3 NA NA NA NA NA NA 3 6 9
Run Code Online (Sandbox Code Playgroud)
根级元素应填充第一行,而row
元素应具有自己的行.作为一个额外的复杂因素,row
条目中的变量数量可能会有所不同.
这是一般方法。它并不假设您只有三行;无论您有多少行,它都可以使用。如果嵌套结构中缺少某个值(例如,第 2 节中的某些子列表不存在 var1),则代码会正确返回该单元格的 NA。
例如,如果我们使用以下数据:
test <- structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), section2 = structure(list(row = structure(list(var1 = 1, var2 = 2), .Names = c("var1", "var2")), row = structure(list(var1 = 4, var2 = 5), .Names = c("var1", "var2")), row = structure(list( var2 = 8, var3 = 9), .Names = c("var2", "var3"))), .Names = c("row", "row", "row"))), .Names = c("id", "var1", "var3", "section1", "section2"))
Run Code Online (Sandbox Code Playgroud)
一般方法是使用 Melt 创建一个数据帧,其中包含有关嵌套结构的信息,然后 dcast 将其塑造成您想要的格式。
library("reshape2")
flat <- unlist(test, recursive=FALSE)
names(flat)[grep("row", names(flat))] <- gsub("row", "var", paste0(names(flat)[grep("row", names(flat))], seq_len(length(names(flat)[grep("row", names(flat))])))) ## keeps track of rows by adding an ID
ul <- melt(unlist(flat))
split <- strsplit(rownames(ul), split=".", fixed=TRUE) ## splits the names into component parts
max <- max(unlist(lapply(split, FUN=length)))
pad <- function(a) {
c(a, rep(NA, max-length(a)))
}
levels <- matrix(unlist(lapply(split, FUN=pad)), ncol=max, byrow=TRUE)
## Get the nesting structure
nested <- data.frame(levels, ul)
nested$X3[is.na(nested$X3)] <- levels(as.factor(nested$X3))[[1]]
desired <- dcast(nested, X3~X1 + X2)
names(desired) <- gsub("_", "\\.", gsub("_NA", "", names(desired)))
desired <- desired[,names(flat)]
> desired
## id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
## 1 1 2 4 1 2 3 1 4 7
## 2 NA NA NA NA NA NA 2 5 8
## 3 NA NA NA NA NA NA 3 6 9
Run Code Online (Sandbox Code Playgroud)