I've put a `dput` of the list at the bottom so the problem can be reproduced. The `dput` is of `a`, not `x`.
I have a large nested list `x` from which I'm trying to build a data frame, but I can't figure out how.
I have done the first part:
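```r
a <- list()  # accumulator: a[[i]] holds the experiences of result i
for (i in 1:3) {
  a[[i]] <- x$results[[i]]$experiences
  indx <- lengths(a)
  # pad every element of a with NULL to the longest length, then stack as rows
  zz <- as.data.frame(do.call(rbind, lapply(a, `length<-`, max(indx))))
}
```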
For this I used the following answer: Convert a nested list (of unequal lengths) to a data frame.
This leaves me with a data.frame with n columns, where n is the largest number of entries in any `a[[i]]`:

```
  V1   V2   V3
1 NULL NULL NULL
2 *    *    *
3 NULL NULL NULL
```
Each `*` is another nested list of the form `list(experience = list(duration = ...`, for example the `*` in row 2, column V1. I don't want the whole thing. I only want:
```r
a[[2]][[1]]$experience$start
```
or, in terms of the original list:

```r
x$results[[2]]$experiences[[1]]$experience$start
```
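A small helper can make that single lookup return `NA` instead of `NULL` when an experience has no `start`. This is a sketch on hypothetical toy data shaped like `a`; `get_start` is an illustrative name, not something from the question. It uses `[[` rather than `$`, because `$` does partial name matching on lists.

```r
# Hypothetical toy data shaped like `a`: a[[i]] is the list of experiences of result i
a <- list(
  list(),
  list(
    list(experience = list(duration = "1", start = "2014")),
    list(experience = list(start = "2012", duration = "2")),
    list(experience = list(roleName = "a"))  # this one has no start field
  )
)

# Return the start value of one experience, or NA if it is missing.
# [[ returns NULL for a missing name on a list (and on NULL itself).
get_start <- function(e) {
  s <- e[["experience"]][["start"]]
  if (is.null(s)) NA_character_ else s
}

get_start(a[[2]][[1]])
get_start(a[[2]][[3]])
```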
I feel like I'm nearly there. I tried:
```r
for (i in 1:3) {
  a[[i]] <- x$results[[i]]$experiences
  indx <- lengths(a)
  for (y in 1:length(a[[i]]))
    aa <- rbind(aa, tryCatch(x$results[[i]]$experiences[[y]]$experience$start,
                             error = function(e) print(NA)))
  zz <- as.data.frame(do.call(rbind, lapply(aa, `length<-`, max(indx))))
}
```
which gives:

```
  V1   V2 V3
1 NA   NA NA
2 NA   NA NA
3 2014 NA NA
4 2012 NA NA
5 2006 NA NA
6 NA   NA NA
7 NA   NA NA
```
Trying `cbind` instead of `rbind` in the last line put all of the dates in the first row.
I also tried the following:

```r
for (i in 1:3) {
  a[[i]] <- lengths(x$results[[i]]$experiences)
  indx <- lengths(a)
  for (y in 1:length(indx)) {
    tt[i] <- tryCatch(x$results[[i]]$experiences[[y]]$experience$start,
                      error = function(e) print(""))
  }
  zz <- as.data.frame(do.call(rbind, lapply(tt, `length<-`, max(indx))))
}
```
This is close: it builds the right shape, but only returns the first result:

```
  V1   V2 V3
1 NA   NA NA
2 2014 NA NA
3 NA   NA NA
```
The format I want is:

```
  V1   V2   V3
1 NA   NA   NA
2 2014 2012 2006
3 NA   NA   NA
```
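A minimal base-R sketch of that layout, assuming `a[[i]]` holds the experiences of result `i` and each experience keeps its date in `$experience$start` (the toy `a` below is hypothetical, not the poster's data): collect one vector of start dates per result, pad each with `NA` to the longest length using the same `length<-` trick as the first attempt, then `rbind` the padded vectors as rows.

```r
# Hypothetical toy data shaped like `a`
a <- list(
  list(),
  list(
    list(experience = list(duration = "1", start = "2014")),
    list(experience = list(start = "2012", duration = "2")),
    list(experience = list(duration = "3", start = "2006"))
  ),
  list()
)

get_start <- function(e) {
  s <- e[["experience"]][["start"]]
  if (is.null(s)) NA_character_ else s
}

# One character vector of start dates per result (character(0) for empty results)
starts <- lapply(a, function(exps) vapply(exps, get_start, character(1)))

# Pad every vector with NA to the longest length (at least 1), stack as rows
n <- max(1L, lengths(starts))
wide <- do.call(rbind, lapply(starts, `length<-`, n))
wide <- as.data.frame(wide, stringsAsFactors = FALSE)  # columns V1, V2, V3
wide
```

The key difference from the attempts above is that the dates for one result are gathered into a single vector before any padding or binding, so each result becomes exactly one row.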
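(The sample data is now at the bottom.)

Latest attempt:

The following runs, but only returns the first start date for each `a[[i]]`. In the second loop, I need `aa[i][y]` to hold a different value for each `y`.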
```r
for (i in 1:3) {
  a[[i]] <- x$results[[i]]$experiences
  for (y in 1:length(a[[i]])) {
    aa[i][y] = if (is.null(a[[i]][[y]]$experience$start)) {"NULL"} else {a[[i]][[y]]$experience$start}
  }
}
```
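A likely reason for the "first date only" result: if `aa` is a list or vector, `aa[i]` (single bracket) is a one-element piece of it, so `aa[i][y] <- value` builds a longer temporary and only its first element is written back into `aa[i]` (R warns about the replacement length). Indexing with `aa[[i]][y]` stores into the vector held in slot `i` instead. Also, when `a[[i]]` is empty, `1:length(a[[i]])` is `c(1, 0)`, so the loop body still runs; `seq_along()` avoids that. A sketch of the corrected loop on hypothetical toy data:

```r
# Hypothetical toy data shaped like `a`
a <- list(
  list(),
  list(
    list(experience = list(duration = "1", start = "2014")),
    list(experience = list(start = "2012", duration = "2")),
    list(experience = list(duration = "3", start = "2006"))
  ),
  list()
)

aa <- vector("list", length(a))        # one slot per result
for (i in seq_along(a)) {
  aa[[i]] <- character(0)              # start each slot as an empty vector
  for (y in seq_along(a[[i]])) {       # seq_along() does nothing for empty results
    s <- a[[i]][[y]][["experience"]][["start"]]
    aa[[i]][y] <- if (is.null(s)) NA_character_ else s
  }
}
aa
```

The padded `rbind` step from the first attempt can then be applied to `aa` to get one row per result.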
So, for `dput2`, I want to fill in the following table:
```
  V1   V2   V3   V4   V5   V6   V7   V8
1 2015
2 2011 2007 null null null null null null
3 2016 2015 2015 2015 2013 2010
```
I don't care whether the blanks are NULL or NA.
Update
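The answer below almost works, but in my data the structure varies: the order of the names (`roleName`, `duration`, etc.) changes, which breaks the answer because of the `cumsum` it uses to detect where a new list starts. If `duration` comes before `start`, their keys are 9 and 1, and the `cumsum` part marks them as two different lists.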
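The failure can be reproduced with the answer's boundary rule on its own. The keys below are hypothetical, chosen to match the 9 and 1 mentioned above, and `key_groups` is just an illustrative wrapper around the answer's expression.

```r
# The answer's rule: a new experience starts whenever the key drops
# below the previous key.
key_groups <- function(key) cumsum(key < c(0, key[-length(key)])) + 1

# Hypothetical keys for two experiences. The first stores duration (key 9)
# before start (key 1); the second has start, end, roleName in order.
key <- c(9, 1, 2, 3,   1, 2, 3)
key_groups(key)
```

The result is 1 2 2 2 3 3 3 rather than the intended 1 1 1 1 2 2 2: the `duration` value is split off into a group of its own, and the rest of that experience is counted as a second one.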
I wrote the following:
```r
my.list <- list(structure(
  list(
    experience = structure(
      list(
        start = "1",
        end = "1",
        roleName = "a",
        summary = "a",
        duration = "a",
        current = "a",
        org = structure(list(name = "a", url = "a"), .Names = c("name", "url")),
        location = structure(
          list(displayLocation = NULL, lat = NULL, lng = NULL),
          .Names = c("displayLocation", "lat", "lng")
        )
      ),
      .Names = c("start", "end", "roleName", "summary", "duration", "current", "org", "location")
    ),
    `_meta` = structure(
      list(weight = 1L, `_sources` = list(structure(list(`_origin` = "a"), .Names = "_origin"))),
      .Names = c("weight", "_sources")
    )
  ),
  .Names = c("experience", "_meta")
))
```
Then:
```r
aa <- lapply(1:length(a), function(y) {
  tryCatch(
    lapply(1:length(a[[y]]), function(i) {
      a[[y]][[i]]$experience[names(my.list2[[1]]$experience)]
    }),
    error = function(e) print(list())
  )
})
```
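This reorders the fields of every experience into the template's order, so that `key2` always comes out in the correct order.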
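One possible refinement, sketched here as an assumption rather than the poster's method: reordering with `[names(...)]` puts the fields in order, but as far as I can tell a missing field comes back as a `NULL` element named `NA`, which `unlist()` then drops, so experiences still unlist to different lengths. Building a fixed-length character vector per experience, with `NA` for missing fields, sidesteps both the ordering and the length problem. `fields` and `to_row` are hypothetical names, and the nested `org`/`location` fields are left out of this sketch.

```r
# Hypothetical field list: the scalar fields of the template `experience`
fields <- c("start", "end", "roleName", "summary", "duration", "current")

# Turn one `experience` list into exactly one value per field, in template
# order; fields that are absent become NA.
to_row <- function(e) {
  vapply(fields, function(f) {
    v <- e[[f]]
    if (is.null(v)) NA_character_ else as.character(v)[1]
  }, character(1))
}

# Hypothetical experiences: different field orders, and two consecutive
# experiences that only have roleName
exps <- list(
  list(duration = "1", start = "2014", end = "3000", roleName = "a"),
  list(roleName = "a"),
  list(roleName = "a")
)
rows <- do.call(rbind, lapply(exps, to_row))  # one row per experience
rows
```

Because each experience becomes its own row, no `cumsum` boundary detection is needed, and the two `roleName`-only experiences stay separate.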
However, after this loop I found another problem.
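Sometimes, for example, an experience list contains only a single `roleName`. If that happens twice in a row, the keys repeat, and `cumsum` treats the two as the same experience rather than separate ones.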
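This means I can't create `df3`, because the row identifiers are duplicated. Even if I could delete the offending rows, the `i` names would no longer match once any rows that change the length are removed, because the solution below matches names by that sequence.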
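An alternative that avoids inferring boundaries altogether, sketched on hypothetical toy data: tag every value with its result index and experience index taken from the list positions while unlisting, so the ids never depend on field order or on repeated keys. This is a swapped-in technique, not the answer's approach.

```r
# Hypothetical toy data shaped like `a`: result 2 ends with two consecutive
# experiences that only carry roleName (the case that defeats cumsum()).
a <- list(
  list(),
  list(
    list(experience = list(duration = "1", start = "2014", roleName = "a")),
    list(experience = list(roleName = "a")),
    list(experience = list(roleName = "a"))
  ),
  list(list(experience = list(start = "2014")))
)

# One small data frame per experience, tagged with result index i and
# experience index j from the list positions.
per_exp <- lapply(seq_along(a), function(i) {
  lapply(seq_along(a[[i]]), function(j) {
    v <- unlist(a[[i]][[j]][["experience"]])
    if (length(v) == 0) return(NULL)
    data.frame(result = i, experience = j, field = names(v),
               value = unname(v), stringsAsFactors = FALSE)
  })
})

# Flatten the two nesting levels, drop empty entries, stack into one long table
long <- do.call(rbind, Filter(Negate(is.null), do.call(c, per_exp)))
long[long$field == "start", ]
```

The `start` rows can then be split by `result` and padded with `length<-` into the wide layout, with no reliance on the row order.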
Here is my full code, for more context:
```r
library(dplyr)  # left_join(), select(), %>%
library(tidyr)  # spread()

a <- list()
for (i in 1:x$count) {a[[i]] <- x$results[[i]]$experiences}
aa <- lapply(1:length(a), function(y) {
  tryCatch(
    lapply(1:length(a[[y]]), function(i) {
      a[[y]][[i]]$experience[names(my.list2[[1]]$experience)]
    }),
    error = function(e) print(list())
  )
})
aaa <- unlist(aa)
dummydf <- data.frame(b = c("start", "end", "roleName", "summary",
                            "duration", "current", "org.name", "org.url"), key = 1:8)
df <- data.frame(a = aaa, b = names(aaa))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) + 1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x) {
  x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill = TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(aa), function(y) rep(y, sapply(aa, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
df4 <- data.frame(t(df3))
df4$dates <- as.Date(NA)
df4$dates <- as.Date(df4$start)
df4 <- data.frame(dates = df4$dates)
df4 <- t(df4)
df4 <- data.frame(df4)
names(df4) <- paste0(names(df4), "_", i)
df4[] <- lapply(df4[], as.character)
l1 <- lapply(split(stack(df4), sub('.*_', '', stack(df4)[, 2])), '[', 1)
df5 <- t(do.call(cbindPad, l1))
df5 <- data.frame(df5)
```
`cbindPad` is taken from this question.
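`cbindPad` is not part of base R; it comes from the linked question. If the pieces are plain character vectors (rather than the one-column data frames that `l1` holds above), base R's `length<-` can do the padding before `cbind`. A minimal sketch with hypothetical per-result start dates:

```r
# Hypothetical per-result start dates of unequal length
l <- list(`1` = character(0), `2` = c("2014", "2012", "2006"), `3` = "2014")

n <- max(lengths(l))                                # longest vector
padded <- do.call(cbind, lapply(l, `length<-`, n))  # pad with NA, one column each
t(padded)                                           # one row per result
```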
New sample data that includes the problem:

```r
dput3 =
list(list(), list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(end = "3000",
start = "2012", duration = "2",
roleName = "a", summary = "aaa",
org = structure(list(name = "None"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(duration = "3",
start = "2006", end = "3000",
roleName = "a", summary = "aaa", org = structure(list(name = " "), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta"))
),
list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta"))))
```
Maybe this helps:
```r
library(dplyr)
library(tidyr)
a <- unlist(a)
df <- data.frame(a = a, b = names(a)) %>%
  mutate(key = cumsum(b == "experience.duration")) %>%
  split(.$key) %>%
  lapply(function(x) x %>% select(-key) %>% spread(b, a)) %>%
  do.call(rbind, .) %>%
  t %>%
  data.frame
df$key <- rownames(df)
```
Then you can filter the rows you are interested in.
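For example, to keep only the start dates, subset on `key`. This sketch uses a small hypothetical stand-in for `df` (one row per unlisted field name, one column per experience); the real column names will depend on your data.

```r
# Hypothetical stand-in for `df` above
df <- data.frame(X1 = c("1", "2014", "a"), X2 = c("2", "2012", "a"),
                 row.names = c("experience.duration", "experience.start",
                               "experience.roleName"),
                 stringsAsFactors = FALSE)
df$key <- rownames(df)

# Keep only the rows of interest (here: the start dates)
starts <- df[df$key %in% c("experience.start"), ]
starts
```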
The above is equivalent to:

```r
rbind(unlist(a)[1:8], unlist(a)[9:16], unlist(a)[17:24]) %>% t
```
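The hard-coded `1:8`, `9:16`, `17:24` blocks only line up when every experience unlists to exactly eight values in the same order, which is the assumption that breaks in the update to the question. Under that assumption, the same matrix is `matrix(unlist(a), nrow = 8)`, because `matrix()` fills column by column. A sketch with hypothetical values:

```r
# Hypothetical: three experiences of eight fields each, already unlisted
u <- setNames(as.character(1:24), rep(paste0("f", 1:8), 3))

m1 <- t(rbind(u[1:8], u[9:16], u[17:24]))  # the answer's form (base t())
m2 <- matrix(u, nrow = 8)                  # column-wise fill, same layout

identical(unname(m1), m2)
```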
Try this with `dput2`:

```r
a <- unlist(dput2)
library(dplyr)
library(tidyr)
dummydf <- data.frame(b = c("experience.start", "experience.end", "experience.roleName",
                            "experience.summary", "experience.org", "experience.org.name",
                            "experience.org.url", "_meta.weight", "_meta._sources._origin",
                            "experience.duration"), key = 1:10)
df <- data.frame(a = a, b = names(a))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) + 1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x) {
  x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill = TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(dput2), function(y) rep(y, sapply(dput2, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
View(df3)
```
Views: 1953