Con*_* M. · 6 · tags: r, list, data-structures
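Using lapply, I feed an input vector into a function that returns, for each input, a list of two vectors: the possible nth-grams and their probabilities. I end up with a list of lists (lol) with the following structure: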
> str(lol)
List of 3
 $ :List of 2
  ..$ np1  : chr [1:7] "a" "years" "the" "my" ...
  ..$ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
 $ :List of 2
  ..$ np1  : chr [1:167] "the" "a" "my" "years" ...
  ..$ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
 $ :List of 2
  ..$ np1  : chr [1:9493] "the" "a" "my" "this" ...
  ..$ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...
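But my goal is a single list in which all the $np1 vectors are concatenated together, and likewise all the $probs vectors. I tried unlist(..., recursive = F) to get a list of two vectors; it gets me closer to what I'm looking for than unlist without the recursive flag does.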
> str(unlist(lapply(inputs.list, function(x){...}), recursive = F))
List of 6
 $ np1  : chr [1:7] "a" "years" "the" "my" ...
 $ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
 $ np1  : chr [1:167] "the" "a" "my" "years" ...
 $ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
 $ np1  : chr [1:9493] "the" "a" "my" "this" ...
 $ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...
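But not quite there...
Is there a way to take the flattened list one step further and merge it into a list of just two vectors, as described above?
Here is a reproducible example: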
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"), probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),  probs = c(0.267, 0.0777, 0.0239, 0.0169))
)
> str(example1)
List of 2
 $ time in:List of 2
  ..$ np1  : chr [1:4] "the" "a" "my" "years"
  ..$ probs: num [1:4] 0.2745 0.0924 0.0605 0.0437
 $ in     :List of 2
  ..$ np1  : chr [1:4] "the" "a" "my" "this"
  ..$ probs: num [1:4] 0.267 0.0777 0.0239 0.0169
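The asker's own unlist(..., recursive = F) attempt can be finished without relying on element order: after one level of flattening, every piece is named either np1 or probs, so split() can group the pieces by those repeated names and unlist() can concatenate each group. A minimal sketch on example1; the names flat and combined are only illustrative:

```r
# Reproducible example from the question
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# One level of flattening, as in the question. unname() drops the outer
# "time in" / "in" labels so the pieces are named exactly np1, probs, np1, probs
# (the lapply() output in the question is already unnamed).
flat <- unlist(unname(example1), recursive = FALSE)

# Group the flattened pieces by their repeated names, then concatenate
# each group into one plain vector
combined <- lapply(split(flat, names(flat)), unlist, use.names = FALSE)
str(combined)
```

One design note: split() orders the groups alphabetically by name rather than by first appearance, which happens to put np1 before probs here.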
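Here is an "unlist" solution similar to the one you are using. It relies on the vectors you are interested in always alternating (i.e., it is always np1 and then probs). Good luck, and let me know if it doesn't work for you!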
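# Flatten one level: a list of np1, probs, np1, probs, ... vectors
unlist_ed <- unlist(example1, recursive = FALSE)
list(
  np1   = unlist(unlist_ed[c(TRUE, FALSE)]),  # odd positions: the np1 vectors
  probs = unlist(unlist_ed[c(FALSE, TRUE)])   # even positions: the probs vectors
)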
$np1
time in.np11 time in.np12 time in.np13 time in.np14 in.np11 in.np12 in.np13 in.np14
"the" "a" "my" "years" "the" "a" "my" "this"
$probs
time in.probs1 time in.probs2 time in.probs3 time in.probs4 in.probs1 in.probs2 in.probs3
0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239
in.probs4
0.0169
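The element names in the output above (time in.np11 and so on) are built by unlist() from the outer and inner list names. A sketch of a variant that selects each component by name instead of by position, so it no longer needs np1 and probs to alternate, and passes use.names = FALSE to return plain vectors. It assumes every sub-list uses the same component names as the first one; fields and combined are illustrative names:

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Component names to collect, taken from the first sub-list
# (assumption: every sub-list has the same names)
fields <- names(example1[[1]])

# For each field, extract that element from every sub-list with [[ and
# concatenate the pieces without generating element names
combined <- lapply(
  setNames(fields, fields),
  function(f) unlist(lapply(example1, `[[`, f), use.names = FALSE)
)
str(combined)
```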
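Edit: I thought of another solution that also relies on the vectors having the same names, but it is much faster (not that speed was the goal). Wanted to share the update!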
dplyr::bind_rows(example1)
# A tibble: 8 x 2
np1 probs
<chr> <dbl>
1 the 0.274
2 a 0.0924
3 my 0.0605
4 years 0.0437
5 the 0.267
6 a 0.0777
7 my 0.0239
8 this 0.0169
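If dplyr is not available, the same stack-then-split idea can be sketched in base R: convert each sub-list to a two-column data frame, rbind() them, and turn the stacked columns back into a list. This is an illustrative alternative rather than the answer's method, and it has not been benchmarked here against bind_rows():

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Each sub-list becomes a data frame with columns np1 and probs;
# rbind() stacks them, matching columns by name
stacked <- do.call(rbind, lapply(unname(example1), as.data.frame,
                                 stringsAsFactors = FALSE))

# as.list() on a data frame returns its columns as a plain named list
combined <- as.list(stacked)
str(combined)
```

Like bind_rows(), this matches columns by name; it assumes np1 and probs have equal lengths within each sub-list, since each sub-list must form a valid data frame.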
Not a perfect benchmark (each expression is timed only once):
library(purrr)  # transpose(), map(), flatten(); purrr also re-exports %>% (used in o3)

# Scale up the example: repeat every vector 1e4 times, then the outer list 100 times
example1 <- rapply(example1, function(x) rep(x, 1e4), how = "list")
example1 <- rep(example1, 100)
microbenchmark::microbenchmark(
  o1 = {
    Reduce(function(...) Map(c, ...), example1)
  },
  o2 = {
    unlist_ed <- unlist(example1, recursive = FALSE)
    list(
      np1 = unlist(unlist_ed[c(TRUE, FALSE)]),
      probs = unlist(unlist_ed[c(FALSE, TRUE)])
    )
  },
  o3 = {
    transpose(example1) %>% map(flatten) %>% map(unlist)
  },
  o4 = {
    binded <- dplyr::bind_rows(example1)
    list(np1 = binded$np1,
         probs = binded$probs)
  },
  times = 1
)
Unit: milliseconds
expr min lq mean median uq max neval
o1 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495 1
o2 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265 1
o3 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422 1
o4 83.32919 83.32919 83.32919 83.32919 83.32919 83.32919 1
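Because the expressions above return slightly different shapes (o2 keeps long element names such as time in.np11), it may be worth confirming they agree on the values before comparing timings. A base-R-only sketch that checks o1 against o2 on the small example1; o3 and o4 need purrr and dplyr, so they are left out, and same_values is a hypothetical helper:

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# o1: concatenate matching components by position with Map(), folded over the list
o1 <- Reduce(function(...) Map(c, ...), example1)

# o2: flatten one level, then take the alternating elements
unlist_ed <- unlist(example1, recursive = FALSE)
o2 <- list(
  np1   = unlist(unlist_ed[c(TRUE, FALSE)]),
  probs = unlist(unlist_ed[c(FALSE, TRUE)])
)

# Hypothetical helper: compare values only, ignoring element names
same_values <- function(a, b) isTRUE(all.equal(unname(a), unname(b)))

stopifnot(identical(names(o1), names(o2)),
          same_values(o1$np1, o2$np1),
          same_values(o1$probs, o2$probs))
```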