Unnest or unchop a data frame containing lists of different lengths

Question

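I have a data frame with several columns, some of which are list-columns that I want to unnest (or unchop). However, the lists have different lengths, so the call fails with Error: No common size for...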

Here is a reprex showing what works and what does not.

library(tidyr)
library(vctrs)

# This works as expected
df_A <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9)))
)

unchop(df_A, cols = c(A))
# A tibble: 7 x 2
     ID     A
  <int> <dbl>
1     1     9
2     1     8
3     1     5
4     2     7
5     2     6
6     3     6
7     3     9

# This works as expected as the lists are the same lengths

df_AB_1 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2, 3), c(4, 5), c(7, 8)))
)

unchop(df_AB_1, cols = c(A, B))

# A tibble: 7 x 3
     ID     A     B
  <int> <dbl> <dbl>
1     1     9     1
2     1     8     2
3     1     5     3
4     2     7     4
5     2     6     5
6     3     6     7
7     3     9     8

# This does NOT work as the lists are different lengths

df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)

unchop(df_AB_2, cols = c(A, B))

# Error: No common size for `A`, size 3, and `B`, size 2.

The output I would like for df_AB_2 above is as follows, where each list is unchopped and the missing values are filled with NA:

# A tibble: 10 x 3
      ID     A     B
   <dbl> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

I have referred to this issue on GitHub and to this question on StackOverflow.

Any ideas on how to achieve the result above?

Versions

> packageVersion("tidyr")
[1] ‘1.0.0’
> packageVersion("vctrs")
[1] ‘0.2.0.9001’

Answer 1

Sot*_*tos 9

Here is an idea via dplyr and tidyr, which you can generalize to as many columns as you need:

library(tidyverse)

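df_AB_2 %>% 
  # stack the list-columns into a single list-column named `value`
  pivot_longer(c(A, B)) %>% 
  # pad every vector with NA up to the longest length found in the data
  mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
  # spread back to one list-column per original column
  pivot_wider(names_from = name, values_from = value) %>% 
  # name the columns explicitly; tidyr 1.0.0 warns when `cols` is omitted
  unnest(cols = c(A, B)) %>% 
  # drop the padding rows, i.e. rows where both A and B are NA
  filter(rowSums(is.na(.[-1])) != 2)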

This gives:

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0
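
The pipeline above pads every vector to the longest length in the whole data frame and then filters out rows that are entirely NA, which would also drop a row whose values were genuinely NA in both columns. As a rough alternative sketch (not part of the original answer), one could pad each row's vectors only to that row's own maximum length and then call unchop() directly. The helper pad_to() and the variable names cols and n are made up for illustration, and the data is rebuilt with plain list-columns rather than as_list_of() to keep the example self-contained.

```r
library(tidyr)

# Same data as df_AB_2 in the question, with plain list-columns
df_AB_2 <- tibble(
  ID = 1:3,
  A = list(c(9, 8, 5), c(7, 6), c(6, 9)),
  B = list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0))
)

# Hypothetical helper: extend x to length n; `length<-` fills with NA
pad_to <- function(x, n) {
  length(x) <- n
  x
}

# List-columns to unchop together (extend this vector for more columns)
cols <- c("A", "B")

# Row-wise maximum length across the chosen list-columns
n <- do.call(pmax, unname(lapply(df_AB_2[cols], lengths)))

# Pad each element of each list-column to its row's maximum length
padded <- df_AB_2
padded[cols] <- lapply(df_AB_2[cols], function(col) Map(pad_to, col, n))

# Within each row the lengths now agree, so unchop() finds a common size
out <- unchop(padded, cols = c(A, B))
out
```

For df_AB_2 this should produce the 10-row result requested in the question, without the final filter() step.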
