I am looking for a solution to the following problem that works inside a pipe.
My data looks like this:
df_foo <- tibble(
column_set_1_1 = c(1, 2, 3), column_set_1_2 = c(2, 3, NA), column_set_1_3 = c(3, NA, NA),
column_set_2_1 = c(1, 2, 3), column_set_2_2 = c(4, 5, 6), column_set_2_3 = c(7, 8, 9),
column_set_2_4 = c(10, 11, NA), column_set_2_5 = c(13, NA, NA), column_set_2_6 = c(NA, NA, NA)
)
# A tibble: 3 × 9
column_set_1_1 column_set_1_2 column_set_1_3 column_set_2_1 column_set_2_2 column_set_2_3 column_set_2_4 column_set_2_5 column_set_2_6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 1 2 3 1 4 7 10 13 NA
2 2 3 NA 2 5 8 11 NA NA
3 3 NA NA 3 6 9 NA NA NA
I basically want to get the last non-NA value per column set. So the expected output is:
tibble(
column_set_1 = c(3, 3, 3),
column_set_2 = c(13, 11, 9)
)
# A tibble: 3 × 2
column_set_1 column_set_2
<dbl> <dbl>
1 3 13
2 3 11
3 3 9
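Here is a tidyverse approach that does not reshape the original data frame. It splits the data frame into groups by column-name pattern, then reverses the column order within each group and uses the coalesce function, so the first non-NA value it finds per row is the last non-NA value of that column set: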
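library(tidyverse)
df_foo %>%
  # column_set_2_6 is all NA and therefore logical; make every column numeric
  mutate(across(everything(), as.numeric)) %>%
  # group the columns by name with the trailing "_<number>" removed
  split.default(f = sub("_\\d+$", "", names(.))) %>%
  # reverse each group so coalesce() returns the last non-NA value per row
  map_df(~do.call(coalesce, setNames(rev(.), NULL)))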
# A tibble: 3 × 2
# column_set_1 column_set_2
# <dbl> <dbl>
#1 3 13
#2 3 11
#3 3 9
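To see why this works, the two key steps can be tried in isolation. This is only a small sketch on plain vectors copied from the question's first column set; the variable names (`nms`, `groups`, `x1`, `x2`, `x3`, `first_non_na`, `last_non_na`) are made up for illustration and are not part of the answer above.

```r
library(dplyr)

# Step 1: the grouping key. Stripping the trailing "_<number>" from each
# column name leaves the name of the column set it belongs to.
nms <- c("column_set_1_1", "column_set_1_2", "column_set_2_1")
groups <- sub("_\\d+$", "", nms)
groups
# "column_set_1" "column_set_1" "column_set_2"

# Step 2: coalesce() on one column set, written out as plain vectors.
x1 <- c(1, 2, 3)
x2 <- c(2, 3, NA)
x3 <- c(3, NA, NA)

# In the original order, coalesce() keeps the first non-NA value per position.
first_non_na <- coalesce(x1, x2, x3)
first_non_na
# 1 2 3

# With the arguments reversed, the first non-NA value it finds is the last
# non-NA value of the set, which is what the question asks for.
last_non_na <- coalesce(x3, x2, x1)
last_non_na
# 3 3 3
```

In the pipeline, `rev(.)` does the reversing for each group and `do.call()` passes the reversed columns to `coalesce()` as separate arguments.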