Can I combine dplyr mutate_at and mutate_if statements?

Tom*_*Tom 7 interpolation r dplyr mutate

I have the following example data:

   country country-year year     a     b
1  France  France2000   2000       NA    NA 
2  France  France2001   2001     1000  1000  
3  France  France2002   2002       NA    NA
4  France  France2003   2003     1600  2200
5  France  France2004   2004       NA    NA
6  UK          UK2000   2000     1000  1000  
7  UK          UK2001   2001       NA    NA
8  UK          UK2002   2002     1000  1000  
9  UK          UK2003   2003       NA    NA
10 UK          UK2004   2004       NA    NA
11 Germany     UK2000   2000       NA    NA 
12 Germany     UK2001   2001       NA    NA
13 Germany     UK2002   2002       NA    NA  
14 Germany     UK2003   2003       NA    NA
15 Germany     UK2004   2004       NA    NA

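I want to interpolate the data (but not extrapolate) and remove the rows in which columns a and b are both NA. In other words, I want to drop every row that cannot be filled by interpolation; in the example these are: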

1  France  France2000        NA    NA
5  France  France2004        NA    NA
9  UK          UK2003        NA    NA
10 UK          UK2004        NA    NA
11 Germany     UK2000        NA    NA 
12 Germany     UK2001        NA    NA
13 Germany     UK2002        NA    NA  
14 Germany     UK2003        NA    NA
15 Germany     UK2004        NA    NA
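
To make "interpolate but not extrapolate" concrete, here is a minimal base R sketch on a single vector. It assumes linear interpolation is what is wanted and uses only stats::approx(); the variable names are illustrative and the values are column a for France from the example.

```r
# Column a for France, years 2000-2004 (values copied from the example data)
year <- 2000:2004
a    <- c(NA, 1000, NA, 1600, NA)

# approx() ignores the NA points and interpolates linearly between the
# observed ones; with rule = 1 it returns NA for any xout outside the
# range of observed years, so nothing is extrapolated.
a_interp <- approx(x = year, y = a, xout = year, rule = 1)$y

a_interp
# NA 1000 1300 1600 NA
```

Only the interior gap (2002) is filled; 2000 and 2004 stay NA, which is why those rows should be dropped afterwards.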

There are two options that almost do what I want:

library(tidyverse)
library(zoo)
df %>%
  group_by(country) %>%
  mutate_at(vars(a:b),~na.fill(.x,c(NA, "extend", NA))) %>% 
  filter(!is.na(a) | !is.na(b))

df %>%
  group_by(country) %>%
  mutate_if(is.numeric, ~if(all(is.na(.x))) NA else na.fill(.x, "extend"))

Is it possible to combine these two pieces of code and do something like this:

df <- df %>%
  group_by(country) %>%
  mutate_at(vars(a:b), ~if(all(is.na(.x))) NA else na.fill(.x, c(NA, "extend", NA))) %>%
  filter(!is.na(a) | !is.na(b))

Desired output:

   country country-year    a     b 
2  France  France2001      1000  1000  
3  France  France2002      1300  1600
4  France  France2003      1600  2200
6  UK          UK2000      1000  1000  
7  UK          UK2001      1000  1000
8  UK          UK2002      1000  1000

kat*_*ath 2

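I know this doesn't directly answer the question of how to combine mutate_if and mutate_at, but it solves your general problem:

I first drop all countries for which a and b are missing in every row, then determine for each country the minimum and maximum year with a non-missing value. After filtering to that range of years, I use na.fill.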

library(dplyr)
library(readr)
library(zoo)

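country_data %>%
  # extract the year from the label; as.character() because the column is a
  # factor and parse_number() expects a character vector
  mutate(Year = parse_number(as.character(`country-year`))) %>%
  group_by(country) %>%
  # drop countries in which a and b are missing in every row
  mutate(not_all_na = any(!(is.na(a) & is.na(b)))) %>%
  filter(not_all_na) %>%
  # first and last year per country with at least one observed value
  mutate(Year_min_not_na = min(Year[!(is.na(a) & is.na(b))]),
         Year_max_not_na = max(Year[!(is.na(a) & is.na(b))])) %>%
  # keep only the years inside that range, so nothing gets extrapolated
  filter(Year >= Year_min_not_na, Year <= Year_max_not_na) %>%
  # fill the remaining interior gaps by linear interpolation
  mutate_at(vars(a:b), ~na.fill(.x, "extend"))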

# A tibble: 6 x 8
# Groups:   country [2]
#   country `country-year`     a     b  Year not_all_na Year_min_not_na Year_max_not_na
#   <fct>   <fct>          <dbl> <dbl> <dbl> <lgl>                <dbl>           <dbl>
# 1 France  France2001      1000  1000  2001 TRUE                  2001            2003
# 2 France  France2002      1300  1600  2002 TRUE                  2001            2003
# 3 France  France2003      1600  2200  2003 TRUE                  2001            2003
# 4 UK      UK2000          1000  1000  2000 TRUE                  2000            2002
# 5 UK      UK2001          1000  1000  2001 TRUE                  2000            2002
# 6 UK      UK2002          1000  1000  2002 TRUE                  2000            2002
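
For completeness, the guard for all-NA groups and the interior-only interpolation can also be folded into one helper, which is closer to what the question literally asks for. The following is only a sketch under a few assumptions: interp_inside is a hypothetical helper name, it uses base R's approx() rather than zoo::na.fill, the data frame is rebuilt here with a plain numeric year column so the block is self-contained, and across() (available since dplyr 1.0.0, where mutate_at and mutate_if were superseded) stands in for the scoped verbs.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or zoo): linear interpolation of y
# over x between the first and last observed value only. Leading and trailing
# NAs stay NA because approx() with rule = 1 does not extrapolate.
interp_inside <- function(y, x) {
  ok <- !is.na(y)
  # approx() needs at least two observed points; otherwise leave y as it is
  if (sum(ok) < 2) return(as.numeric(y))
  approx(x = x[ok], y = y[ok], xout = x, rule = 1)$y
}

# Example data rebuilt with a numeric year column (an assumption of this sketch)
country_data <- data.frame(
  country = rep(c("France", "UK", "Germany"), each = 5),
  year    = rep(2000:2004, times = 3),
  a = c(NA, 1000, NA, 1600, NA, 1000, NA, 1000, NA, NA, rep(NA, 5)),
  b = c(NA, 1000, NA, 2200, NA, 1000, NA, 1000, NA, NA, rep(NA, 5))
)

result <- country_data %>%
  arrange(country, year) %>%
  group_by(country) %>%
  # interpolate a and b within each country, using year as the x axis
  mutate(across(c(a, b), ~ interp_inside(.x, year))) %>%
  ungroup() %>%
  # drop the rows that could not be filled
  filter(!is.na(a) | !is.na(b))

result
```

On this data the result keeps France 2001 to 2003 and UK 2000 to 2002 and drops Germany entirely. Note that a country with exactly one observed value keeps that single row, since there is nothing to interpolate but the row is not all NA.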

Data

country_data <-
  structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L),
                                     .Label = c("France", "Germany", "UK"), class = "factor"),
                 `country-year` = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 6L, 7L, 8L, 9L, 10L),
                                            .Label = c("France2000", "France2001", "France2002", "France2003",
                                                       "France2004", "UK2000", "UK2001", "UK2002", "UK2003", "UK2004"),
                                            class = "factor"),
                 a = c(NA, 1000L, NA, 1600L, NA, 1000L, NA, 1000L, NA, NA, NA, NA, NA, NA, NA),
                 b = c(NA, 1000L, NA, 2200L, NA, 1000L, NA, 1000L, NA, NA, NA, NA, NA, NA, NA)),
            class = "data.frame", row.names = c(NA, -15L))