Fill a new variable in a df with values based on conditions across multiple columns

Cha*_*man 6 r dplyr

I'm sure I'm not the only one who has asked this, but after several hours of searching without luck I need to ask it myself.

I have a df (rp) like this:

rp <- structure(list(agec1 = c(7, 16, 11, 11, 17, 17), 
               agec2 = c(6, 12, 9, 9, 16, 15), 
               agec3 = c(2, 9, 9, 9, 14, NA), 
               agec4 = c(NA, 7, 9, 9, 13, NA), 
               agec5 = c(NA, 4, 7, 7, 10, NA), 
               agec6 = c(NA, NA, 6, 6, 9, NA), 
               agec7 = c(NA, NA, NA, NA, 7, NA), 
               agec8 = c(NA, NA, NA, NA, 5, NA)), 
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

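Each agec column holds the age of one of the parent's children (up to 8 children). I want to create a new column "agec5_12" containing the age of the oldest child aged 5 to 12. So my df would look like this: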

rpage <- structure(list(agec1 = c(7, 16, 11, 11, 17, 17), 
               agec2 = c(6, 12, 9, 9, 16, 15), 
               agec3 = c(2, 9, 9, 9, 14, NA), 
               agec4 = c(NA, 7, 9, 9, 13, NA), 
               agec5 = c(NA, 4, 7, 7, 10, NA), 
               agec6 = c(NA, NA, 6, 6, 9, NA), 
               agec7 = c(NA, NA, NA, NA, 7, NA), 
               agec8 = c(NA, NA, NA, NA, 5, NA), 
               agec5_12 = c(7, 12, 11, 11, 10, NA)), 
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Notes about my data:

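  • The ages are not always in the same chronological order, i.e. they are not always sorted from oldest to youngest or from youngest to oldest
  • A row may have no children in this range (in which case I want NA returned)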
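The rule described above (the oldest child aged 5 to 12, NA when there is none, regardless of column order) can be sketched in base R as follows. This is a minimal sketch, assuming the eight age columns are named agec1 to agec8 as in the sample; oldest_5_12 is a hypothetical helper name, not part of any package. Taking max() of the in-range ages is what makes the result independent of the order in which the children are stored.

```r
# Sample data from the question (plain data.frame instead of a tibble)
rp <- data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17),
  agec2 = c(6, 12, 9, 9, 16, 15),
  agec3 = c(2, 9, 9, 9, 14, NA),
  agec4 = c(NA, 7, 9, 9, 13, NA),
  agec5 = c(NA, 4, 7, 7, 10, NA),
  agec6 = c(NA, NA, 6, 6, 9, NA),
  agec7 = c(NA, NA, NA, NA, 7, NA),
  agec8 = c(NA, NA, NA, NA, 5, NA)
)

# Hypothetical helper: oldest age in 5-12 within one row, NA if none.
# The length check avoids max() returning -Inf (with a warning)
# when no child in the row falls inside the range.
oldest_5_12 <- function(x) {
  in_range <- x[!is.na(x) & x >= 5 & x <= 12]
  if (length(in_range) == 0) NA_real_ else max(in_range)
}

# Apply the helper to every row (MARGIN = 1) of the age columns
age_cols <- paste0("agec", 1:8)
rp$agec5_12 <- apply(rp[age_cols], 1, oldest_5_12)

rp$agec5_12
```

Swapping the column order of a row does not change the answer, e.g. oldest_5_12(c(NA, 6, 12, 3)) still gives 12.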

I've tried writing a function and applying it with rowwise and mutate:

fun.age5_12 <- function(x){
                 x[which(x == max(x[(x > 4) & (x < 13)], na.rm = TRUE))]
                }
rpage <- rp %>%
         select(-c(20:21, 199:200)) %>%
         rowwise() %>% 
         mutate(agec5_12 = fun.age5_12(c(1:8)))
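A likely reason every row gets the same value: fun.age5_12(c(1:8)) passes the literal numbers 1 to 8 to the function, not that row's ages, so it never sees the data. Below is a sketch of the same rowwise idea using c_across(), assuming dplyr 1.0.0 or later and age columns named agec1 to agec8; the select() step is dropped because columns 20:21 and 199:200 only exist in the full data set. Returning max() directly, instead of x[which(x == max(...))], also avoids a result longer than one when two children share the oldest in-range age, and the length check avoids max() returning -Inf when a row has no child in range.

```r
library(dplyr)

# Sample data from the question
rp <- data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17),
  agec2 = c(6, 12, 9, 9, 16, 15),
  agec3 = c(2, 9, 9, 9, 14, NA),
  agec4 = c(NA, 7, 9, 9, 13, NA),
  agec5 = c(NA, 4, 7, 7, 10, NA),
  agec6 = c(NA, NA, 6, 6, 9, NA),
  agec7 = c(NA, NA, NA, NA, 7, NA),
  agec8 = c(NA, NA, NA, NA, 5, NA)
)

# The asker's helper, revised: keep non-missing ages in 5-12 and return
# their maximum, or NA (instead of -Inf plus a warning) when none remain
fun.age5_12 <- function(x) {
  x <- x[!is.na(x) & x > 4 & x < 13]
  if (length(x) == 0) NA_real_ else max(x)
}

rpage <- rp %>%
  rowwise() %>%
  # c_across() collects this row's agec1..agec8 values into one vector
  mutate(agec5_12 = fun.age5_12(c_across(agec1:agec8))) %>%
  ungroup()

rpage$agec5_12
```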

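However, this returns "12" for every observation. Ideally I'd like to do this with dplyr. Any suggestions using mutate and ifelse, not necessarily with a function, are welcome.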
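A vectorized sketch along the lines of mutate and ifelse, without rowwise() or a helper function: ifelse() turns every out-of-range age into NA column by column, and pmax(..., na.rm = TRUE) then takes the row-wise maximum, which is NA only when nothing is left in a row. This assumes dplyr 1.0.0 or later for across() and age columns named agec1 to agec8; do.call() is used only to pass the eight transformed columns to pmax() as separate arguments.

```r
library(dplyr)

# Sample data from the question
rp <- data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17),
  agec2 = c(6, 12, 9, 9, 16, 15),
  agec3 = c(2, 9, 9, 9, 14, NA),
  agec4 = c(NA, 7, 9, 9, 13, NA),
  agec5 = c(NA, 4, 7, 7, 10, NA),
  agec6 = c(NA, NA, 6, 6, 9, NA),
  agec7 = c(NA, NA, NA, NA, 7, NA),
  agec8 = c(NA, NA, NA, NA, 5, NA)
)

rpage <- rp %>%
  mutate(agec5_12 = do.call(
    pmax,
    # across() returns the agec columns with ages outside 5-12 set to NA
    # (the original columns are left untouched); as.list() turns that
    # tibble into a plain list of vectors, and na.rm = TRUE makes pmax()
    # skip NAs, giving NA only for rows with no age in range
    c(as.list(across(agec1:agec8, ~ ifelse(.x >= 5 & .x <= 12, .x, NA))),
      na.rm = TRUE)
  ))

rpage$agec5_12
```

Because everything stays column-wise, this avoids calling a function once per row, which may matter on a data set with many rows.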

Thanks

Shr*_*ree 1

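I think an apply solution to this kind of problem is always simpler and more readable than a dplyr (I assume you mean tidyverse) solution, but since you asked, here is one way: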

library(dplyr)
library(tidyr)
library(tibble)  # rownames_to_column() comes from tibble, not dplyr or tidyr

rp %>% 
  rownames_to_column("parent_id") %>% 
  gather(variable, value, -parent_id) %>% 
  group_by(parent_id) %>%
  arrange(parent_id, desc(value)) %>% 
  mutate(
    agec5_12 = value[between(value, 5, 12)][1]
  ) %>%
  ungroup() %>% 
  spread(variable, value) %>% 
  select(3:10, 2)

# A tibble: 6 x 9
  agec1 agec2 agec3 agec4 agec5 agec6 agec7 agec8 agec5_12
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1     7     6     2    NA    NA    NA    NA    NA        7
2    16    12     9     7     4    NA    NA    NA       12
3    11     9     9     9     7     6    NA    NA       11
4    11     9     9     9     7     6    NA    NA       11
5    17    16    14    13    10     9     7     5       10
6    17    15    NA    NA    NA    NA    NA    NA       NA
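For reference, gather() and spread() are superseded in current tidyr in favour of pivot_longer() and pivot_wider(). The sketch below keeps the same long-format idea but summarises per parent and joins the result back, instead of reshaping wide again, so the original column order survives without select(3:10, 2). It assumes tidyr 1.0.0 and dplyr 1.0.0 or later; parent_id, child, age and ages_5_12 are illustrative names, not anything from the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
rp <- data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17),
  agec2 = c(6, 12, 9, 9, 16, 15),
  agec3 = c(2, 9, 9, 9, 14, NA),
  agec4 = c(NA, 7, 9, 9, 13, NA),
  agec5 = c(NA, 4, 7, 7, 10, NA),
  agec6 = c(NA, NA, 6, 6, 9, NA),
  agec7 = c(NA, NA, NA, NA, 7, NA),
  agec8 = c(NA, NA, NA, NA, 5, NA)
)

# One row per (parent, child); keep only ages in 5-12 (filter() also
# drops NA ages) and take the oldest remaining child per parent
ages_5_12 <- rp %>%
  mutate(parent_id = row_number()) %>%
  pivot_longer(agec1:agec8, names_to = "child", values_to = "age") %>%
  filter(between(age, 5, 12)) %>%
  group_by(parent_id) %>%
  summarise(agec5_12 = max(age), .groups = "drop")

# Parents with no child in range are missing from ages_5_12,
# so left_join() fills their agec5_12 with NA
rpage <- rp %>%
  mutate(parent_id = row_number()) %>%
  left_join(ages_5_12, by = "parent_id") %>%
  select(-parent_id)

rpage$agec5_12
```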