dplyr: mutate a new column from multiple columns selected by a string of variable names

str*_*oop 0 variables select r dplyr

Given this data:

df=data.frame(
  x1=c(2,0,0,NA,0,1,1,NA,0,1),
  x2=c(3,2,NA,5,3,2,NA,NA,4,5),
  x3=c(0,1,0,1,3,0,NA,NA,0,1),
  x4=c(1,0,NA,3,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,3,4,NA,3,3,1))

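I want to use dplyr to create an additional column, min, holding the row-wise minimum of selected columns. With literal column names this is easy: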

df <- df %>% rowwise() %>% mutate(min = min(x2,x5))

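But I have a large df with varying column names, so I need to match them from a character vector of names, mycols. Other threads tell me to use the select helper functions, but I must be missing something. Here is matches: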

mycols <- c("x2","x5")
df <- df %>% rowwise() %>%
  mutate(min = min(select(matches(mycols))))
Error: is.string(match) is not TRUE

And one_of:

mycols <- c("x2","x5")
 df <- df %>%
 rowwise() %>%
 mutate(min = min(select(one_of(mycols))))
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"
In addition: Warning message:
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5`

What am I overlooking? Should select_ work? It does not in the following:

df <- df %>%
   rowwise() %>%
   mutate(min = min(select_(mycols)))
Error: no applicable method for 'select_' applied to an object of class "character"

Likewise:

df <- df %>%
  rowwise() %>%
  mutate(min = min(select_(matches(mycols))))
Error: is.string(match) is not TRUE

cde*_*erv 5

Here is another solution, a bit more technical, that relies on purrr, the tidyverse package designed for functional programming.

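First, the matches helper from dplyr takes a single regular-expression string as its argument, not a vector of names. The task is therefore to find one regex that matches all the columns you want. (In the code below, you can use whichever dplyr select helper you wish.)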
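If you already have the exact column names in a vector, as in the question, one way to get such a regex is to collapse the vector into an anchored alternation. This is only a sketch in base R; it assumes the names contain no regex metacharacters, and the extra name x25 is made up here to show why the anchors matter.

```r
# Exact column names wanted, as in the question
mycols <- c("x2", "x5")

# Collapse the vector into one anchored alternation: "^(x2|x5)$"
pattern <- paste0("^(", paste(mycols, collapse = "|"), ")$")

# Check it against some column names; "x25" is a hypothetical extra column
all_names <- c("x1", "x2", "x3", "x4", "x5", "x25")
matched <- grep(pattern, all_names, value = TRUE)
matched
#> [1] "x2" "x5"
```

The resulting pattern string can then be passed to matches in place of a hand-written regex.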

Then, purrr functions work well with dplyr once you understand the underlying scheme of functional programming.
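To make that scheme a little more concrete, here is a small sketch of what pmap_dbl does on a plain list of equal-length vectors. The list cols and its element names a and b are invented for the illustration; a data frame behaves the same way because it is itself a list of columns.

```r
library(purrr)

# Two "columns" of equal length, stored as a plain list (hypothetical data)
cols <- list(a = c(3, 2, NA), b = c(1, 4, 5))

# pmap_dbl() calls min() once per position: min(a = 3, b = 1), min(a = 2, b = 4), ...
# and collects the results into a double vector
row_mins <- pmap_dbl(cols, min)
row_mins
#> [1]  1  2 NA
```

The third value is NA because min returns NA as soon as one of its arguments is NA, unless na.rm = TRUE is passed.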

Solving your problem:


df=data.frame(
  x1=c(2,0,0,NA,0,1,1,NA,0,1),
  x2=c(3,2,NA,5,3,2,NA,NA,4,5),
  x3=c(0,1,0,1,3,0,NA,NA,0,1),
  x4=c(1,0,NA,3,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,3,4,NA,3,3,1))


# regex matching only the x2 and x5 columns (unanchored, so it would also match a name like x25)
mycols <- "x[25]"

library(dplyr)

df %>%
  mutate(min_x2_x5 =
           # select columns that you want in df
           select(., matches(mycols)) %>% 
           # use pmap on this subset to get a vector of min from each row.
           # a data frame is a list of columns, so pmap walks those columns in parallel, i.e. row by row
           purrr::pmap_dbl(min)
         )
#>    x1 x2 x3 x4 x5 min_x2_x5
#> 1   2  3  0  1  1         1
#> 2   0  2  1  0  1         1
#> 3   0 NA  0 NA NA        NA
#> 4  NA  5  1  3  1         1
#> 5   0  3  3  0  3         3
#> 6   1  2  0  0  4         2
#> 7   1 NA NA NA NA        NA
#> 8  NA NA NA  0  3        NA
#> 9   0  4  0  0  3         3
#> 10  1  5  1  1  1         1

I won't explain purrr further here, but it works fine in your case.
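As a side note, and not part of the purrr approach above: if the names are already in a character vector, the base R function pmin may also do the job without any regex. The sketch below assumes the magrittr pipe that dplyr re-exports, where . refers to the piped data frame; the column names min_x2_x5 and min_x2_x5_narm are just illustrative.

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1))

mycols <- c("x2", "x5")

result <- df %>%
  mutate(
    # pmin() is vectorised across its arguments; do.call() passes each
    # selected column as one argument, giving one minimum per row
    min_x2_x5 = do.call(pmin, as.list(.[mycols])),
    # same, but ignoring NA unless every selected value in the row is NA
    min_x2_x5_narm = do.call(pmin, c(as.list(.[mycols]), na.rm = TRUE))
  )

result$min_x2_x5
#>  [1]  1  1 NA  1  3  2 NA NA  3  1
result$min_x2_x5_narm
#>  [1]  1  1 NA  1  3  2 NA  3  3  1
```

The first column reproduces the min_x2_x5 values shown above; the second differs only in row 8, where x2 is NA and x5 is 3.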