Can't use multi-word variables in dplyr, or am I missing something?

and*_*wjc 7 r function dplyr

Why doesn't dplyr like the format "beta linalool" in my function, as opposed to beta.linalool?

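I have spent hours troubleshooting to find out what the problem is. Is there a way to use data whose variables are labelled with multiple words, or should I just move everything to the beta.linalool format?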

Everything I have learned comes from Programming with dplyr.

library(ggplot2)
library(readxl)
library(dplyr)
library(magrittr)

Data3<- read_excel("Desktop/Data3.xlsx")

Data3 %>% filter(Variety=="CS 420A"&`Red Blotch`=="-")%>% group_by(`Time Point`)%>%
  summarise(m=mean(`beta linalool`),SD=sd(`beta linalool`))
# A tibble: 4 x 3
  `Time Point`       m         SD
  <chr>           <dbl>      <dbl>
1 End          0.00300  0.000117  
2 Mid          0.00385  0.000353  
3 Must         0.000254 0.00000633
4 Start        0.000785 0.000283  

Now, when I turn this into a function:

cwine<-function(df,v,rb,c){
  c<-enquo(c)
  df %>% filter(Variety==v&`Red Blotch`==rb)%>% 
    group_by(`Time Point`) %>%
    summarise(m=mean(!!c),SD=sd(!!c))
}
cwine(Data3,"CS 420A","-",'beta linalool')
# A tibble: 4 x 3
  `Time Point`     m    SD
  <chr>        <dbl> <dbl>
1 End             NA    NA
2 Mid             NA    NA
3 Must            NA    NA
4 Start           NA    NA
Warning messages:
1: In mean.default(~"beta linalool") :
  argument is not numeric or logical: returning NA #this statement is repeated 4 more times
5: In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
  NAs introduced by coercion #this statement is repeated 4 more times

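The problem is that beta linalool is typed in as 'beta linalool'. I figured this out by trying the same approach on the iris dataset and noticing that the column there is passed as Petal.Length, not as a quoted name like 'Petal Width':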

my_function<-function(ds,x,y,c){
  c<-enquo(c)
  ds %>%filter(Sepal.Length>x&Sepal.Width<y) %>% 
    group_by(Species) %>% 
    summarise(m=mean(!!c),SD=sd(!!c))
}
my_function(iris,5,4,Petal.Length)
# A tibble: 3 x 3
  Species        m    SD
  <fct>      <dbl> <dbl>
1 setosa      1.53 0.157
2 versicolor  4.32 0.423
3 virginica   5.57 0.536

In fact, my function works fine on other variables:

> cwine(Data2,"CS 420A","-",nerol)
# A tibble: 4 x 3
  `Time Point`        m        SD
  <chr>           <dbl>     <dbl>
1 End          0.000453 0.0000338
2 Mid          0.000659 0.0000660
3 Must         0.000560 0.0000234
4 Start        0.000927 0.0000224

Is dplyr really this sensitive?

akr*_*run 4

One option is to convert the string to a symbol and evaluate it:

library(tidyverse)
cwine <- function(df,v,rb,c){
  
  df %>% 
      filter(Variety==v & `Red Blotch` == rb)%>% 
      group_by(`Time Point`) %>%
       summarise(m = mean(!!rlang::sym(c)),
                 SD = sd(!! rlang::sym(c))) 
}

cwine(Data3,"CS 420A","-",'beta linalool')
# A tibble: 2 x 3
#  `Time Point`       m    SD
#         <int>   <dbl> <dbl>
#1            2 -2.11    2.23
#2            4  0.0171 NA  
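To see why the string fails in the question while the symbol works, here is a minimal sketch outside of dplyr. The data frame `toy` and its values are made up purely for illustration; `sym()` and `eval_tidy()` come from rlang.

```r
library(rlang)

# Toy data (hypothetical values) with a column name that contains a space
toy <- data.frame(`beta linalool` = c(1, 2, 3), check.names = FALSE)

col <- "beta linalool"

# A string is just a character constant, so mean() cannot average it
res_string <- suppressWarnings(mean(col))

# sym() turns the string into a symbol, which can then be looked up in the data
col_sym <- sym(col)
res_symbol <- mean(eval_tidy(col_sym, data = toy))

print(res_string)  # NA
print(res_symbol)  # 2
```

Inside `summarise()`, `!!rlang::sym(c)` does the same conversion, so the column is looked up by name instead of the string being averaged.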

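Alternatively, if we want to pass it by converting it to a quosure (enquo), it works when we pass the variable name wrapped in backticks. (Usually the unquoted version works, but here there is a space between the words, so backticks are needed for the name to be evaluated as is.)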

cwine <- function(df,v,rb,c){
  c1 <- enquo(c)
  df %>% 
      filter(Variety==v & `Red Blotch` == rb)%>% 
      group_by(`Time Point`) %>%
       summarise(m = mean(!! c1 ),
                 SD = sd(!! c1)) 
}

cwine(Data3,"CS 420A","-",`beta linalool`)
# A tibble: 2 x 3
#   `Time Point`       m    SD
#         <int>   <dbl> <dbl>
#1            2 -2.11    2.23
#2            4  0.0171 NA   
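The difference between the quoted and the backticked call can be checked by looking at what `enquo()` actually captures. This is only a small sketch; the helper name `captured_expr` is hypothetical and exists just for this illustration.

```r
library(rlang)

# Hypothetical helper: return the expression that enquo() captured
captured_expr <- function(x) quo_get_expr(enquo(x))

from_string   <- captured_expr("beta linalool")
from_backtick <- captured_expr(`beta linalool`)

# The quoted version is captured as a character constant
print(is.character(from_string))
# The backticked version is captured as a symbol, i.e. a column name
print(is.symbol(from_backtick))
```

So the quotes in the question passed a piece of text to `mean()`, whereas backticks pass a name that dplyr can resolve to a column.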

Data

set.seed(24)
Data3 <- tibble(Variety = sample(c("CS 420A", "CS 410A"), 20, replace = TRUE),
`Red Blotch` = sample(c("-", "+"), 20, replace = TRUE), 
`Time Point` = sample(1:4, 20, replace = TRUE),
`beta linalool` = rnorm(20))
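As a possible alternative, recent versions of dplyr and rlang also offer the `.data` pronoun for column names passed as strings and the `{{ }}` (embrace) operator for bare or backticked names. The sketch below assumes a version that supports both; the function names `cwine_string` and `cwine_bare` and the toy values are made up for illustration and are not from the original post.

```r
library(dplyr)

# Toy data (hypothetical values) using the same column names as the question
toy <- tibble(
  Variety         = c("CS 420A", "CS 420A", "CS 420A", "CS 420A", "CS 410A"),
  `Red Blotch`    = c("-", "-", "-", "-", "+"),
  `Time Point`    = c("Start", "Start", "End", "End", "Start"),
  `beta linalool` = c(1, 3, 2, 6, 10)
)

# Column passed as a string: look it up with the .data pronoun
cwine_string <- function(df, v, rb, col) {
  df %>%
    filter(Variety == v & `Red Blotch` == rb) %>%
    group_by(`Time Point`) %>%
    summarise(m = mean(.data[[col]]), SD = sd(.data[[col]]))
}

# Column passed as a bare (here backticked) name: embrace it with {{ }}
cwine_bare <- function(df, v, rb, col) {
  df %>%
    filter(Variety == v & `Red Blotch` == rb) %>%
    group_by(`Time Point`) %>%
    summarise(m = mean({{ col }}), SD = sd({{ col }}))
}

res_string <- cwine_string(toy, "CS 420A", "-", "beta linalool")
res_bare   <- cwine_bare(toy, "CS 420A", "-", `beta linalool`)

print(res_string)
print(res_bare)
```

Both calls should return the same summary; which one to use depends on whether callers find it more natural to pass a string or a column name.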

  • @akrun Thank you for taking the time to work on this; what you recommended works great. (2 upvotes)