Converting a string to a symbol that dplyr accepts inside a function

Yin*_*Yin 5 r dplyr

My data frame looks like this:

> str(b)
'data.frame':   2720 obs. of  3 variables:
 $ Hospital.Name: chr  "SOUTHEAST ALABAMA MEDICAL CENTER" "MARSHALL MEDICAL CENTER SOUTH" "ELIZA COFFEE MEMORIAL HOSPITAL" "ST VINCENT'S EAST" ...
 $ State        : chr  "AL" "AL" "AL" "AL" ...
 $ heart attack : num  14.3 18.5 18.1 17.7 18 15.9 19.6 17.3 17.8 17.5 ...

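I want to group it by State, sort the rows by State and heart attack, and then add a column that gives the row number within each group. The desired result looks like this: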

# A tibble: 2,720 x 4
# Groups:   State [54]
                      Hospital.Name State `heart attack`  rank
                              <chr> <chr>          <dbl> <int>
 1 PROVIDENCE ALASKA MEDICAL CENTER    AK           13.4     1
 2         ALASKA REGIONAL HOSPITAL    AK           14.5     2
 3      FAIRBANKS MEMORIAL HOSPITAL    AK           15.5     3
 4     ALASKA NATIVE MEDICAL CENTER    AK           15.7     4
 5   MAT-SU REGIONAL MEDICAL CENTER    AK           17.7     5
 6         CRESTWOOD MEDICAL CENTER    AL           13.3     1
 7      BAPTIST MEDICAL CENTER EAST    AL           14.2     2
 8 SOUTHEAST ALABAMA MEDICAL CENTER    AL           14.3     3
 9               GEORGIANA HOSPITAL    AL           14.5     4
10      PRATTVILLE BAPTIST HOSPITAL    AL           14.6     5
# ... with 2,710 more rows

So my code is:

outcome<-"heart attack"
c<-arrange(b,State,sym(outcome))%>%
        group_by(State)%>%
        mutate(rank=row_number(sym(outcome)))

But I got this error:

Error in arrange_impl(.data, dots) : object 'heart attack' not found

When I run sym(outcome) on its own and paste the result into my code, it works:

> sym(outcome)
`heart attack`
> c<-arrange(b,State,`heart attack`)%>%
+         group_by(State)%>%
+         mutate(rank=rank(`heart attack`))
> c
# A tibble: 2,720 x 4
# Groups:   State [54]
                      Hospital.Name State `heart attack`  rank
                              <chr> <chr>          <chr> <dbl>
 1 PROVIDENCE ALASKA MEDICAL CENTER    AK           13.4     1
 2         ALASKA REGIONAL HOSPITAL    AK           14.5     2
 3      FAIRBANKS MEMORIAL HOSPITAL    AK           15.5     3
 4     ALASKA NATIVE MEDICAL CENTER    AK           15.7     4
 5   MAT-SU REGIONAL MEDICAL CENTER    AK           17.7     5
 6         CRESTWOOD MEDICAL CENTER    AL           13.3     1
 7      BAPTIST MEDICAL CENTER EAST    AL           14.2     2
 8 SOUTHEAST ALABAMA MEDICAL CENTER    AL           14.3     3
 9               GEORGIANA HOSPITAL    AL           14.5     4
10      PRATTVILLE BAPTIST HOSPITAL    AL           14.6     5
# ... with 2,710 more rows

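This is part of a function, so `outcome` has to be a string. That is why I tried converting the string to a symbol, so that I can refer to the column in dplyr. Can anyone tell me what is going on here? Is there a good way to achieve what I want?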

Psi*_*dom 7

You need to unquote the symbol with !!:

arrange(b, State, !!sym(outcome))

Or with UQ:

arrange(b, State, UQ(sym(outcome)))

The same applies to mutate:

mutate(rank=row_number(!!sym(outcome)))   # or mutate(rank=row_number(UQ(sym(outcome))))
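Putting the pieces together, here is a minimal sketch of how this could look inside a function. The helper name `rank_by_outcome` and the four-row toy data frame are assumptions made for illustration; they are not part of the original question. Only the `!!sym(outcome)` pattern comes from the answer above.

```r
library(dplyr)
library(rlang)  # provides sym(); loaded explicitly for clarity

# Toy stand-in for the asker's data frame `b` (values are made up)
b <- data.frame(
  Hospital.Name = c("A", "B", "C", "D"),
  State = c("AL", "AK", "AL", "AK"),
  `heart attack` = c(14.3, 15.5, 13.3, 13.4),
  check.names = FALSE,       # keep the space in the column name
  stringsAsFactors = FALSE
)

# Hypothetical helper: `outcome` is a column name passed as a string
rank_by_outcome <- function(df, outcome) {
  col <- sym(outcome)                      # string -> symbol
  df %>%
    arrange(State, !!col) %>%              # !! unquotes the symbol into the call
    group_by(State) %>%
    mutate(rank = row_number(!!col)) %>%   # row number within each State
    ungroup()
}

res <- rank_by_outcome(b, "heart attack")
print(res)
```

In more recent dplyr versions, `.data[[outcome]]` is a documented alternative for referring to a column by a string, and `UQ()` has been deprecated in rlang in favor of `!!`, so the `!!` form is probably the safer one to copy.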

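  • This is correct. According to the documentation, UQ "evaluates the symbol immediately in the surrounding context". Because the variable's context is implicit, this UQ step is needed so that the variable is looked up in the data frame rather than in the global environment. (2 upvotes)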
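One way to see what unquoting does is to capture the call with `rlang::expr()` instead of running it. This is only an illustrative sketch: `expr()` is used here as an inspection tool, and no data frame `b` is needed because nothing is evaluated against data.

```r
library(rlang)

outcome <- "heart attack"

# Without !!, the captured call still contains the literal expression sym(outcome),
# which is what dplyr received in the question's failing code
no_unquote <- expr(arrange(b, State, sym(outcome)))

# With !!, sym(outcome) is evaluated first and the resulting symbol is inlined
unquoted <- expr(arrange(b, State, !!sym(outcome)))

print(no_unquote)
print(unquoted)
```

The second call should print as ``arrange(b, State, `heart attack`)``, which matches the version the asker found to work after pasting the symbol in by hand.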