Sha*_*and | Tags: r, filter, dataframe, dplyr, data-wrangling
I ran into a strange problem when mutating a variable with dplyr. If I run this code:
library(dplyr)
library(ggplot2)  # provides the diamonds dataset
diamonds %>%
  select(cut) %>%
  table()
I get the count for each level of the cut factor in the diamonds dataset:
cut
Fair Good Very Good Premium Ideal
1610 4906 12082 13791 21551
But if I try to rename just one of the levels and keep the rest:
diamonds %>%
mutate(cut.fix = ifelse(cut == "Fair",
"Not Fair at All",
cut)) %>%
select(cut.fix) %>%
table()
only the "fixed" value changes; everything else turns into numbers:
cut.fix
2 3 4 5
4906 12082 13791 21551
Not Fair at All
1610
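What causes this, and how can I fix it?
In this case, dplyr's if_else() is more informative: instead of silently coercing the values, it stops with an error: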
library(tidyverse)
diamonds %>%
select(cut) %>%
table()
#> .
#> Fair Good Very Good Premium Ideal
#> 1610 4906 12082 13791 21551
diamonds %>%
mutate(cut.fix = if_else(cut == "Fair",
"Not Fair at All",
cut)) %>%
select(cut.fix) %>%
table()
#> Error in `mutate()`:
#> ! Problem while computing `cut.fix = if_else(cut == "Fair", "Not Fair at
#> All", cut)`.
#> Caused by error in `if_else()`:
#> ! `false` must be a character vector, not a `ordered/factor` object.
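To see the coercion in isolation, here is a minimal base-R sketch. The three-element factor f is hypothetical (its levels mirror diamonds$cut, but the values are made up); it reproduces the same kind of numeric strings seen in the question:

```r
# A small hypothetical factor; its levels mirror diamonds$cut
f <- factor(c("Fair", "Good", "Ideal"),
            levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"))

as.integer(f)  # the stored integer codes: 1 2 5

# ifelse() starts from the logical test and fills in `yes`/`no` by position;
# the factor passed as `no` contributes its integer codes, not its labels
res <- ifelse(f == "Fair", "Not Fair at All", f)
res  # "Not Fair at All" "2" "5"
```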
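The ifelse() function is not "type safe": it can convert/coerce values in disastrous ways. A factor is stored as integer codes plus a levels attribute, and when ifelse() copies the "no" values into its character result, only the codes survive, so Good through Ideal come back as "2" to "5". The dplyr if_else() function is safer (it throws an error in cases like this), and you can adjust accordingly, e.g. by converting "cut" to character instead of leaving it as an ordered factor: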
diamonds %>%
mutate(cut.fix = if_else(cut == "Fair",
"Not Fair at All",
as.character(cut))) %>%
select(cut.fix) %>%
table()
#> .
#> Good Ideal Not Fair at All Premium Very Good
#> 4906 21551 1610 13791 12082
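One side effect of the as.character() route is that cut.fix becomes a plain character vector, so the Fair < Good < ... < Ideal ordering is lost and table() falls back to alphabetical order. A quick check, assuming dplyr and ggplot2 (for diamonds) are installed; as_chr is just an illustrative name:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

as_chr <- diamonds %>%
  mutate(cut.fix = if_else(cut == "Fair", "Not Fair at All", as.character(cut)))

class(as_chr$cut.fix)         # "character": the factor levels are gone
names(table(as_chr$cut.fix))  # alphabetical, not Fair < Good < ... < Ideal
```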
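The as.character() version "works", but as @RitchieSacramento pointed out, a better solution is to recode the "cut" variable and keep the factor level information, e.g. using dplyr::recode():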
diamonds %>%
mutate(cut.fix = recode(cut, "Fair" = "Not Fair at All")) %>%
select(cut.fix) %>%
table()
#> .
#> Not Fair at All Good Very Good Premium Ideal
#> 1610 4906 12082 13791 21551
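A quick way to confirm that recode() keeps the level information, assuming the same packages are installed; recoded is an illustrative name:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

recoded <- diamonds %>%
  mutate(cut.fix = recode(cut, "Fair" = "Not Fair at All"))

is.ordered(recoded$cut.fix)  # TRUE: still an ordered factor
levels(recoded$cut.fix)      # the renamed level stays first, where Fair was
```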
Alternatively, here is the solution from @RitchieSacramento's comment above, using forcats::fct_recode():
diamonds %>%
mutate(cut.fix = fct_recode(cut, "Not Fair at All" = "Fair")) %>%
select(cut.fix) %>%
table()
#> .
#> Not Fair at All Good Very Good Premium Ideal
#> 1610 4906 12082 13791 21551
Created on 2022-09-27 by the reprex package (v2.0.1)