Changing factor levels with dplyr mutate

use*_*472 41 r dplyr

This is probably simple and I feel stupid for asking. I want to change the levels of a factor in a data frame using mutate. A simple example:

library("dplyr")
dat <- data.frame(x = factor("A"), y = 1)
mutate(dat,levels(x) = "B")

I get:

Error: Unexpected '=' in "mutate(dat,levels(x) ="

Why doesn't this work? How can I change factor levels with mutate?

dpp*_*dan 43

With the forcats package from the tidyverse this is easy, too.

mutate(dat, x = fct_recode(x, "B" = "A"))
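For context, the call in the question fails at parse time: inside a function call R only accepts a plain name (or a string) to the left of `=`, so `levels(x) = "B"` is rejected before mutate() ever runs. Below is a minimal sketch of two ways around that, assuming dplyr and forcats are installed. The data frame `dat2` and its extra level "C" are made up for illustration and are not part of the question.

```r
library(dplyr)
library(forcats)

# Illustrative data (an assumption, not from the question): levels "A" and "C"
dat2 <- data.frame(x = factor(c("A", "C", "A")), y = 1:3)

# fct_recode() takes new_name = "old_name" pairs; levels not mentioned are kept
recoded <- mutate(dat2, x = fct_recode(x, B = "A"))

# The base replacement function `levels<-`, called in its functional form,
# gives the same result; here all levels must be listed in their current order
replaced <- mutate(dat2, x = `levels<-`(x, c("B", "C")))

levels(recoded$x)
levels(replaced$x)
```

Note that the argument order in fct_recode() is new = old, which is the reverse of plyr::revalue().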


Ste*_*pré 34

I'm not quite sure I understand your question properly, but if you want to change the factor levels of cyl with mutate() you could do:

df <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))

You would get:

#> str(df$cyl)
# Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...

  • If the levels don't match the original values, you should use `factor(cyl, labels = ...)` instead of `factor(cyl, levels = ...)`. For example `factor(cyl, labels = c("four", "six", "eight"))`. (4 upvotes)
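The point of that comment can be checked with base R alone. This is a small sketch on a made-up vector `cyl_values` (an assumption, standing in for mtcars$cyl): `levels` must match values that actually occur in the data, while `labels` renames the levels in their sorted order.

```r
# Illustrative stand-in for mtcars$cyl (assumption, not from the answer)
cyl_values <- c(6, 6, 4, 6, 8)

# `levels` is matched against the data, so unknown values become NA
wrong <- factor(cyl_values, levels = c("four", "six", "eight"))

# `labels` renames the sorted unique values 4, 6, 8 in that order
right <- factor(cyl_values, labels = c("four", "six", "eight"))

wrong
right
```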

Die*_*ego 22

Maybe you are looking for the plyr::revalue function:

mutate(dat, x = revalue(x, c("A" = "B")))

See also plyr::mapvalues.
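A minimal sketch of the mapvalues variant, assuming both dplyr and plyr are installed. plyr is called through `plyr::` rather than attached, so that it does not mask dplyr functions such as mutate(). The data frame `dat2` is made up for illustration.

```r
library(dplyr)

# Illustrative data (an assumption, not from the answer)
dat2 <- data.frame(x = factor(c("A", "C", "A")), y = 1:3)

# mapvalues() takes parallel `from` and `to` vectors instead of a named vector
out <- mutate(dat2, x = plyr::mapvalues(x, from = c("A", "C"), to = c("B", "D")))

levels(out$x)
```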


Ste*_*ano 14

You can use the recode function from dplyr.

df <- iris %>%
  mutate(Species = recode(Species,
    setosa = "SETOSA",
    versicolor = "VERSICOLOR",
    virginica = "VIRGINICA"
  ))

  • Note that recode only works on vectors; see PaulFrater's answer for a version that works. (3 upvotes)

cod*_*boy 13

I can't comment because I don't have enough reputation points, but recode only works on a vector, so the code above in @Stefano's answer should be

df <- iris %>%
  mutate(Species = recode(Species, 
     setosa = "SETOSA",
     versicolor = "VERSICOLOR",
     virginica = "VIRGINICA")
  )
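As a small sketch of what recode() does to a factor column (assuming dplyr is installed; the name `df2` is only illustrative): the column stays a factor, and any level that is not mentioned keeps its old name.

```r
library(dplyr)

# Rename a single level; the other two levels are left untouched
df2 <- iris %>%
  mutate(Species = recode(Species, setosa = "SETOSA"))

is.factor(df2$Species)
levels(df2$Species)
```

As with plyr::revalue(), the pairs are written old = "new".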


Flo*_*Flo 9

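As I understand it, the currently accepted answer only changes the order of the factor levels, not the actual labels (i.e., what the levels of the factor are called). To illustrate the difference between levels and labels, consider the following example: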

Turn cyl into a factor (specifying the levels would not be necessary here, since they are already in sorted order):

    mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8))) 
    mtcars2$cyl[1:5]
    #[1] 6 6 4 6 8
    #Levels: 4 6 8

Change the order of the levels (but not the labels themselves: cyl is still the same column):

    mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4))) 
    mtcars3$cyl[1:5]
    #[1] 6 6 4 6 8
    #Levels: 8 6 4
    all(mtcars3$cyl==mtcars2$cyl)
    #[1] TRUE

Assign new labels to cyl. The order of the levels is now c(8, 6, 4), so we specify the new labels as follows:

    mtcars4 <- mtcars3 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_8", 
                                                               "new_value_for_6", 
                                                               "new_value_for_4" )))
    mtcars4$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_8 new_value_for_6 new_value_for_4

Note how this column differs from the earlier ones:

    all(as.character(mtcars4$cyl)!=mtcars3$cyl) 
    #[1] TRUE 
    #Note: TRUE here indicates that all values are unequal because I used != instead of ==
    #as.character() was required as the levels were numeric and thus not comparable to a character vector

Further detail:

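If we were to change the levels of cyl using mtcars2 instead of mtcars3, we would need to specify the labels in a different order to get the same result. The order of the levels in mtcars2 is c(4, 6, 8), so we specify the new labels as follows: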

    #change labels of mtcars2 (order used to be: c(4, 6, 8))
    mtcars5 <- mtcars2 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_4", 
                                                               "new_value_for_6", 
                                                               "new_value_for_8" )))

Unlike mtcars3$cyl and mtcars4$cyl, the labels of mtcars4$cyl and mtcars5$cyl are therefore identical, even though their levels are in a different order.

    mtcars4$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_8 new_value_for_6 new_value_for_4

    mtcars5$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_4 new_value_for_6 new_value_for_8

    all(mtcars4$cyl==mtcars5$cyl)
    #[1] TRUE

    levels(mtcars4$cyl) == levels(mtcars5$cyl)
    #[1] FALSE  TRUE FALSE
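Because positional `labels` depend on the current level order, one way to avoid the mix-up described above is to rename levels by name instead of by position. The following is only a sketch in base R: the helper `relabel()` and the `lookup` vector are hypothetical names introduced here, and the short vectors `a` and `b` stand in for mtcars2$cyl and mtcars3$cyl.

```r
# Hypothetical helper: rename levels via a named lookup, keeping level order
relabel <- function(f, lookup) {
  stopifnot(is.factor(f), all(levels(f) %in% names(lookup)))
  levels(f) <- unname(lookup[levels(f)])
  f
}

# old level name = new label (illustrative values)
lookup <- c("4" = "new_value_for_4",
            "6" = "new_value_for_6",
            "8" = "new_value_for_8")

a <- factor(c(6, 6, 4, 6, 8), levels = c(4, 6, 8))  # order as in mtcars2
b <- factor(a, levels = c(8, 6, 4))                 # order as in mtcars3

a2 <- relabel(a, lookup)
b2 <- relabel(b, lookup)

all(as.character(a2) == as.character(b2))
levels(a2)
levels(b2)
```

Under the same assumptions, the helper could be used inside a pipeline as `mutate(cyl = relabel(cyl, lookup))`, and the result would not depend on whether the levels were reordered first.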