Convert all data frame character columns to factors

Mus*_*ful 55 r dataframe

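Given a (pre-existing) data frame that has columns of various types, what is the simplest way to convert all of its character columns to factors without affecting columns of any other type?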

Here's an example data.frame:

df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)
df
#   A B     C D   E
# 1 A 1  TRUE a A a
# 2 B 2  TRUE b B b
# 3 C 3 FALSE c C c
# 4 D 4 FALSE d D d
# 5 E 5  TRUE e E e
str(df)
# 'data.frame':  5 obs. of  5 variables:
#  $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
#  $ B: int  1 2 3 4 5
#  $ C: logi  TRUE TRUE FALSE FALSE TRUE
#  $ D: chr  "a" "b" "c" "d" ...
#  $ E: chr  "A a" "B b" "C c" "D d" ...

I know I can do:

df$D <- as.factor(df$D)
df$E <- as.factor(df$E)

Is there a way to automate this process a bit more?

A5C*_*2T1 88

Roland's answer is great for this specific problem, but I thought I would share a more general approach.

DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5], 
                 stringsAsFactors=FALSE)
str(DF)
# 'data.frame':  5 obs. of  3 variables:
#  $ x: chr  "a" "b" "c" "d" ...
#  $ y: int  1 2 3 4 5
#  $ z: chr  "A" "B" "C" "D" ...

## The conversion
DF[sapply(DF, is.character)] <- lapply(DF[sapply(DF, is.character)], 
                                       as.factor)
str(DF)
# 'data.frame':  5 obs. of  3 variables:
#  $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
#  $ y: int  1 2 3 4 5
#  $ z: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5

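For the conversion, the left-hand side of the assignment (DF[sapply(DF, is.character)]) subsets the character columns. On the right-hand side, you apply lapply to that same subset with whatever conversion you need. R is smart enough to replace the original columns with the results.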

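The handy thing about this is that if you want to go the other way, or do some other conversion, it is as simple as changing what you select on the left and specifying what you want to convert it to on the right.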
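
To illustrate going the other way, here is a minimal sketch that turns factor columns back into character columns with the same left-side/right-side pattern. DF2 and is_fac are made-up names for this example only.

```r
# Hypothetical example data: one factor column, one integer column
DF2 <- data.frame(x = factor(c("a", "b", "c")), y = 1:3)

# Left side: select the factor columns; right side: convert that subset
is_fac <- sapply(DF2, is.factor)
DF2[is_fac] <- lapply(DF2[is_fac], as.character)

str(DF2)
```

Storing the logical index once in is_fac avoids calling sapply twice, but the one-line form used in the answer behaves the same way.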


Rol*_*and 59

DF <- data.frame(x=letters[1:5], y=1:5, stringsAsFactors=FALSE)

str(DF)
#'data.frame':  5 obs. of  2 variables:
# $ x: chr  "a" "b" "c" "d" ...
# $ y: int  1 2 3 4 5

The (annoying) default of as.data.frame in R versions before 4.0.0 is to turn all character columns into factor columns. You can use that here:

DF <- as.data.frame(unclass(DF))
str(DF)
#'data.frame':  5 obs. of  2 variables:
# $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# $ y: int  1 2 3 4 5

  • As of recent versions of R, this is no longer necessarily true. The best option seems to be setting the "stringsAsFactors" argument to "TRUE" when calling "as.data.frame()". https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/ (4 upvotes)
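
A minimal sketch of what the comment above suggests, assuming R 4.0.0 or later (where stringsAsFactors defaults to FALSE). It reuses the DF from this answer and passes the argument explicitly:

```r
DF <- data.frame(x = letters[1:5], y = 1:5, stringsAsFactors = FALSE)

# unclass() drops the data.frame class, leaving a plain list of columns;
# as.data.frame() then rebuilds it, converting character columns to factors
DF <- as.data.frame(unclass(DF), stringsAsFactors = TRUE)

str(DF)
```

Passing stringsAsFactors = TRUE explicitly should also work on older R versions, so the call does not depend on the default.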

Ted*_*ard 35

As @Raf Z commented on the question, dplyr now has mutate_if. It is super useful, simple, and readable.

> str(df)
'data.frame':   5 obs. of  5 variables:
 $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
 $ B: int  1 2 3 4 5
 $ C: logi  TRUE TRUE FALSE FALSE TRUE
 $ D: chr  "a" "b" "c" "d" ...
 $ E: chr  "A a" "B b" "C c" "D d" ...

> df <- df %>% mutate_if(is.character,as.factor)

> str(df)
'data.frame':   5 obs. of  5 variables:
 $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
 $ B: int  1 2 3 4 5
 $ C: logi  TRUE TRUE FALSE FALSE TRUE
 $ D: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
 $ E: Factor w/ 5 levels "A a","B b","C c",..: 1 2 3 4 5


Geo*_*pis 8

dplyr

library(dplyr)

df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)

str(df)

We get:

'data.frame':   5 obs. of  5 variables:
 $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
 $ B: int  1 2 3 4 5
 $ C: logi  TRUE TRUE FALSE FALSE TRUE
 $ D: chr  "a" "b" "c" "d" ...
 $ E: chr  "A a" "B b" "C c" "D d" ...

Now we can convert all chr columns to factors:

df <- df%>%mutate_if(is.character, as.factor)
str(df)

We get:

'data.frame':   5 obs. of  5 variables:
 $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
 $ B: int  1 2 3 4 5
 $ C: logi  TRUE TRUE FALSE FALSE TRUE
 $ D: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
 $ E: Factor w/ 5 levels "A a","B b","C c",..: 1 2 3 4 5

Here are some other solutions as well:

With base R:

df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)],
                                       as.factor)

With dplyr 1.0.0:

df <- df %>% mutate(across(where(is.character), as.factor))

With the purrr package:

library(purrr)

df <- df %>% modify_if(is.character, as.factor)


小智 5

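The easiest way is to use the code given below. It automates the whole process of converting all variables in a data frame to factors in R. It worked perfectly fine for me. Here food_cat is the data set I am using; change it to the one you are working with.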

for (i in 1:ncol(food_cat)) {
  food_cat[, i] <- as.factor(food_cat[, i])
}

  • This changes all columns to factors, regardless of their type. (3 upvotes)
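
If the loop style is preferred, a minimal sketch that only touches character columns could look like the following. The food_cat data here is a made-up stand-in, not the answerer's actual data set.

```r
# Hypothetical stand-in for the food_cat data set
food_cat <- data.frame(name = c("apple", "bread"),
                       calories = c(52, 265),
                       stringsAsFactors = FALSE)

# seq_along() also behaves correctly for a data frame with zero columns
for (i in seq_along(food_cat)) {
  # Convert only character columns; leave every other type alone
  if (is.character(food_cat[[i]])) {
    food_cat[[i]] <- as.factor(food_cat[[i]])
  }
}

str(food_cat)
```

The is.character() check is what keeps numeric, logical, and existing factor columns unchanged, which is what the original question asks for.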