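Given a (pre-existing) data frame with columns of various types, what is the simplest way to convert all of its character columns to factors without affecting columns of any other type?
Here is an example data.frame: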
df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)
df
#   A B     C D   E
# 1 A 1  TRUE a A a
# 2 B 2  TRUE b B b
# 3 C 3 FALSE c C c
# 4 D 4 FALSE d D d
# 5 E 5  TRUE e E e
str(df)
# 'data.frame': 5 obs. of 5 variables:
# $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
# $ B: int 1 2 3 4 5
# $ C: logi TRUE TRUE FALSE FALSE TRUE
# $ D: chr "a" "b" "c" "d" ...
# $ E: chr "A a" "B b" "C c" "D d" ...
I know I can do:
df$D <- as.factor(df$D)
df$E <- as.factor(df$E)
Is there a way to automate this process a bit more?
Answer by A5C*_*2T1 (88 votes)
Roland's answer works well for this specific problem, but I thought I would share a more general approach.
DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5],
                 stringsAsFactors = FALSE)
str(DF)
# 'data.frame': 5 obs. of 3 variables:
# $ x: chr "a" "b" "c" "d" ...
# $ y: int 1 2 3 4 5
# $ z: chr "A" "B" "C" "D" ...
## The conversion
DF[sapply(DF, is.character)] <- lapply(DF[sapply(DF, is.character)],
                                       as.factor)
str(DF)
# 'data.frame': 5 obs. of 3 variables:
# $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# $ y: int 1 2 3 4 5
# $ z: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
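For the conversion, the left-hand side of the assignment (DF[sapply(DF, is.character)]) selects the character columns. On the right-hand side, you use lapply on that same subset to apply whatever conversion you need. R is smart enough to replace the original columns with the results.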
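To make the two sides of that assignment explicit, here is a minimal sketch that computes the column selection once and keeps the intermediate result. The names is_chr and converted are illustrative only. It uses vapply() instead of sapply(), a choice of mine rather than the answer's: vapply() always returns a logical vector, while sapply() returns an empty list for a data frame with no columns.

```r
DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5],
                 stringsAsFactors = FALSE)

# One logical per column: TRUE for x and z, FALSE for y
is_chr <- vapply(DF, is.character, logical(1))

# Right-hand side: a plain list holding only the converted columns
converted <- lapply(DF[is_chr], as.factor)

# Left-hand side: `[<-` writes each list element back into its column,
# keeping the data frame class, the column order and the untouched column y
DF[is_chr] <- converted

str(DF)
```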
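A handy thing about this approach is that if you want to go the other way, or do some other conversion, it is as simple as changing which columns you select on the left and specifying the function you want to apply on the right.
Answer by Rol*_*and (59 votes)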
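As a sketch of "going the other way", the same pattern can turn factor columns back into character, and a second selection can apply an unrelated conversion. The variable names is_fac and is_num are hypothetical, and the numeric-to-double step is only an illustration of swapping both sides.

```r
DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5],
                 stringsAsFactors = TRUE)

# Left side: select the factor columns this time
is_fac <- vapply(DF, is.factor, logical(1))

# Right side: convert them back to character
DF[is_fac] <- lapply(DF[is_fac], as.character)

# A different conversion on a different selection: numeric columns to double
is_num <- vapply(DF, is.numeric, logical(1))
DF[is_num] <- lapply(DF[is_num], as.double)

str(DF)
```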
DF <- data.frame(x=letters[1:5], y=1:5, stringsAsFactors=FALSE)
str(DF)
#'data.frame': 5 obs. of 2 variables:
# $ x: chr "a" "b" "c" "d" ...
# $ y: int 1 2 3 4 5
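In R versions before 4.0.0, the (annoying) default of as.data.frame was to convert all character columns to factor columns. Since R 4.0.0 the default is stringsAsFactors = FALSE, so pass it explicitly. You can use it here:
DF <- as.data.frame(unclass(DF), stringsAsFactors = TRUE)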
str(DF)
#'data.frame': 5 obs. of 2 variables:
# $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# $ y: int 1 2 3 4 5
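Applied to the question's df, the same unclass() trick might look like this minimal sketch. Passing stringsAsFactors = TRUE explicitly matters on R 4.0.0 and later, where the default became FALSE; the existing factor, integer and logical columns pass through unchanged.

```r
df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)

# unclass() turns the data frame into a plain list, so as.data.frame()
# rebuilds it column by column; only character columns become factors
df <- as.data.frame(unclass(df), stringsAsFactors = TRUE)

str(df)
```

One possible side effect, as far as I know: as.data.frame() on a list checks column names with make.names() by default, so non-syntactic names such as "my col" may come back altered.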
Answer by Ted*_*ard (35 votes)
As @Raf Z commented on this question, dplyr now has mutate_if. Super useful, simple and readable.
> str(df)
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: chr "a" "b" "c" "d" ...
$ E: chr "A a" "B b" "C c" "D d" ...
> df <- df %>% mutate_if(is.character, as.factor)
> str(df)
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
$ E: Factor w/ 5 levels "A a","B b","C c",..: 1 2 3 4 5
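Note that as.factor() orders the levels by sorting the values, not by their order of appearance. If first-appearance order matters, a minimal sketch passes a custom function to mutate_if(); factor_in_order is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)

df <- data.frame(v = c("b", "a", "c", "a"), n = 1:4,
                 stringsAsFactors = FALSE)

# as.factor() sorts the levels: "a", "b", "c"
levels(as.factor(df$v))

# Hypothetical helper: keep levels in order of first appearance instead
factor_in_order <- function(x) factor(x, levels = unique(x))

df <- df %>% mutate_if(is.character, factor_in_order)

levels(df$v)   # "b" "a" "c"
```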
With dplyr:
library(dplyr)
df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)
str(df)
We get:
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: chr "a" "b" "c" "d" ...
$ E: chr "A a" "B b" "C c" "D d" ...
Now we can convert all chr columns to factors:
df <- df %>% mutate_if(is.character, as.factor)
str(df)
We get:
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
$ E: Factor w/ 5 levels "A a","B b","C c",..: 1 2 3 4 5
Let's also give some other solutions:
With base R:
df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)],
                                       as.factor)
With dplyr 1.0.0 or later:
df <- df %>% mutate(across(where(is.character), as.factor))
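A self-contained sketch of the across() version on the question's data, assuming dplyr 1.0.0 or later (where across() and where() are available). It also shows that the reverse conversion only swaps the predicate and the function; the name back is illustrative.

```r
library(dplyr)  # across() and where() need dplyr >= 1.0.0

df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)

# where() turns the predicate into a column selection; across() applies
# as.factor to each selected column inside mutate()
df <- df %>% mutate(across(where(is.character), as.factor))

# Reverse direction: swap the predicate and the function
back <- df %>% mutate(across(where(is.factor), as.character))

str(df)
```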
With the purrr package:
library(purrr)
df <- df %>% modify_if(is.character, as.factor)
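A self-contained sketch of the purrr variant, called without the pipe so that it only needs purrr. modify_if() applies the function to the elements (here, columns) for which the predicate is TRUE and returns an object of the same type as its input, so the result is still a data.frame.

```r
library(purrr)

df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)

# Only D and E satisfy is.character, so only they are passed to as.factor
df <- modify_if(df, is.character, as.factor)

str(df)
```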
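Answer by 小智 (5 votes)
The simplest way is the code below. It converts every variable (column) in the data frame to a factor in one go, and it has worked very well for me. Here food_cat is the data set I am using; replace it with the one you are working with.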
# Convert every column of food_cat (your data set) to a factor
for (i in seq_along(food_cat)) {
  food_cat[[i]] <- as.factor(food_cat[[i]])
}
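Because the loop above converts every column, it would also turn the integer and logical columns of the question's df into factors. If only the character columns should change, as the question asks, a minimal variant (an adaptation, not the answerer's code) guards the assignment with is.character():

```r
df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)

# Loop over column positions, but only convert the character ones,
# so factor, integer and logical columns stay as they are
for (i in seq_along(df)) {
  if (is.character(df[[i]])) {
    df[[i]] <- as.factor(df[[i]])
  }
}

str(df)
```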