Why does as.factor return a character when used inside apply?

Tal*_*ili 13 r apply r-factor

I want to use apply() to convert variables to factors:

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a","b"), 100, replace = TRUE),
                x3 = factor(c(rep("a",50), rep("b",50))),
                stringsAsFactors = TRUE)  # was the default before R 4.0.0

a2 <- apply(a, 2, as.factor)
apply(a2, 2, class)

The result is:

         x1          x2          x3 
"character" "character" "character" 

I don't understand why this results in character vectors instead of factor vectors.

Mar*_*rek 30

apply converts your data.frame to a character matrix. Use lapply instead:

lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

In your second command, apply again turns the result into a character matrix. Use lapply:

a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
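One thing worth noticing: lapply returns a plain list, not a data.frame. A minimal sketch on a small made-up data frame (`df`, `res` and `classes` are hypothetical names) that checks this and collapses the per-column classes into one named vector with vapply:

```r
# Hypothetical small data frame standing in for `a`
df <- data.frame(x1 = c(1.5, -0.3, 2.1),
                 x2 = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

res <- lapply(df, as.factor)  # one factor per column

is.list(res)        # TRUE: lapply() always returns a list
is.data.frame(res)  # FALSE: the data.frame class is not carried over

# Compact check: one class per column, as a named character vector
classes <- vapply(res, function(col) class(col)[1], character(1))
classes
```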

But for a quick look you can use str:

str(a)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
#  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

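Additional notes based on the comments:

Why does lapply work when apply doesn't?

The first thing apply does is convert its argument to a matrix, so apply(a) is equivalent to apply(as.matrix(a)). As you can see, str(as.matrix(a)) gives you: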

chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x1" "x2" "x3"
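The coercion can be reproduced on a tiny mixed-type data frame. This is a sketch with made-up data (`df`, `m` are hypothetical names): as soon as one column is not numeric, as.matrix has to pick a single type for the whole matrix, and that type is character.

```r
# Hypothetical mixed-type data frame
df <- data.frame(x1 = c(0.5, -1.25),
                 x2 = factor(c("a", "b")))

m <- as.matrix(df)      # the conversion apply() performs first
typeof(m)               # "character": a matrix holds a single type
is.factor(m[, "x2"])    # FALSE: the factor became plain strings

apply(df, 2, class)     # "character" for both columns
```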

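There are no factors left, so class returns "character" for every column.
lapply works on the columns themselves, so it gives you what you want (it effectively calls class(a$column_name) for each column).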
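The reason lapply sees the original columns is that a data.frame is itself a list with one element per column. A small sketch with a hypothetical `df`:

```r
# Hypothetical data frame
df <- data.frame(x1 = c(1.5, -0.3), x2 = factor(c("a", "b")))

is.list(df)               # TRUE: a data.frame is a list of columns
length(df) == ncol(df)    # TRUE: one list element per column

# lapply() hands each column to FUN unchanged, so this is
# list(x1 = class(df$x1), x2 = class(df$x2))
cls <- lapply(df, class)
cls
```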

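You can see in the help page for apply why apply with as.factor doesn't work:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.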
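So even when the input is already a plain numeric matrix and there is no data.frame involved, factors returned by FUN do not survive. A sketch with a made-up 2 x 2 matrix (`m` and `res` are hypothetical names):

```r
m <- matrix(c(1, 2, 2, 3), nrow = 2)  # purely numeric input, no data.frame
res <- apply(m, 2, as.factor)         # FUN returns a factor for each column

# apply() coerces the combined result with as.vector() before setting
# dim, so the factors come back as a character matrix
is.factor(res)   # FALSE
typeof(res)      # "character"
res
```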

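And the help page for sapply shows why sapply with as.factor doesn't work either:

Value (...) an atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.

So you will never get a matrix of factors, or a data.frame, out of these functions.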
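The sapply case can be checked the same way. A sketch on a hypothetical two-column character data frame: every call returns a factor of the same length, so sapply simplifies the results into a matrix, and the factor class is lost on the way.

```r
# Hypothetical all-character data frame
df <- data.frame(x1 = c("a", "b"), x2 = c("b", "c"),
                 stringsAsFactors = FALSE)

res <- sapply(df, as.factor)  # every call returns a factor of length 2...
is.matrix(res)   # TRUE: ...so sapply() simplifies to a 2 x 2 matrix
typeof(res)      # "character": the factor class does not survive
```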

How to convert the output to a data.frame

Simple: use as.data.frame, as you said in your comment:

a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
'data.frame':   100 obs. of  3 variables:
 $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
 $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
 $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

But if you want to replace only selected character columns with factors, there is a trick:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
'data.frame':   26 obs. of  3 variables:
 $ x1: chr  "a" "b" "c" "d" ...
 $ x2: chr  "A" "B" "C" "D" ...
 $ x3: chr  "A" "B" "C" "D" ...

columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
'data.frame':   26 obs. of  3 variables:
 $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x3: chr  "A" "B" "C" "D" ...
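If you would rather pick the columns by type than by name, the same trick works with a logical index. This is a sketch under two assumptions: x3 is made numeric here so the selection has something to skip, and `is_chr` is a hypothetical helper name. It also uses list-style indexing (`a3[cols]`), which stays a data.frame even when only one column is selected, whereas `a3[, cols]` drops to a plain vector in that case.

```r
# Variant of a3 with a numeric x3 so the type-based selection is visible
a3 <- data.frame(x1 = letters, x2 = LETTERS, x3 = 1:26,
                 stringsAsFactors = FALSE)

# Hypothetical helper vector: TRUE for every character column
is_chr <- vapply(a3, is.character, logical(1))

# List-style indexing a3[is_chr] stays a data.frame even if only one
# column is selected; a3[, is_chr] would drop to a plain vector then
a3[is_chr] <- lapply(a3[is_chr], as.factor)
str(a3)
```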

You can use the same trick to replace all columns:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
'data.frame':   26 obs. of  3 variables:
 $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
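The same idea is often written with empty brackets, `a3[] <- ...`, which as far as I know behaves the same as `a3[, ] <- ...` here. The point of assigning into the existing object is that it keeps its data.frame class and row names; a plain `a3 <- lapply(a3, as.factor)` would replace it with a list. A short sketch:

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, x3 = LETTERS,
                 stringsAsFactors = FALSE)

# Empty brackets assign into the existing columns, so the result stays a
# data.frame; `a3 <- lapply(a3, as.factor)` would give a plain list
a3[] <- lapply(a3, as.factor)
is.data.frame(a3)  # TRUE
```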

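  • Yes... if you want a `data.frame`, use `as.data.frame(lapply(dtf, fun))`. `sapply` will do the same thing as `apply`. I don't know why, but maybe it has to do with the fact that a `data.frame` is actually a list... `lapply` returns a `list`, so it's easily converted to a `data.frame`, whereas with the `sapply` or `apply` output you're trying to coerce a `numeric` into a `data.frame`, and that messes things up... It's weird, but not "unforeseeable" behavior, I must admit! (2 upvotes)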