Coerce multiple columns to factors at once

wsd*_*sda 58 r dataframe r-factor

I have a sample data frame like the one below:

data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

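I would like to know how to select multiple columns and convert them to factors together. I usually do it one column at a time with data$A = as.factor(data$A), but when the data frame is very large and has many columns, that becomes very time-consuming. Does anyone know a better way?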

Ric*_*ven 97

Choose some columns to coerce to factors:

cols <- c("A", "C", "D", "H")

Use lapply() to coerce and replace the chosen columns:

data[cols] <- lapply(data[cols], factor)  ## as.factor() could also be used

Check the result:

sapply(data, class)
#        A         B         C         D         E         F         G 
# "factor" "integer"  "factor"  "factor" "integer" "integer" "integer" 
#        H         I         J 
# "factor" "integer" "integer" 

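  • @Tgsmith61591 - It might be. Subsetting with the comma is matrix-style subsetting, while subsetting without the comma is list-style subsetting. A data frame can be subset either way, so both work. (4 upvotes)
  • @Ben You can extend the answer to specify levels and labels: `data[cols] <- lapply(data[cols], factor, levels=c("val1", "val2", ...), labels=c("label1", "label2", ...))`. Be careful with this: every selected variable will get the same levels and labels you supply. (2 upvotes)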
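
The levels/labels comment above can be sketched as follows. This is only an illustration under assumptions: the toy data frame `df`, its values, and the labels "low" and "high" are made up and do not come from the question.

```r
# Hypothetical toy data: x and y share the same coding (1 = low, 2 = high).
df <- data.frame(x = c(1, 2, 1), y = c(2, 2, 1), z = c(10, 20, 30))
cols <- c("x", "y")

# Extra arguments to lapply() are passed on to factor(), so every selected
# column gets the same levels and labels.
df[cols] <- lapply(df[cols], factor, levels = c(1, 2), labels = c("low", "high"))

sapply(df, class)
```

Because the same `levels` and `labels` are applied to each column in `cols`, this only makes sense when the selected columns share a coding scheme.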

akr*_*run 29

Here is an option using dplyr. The %<>% operator from magrittr updates the LHS object with the resulting value.

library(magrittr)
library(dplyr)
cols <- c("A", "C", "D", "H")

# Note: mutate_each_() and funs() are deprecated in later dplyr releases
data %<>%
       mutate_each_(funs(factor(.)),cols)
str(data)
#'data.frame':  4 obs. of  10 variables:
# $ A: Factor w/ 4 levels "23","24","26",..: 1 2 3 4
# $ B: int  15 13 39 16
# $ C: Factor w/ 4 levels "3","5","18","37": 2 1 3 4
# $ D: Factor w/ 4 levels "2","6","28","38": 3 1 4 2
# $ E: int  14 4 22 20
# $ F: int  7 19 36 27
# $ G: int  35 40 21 10
# $ H: Factor w/ 4 levels "11","29","32",..: 1 4 3 2
# $ I: int  17 1 9 25
# $ J: int  12 30 8 33

Or, if we are using data.table, use a for loop with set():

library(data.table)
setDT(data)  # converts data to a data.table by reference
for(j in cols){
  set(data, i=NULL, j=j, value=factor(data[[j]]))
}

Or we can specify 'cols' in .SDcols and assign (:=) the RHS to 'cols':

setDT(data)[, (cols):= lapply(.SD, factor), .SDcols=cols]
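
For readers who want to try the .SDcols form in isolation, here is a minimal self-contained sketch. The table `dt` and its columns are hypothetical and only serve to show that the unselected column keeps its type.

```r
library(data.table)

# Hypothetical toy table, not the data from the question.
dt <- data.table(A = 1:3, B = 4:6, C = c("x", "y", "z"))
cols <- c("A", "C")

# .SD holds only the columns named in .SDcols; := updates them by reference.
dt[, (cols) := lapply(.SD, factor), .SDcols = cols]

sapply(dt, class)
```

The parentheses around `cols` on the left of `:=` tell data.table to use the names stored in the variable rather than to create a column literally called "cols".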


Gue*_*sBF 23

As of 2021 (and still valid in early 2023), the current tidyverse/dplyr approach is to use across() with a <tidy-select> statement.

library(dplyr)

data %>% mutate(across(<tidy-select>, <function>))

across(<tidy-select>) makes it easy to select the columns to transform in a very consistent way. Some examples:

data %>% mutate(across(c(A, B, C, E), as.factor)) # select columns A to C, and E (by name)

data %>% mutate(across(where(is.character), as.factor)) # select character columns

data %>% mutate(across(1:5, as.factor)) # select first 5 columns (by index)

  • https://dplyr.tidyverse.org/reference/across.html "across() supersedes the family of 'scoped variants' like summarise_at(), summarise_if(), and summarise_all()." (2 upvotes)
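
The question stores the column names in a character vector, so a sketch combining that with across() may help. It assumes all_of() as the tidy-select helper for a character vector of names; the result name `converted` and the seed are arbitrary choices for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the toy data reproducible
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# all_of() selects exactly the columns named in the character vector.
converted <- data %>% mutate(across(all_of(cols), as.factor))

sapply(converted, class)
```

Unlike the %<>% examples in the other answers, this leaves `data` untouched and stores the result in a new object.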

小智 21

A more recent tidyverse approach is to use the mutate_at function:

library(tidyverse)
library(magrittr)
set.seed(88)

data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

data %<>% mutate_at(cols, funs(factor(.)))
str(data)
# $ A: Factor w/ 4 levels "5","17","18",..: 2 1 4 3
# $ B: int  36 35 2 26
# $ C: Factor w/ 4 levels "22","31","32",..: 1 2 4 3
# $ D: Factor w/ 4 levels "1","9","16","39": 3 4 1 2
# $ E: int  3 14 30 38
# $ F: int  27 15 28 37
# $ G: int  19 11 6 21
# $ H: Factor w/ 4 levels "7","12","20",..: 1 3 4 2
# $ I: int  23 24 13 8
# $ J: int  10 25 4 33

  • If you are only applying a single transformation, you do not even need `funs`; `mutate_at(cols, factor)` is enough. (5 upvotes)

nev*_*ves 9

You can use mutate_if() (dplyr):

For example, to coerce integer columns to factor:

mydata=structure(list(a = 1:10, b = 1:10, c = c("a", "a", "b", "b", 
"c", "c", "c", "c", "c", "c")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 10 x 3
       a     b c    
   <int> <int> <chr>
 1     1     1 a    
 2     2     2 a    
 3     3     3 b    
 4     4     4 b    
 5     5     5 c    
 6     6     6 c    
 7     7     7 c    
 8     8     8 c    
 9     9     9 c    
10    10    10 c   

Use the function:

library(dplyr)

mydata%>%
    mutate_if(is.integer,as.factor)

# A tibble: 10 x 3
       a     b c    
   <fct> <fct> <chr>
 1     1     1 a    
 2     2     2 a    
 3     3     3 b    
 4     4     4 b    
 5     5     5 c    
 6     6     6 c    
 7     7     7 c    
 8     8     8 c    
 9     9     9 c    
10    10    10 c    
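
The same select-by-type idea can be sketched in base R without dplyr. This is an assumption-laden illustration rather than part of the answer above: the data frame `toy` and the helper vector `is_int` are hypothetical names.

```r
# Hypothetical toy data with one integer, one double, and one character column.
toy <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5), c = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# Logical vector marking which columns are integer.
is_int <- vapply(toy, is.integer, logical(1))

# Replace only those columns with their factor versions.
toy[is_int] <- lapply(toy[is_int], as.factor)

sapply(toy, class)
```

Swapping is.integer for is.character gives the string-column variant discussed in the next answer.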


小智 6

And, for completeness, since this question also asks about converting only the string columns, there is mutate_if:

library(dplyr)

data <- cbind(stringVar = sample(c("foo","bar"),10,replace=TRUE),
              data.frame(matrix(sample(1:40), 10, 10, dimnames = list(1:10, LETTERS[1:10]))),stringsAsFactors=FALSE)

# convert only the character columns to factors
factoredData = data %>% mutate_if(is.character,funs(factor(.)))