ec0*_*cus 16 formatting r function
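What is the most efficient way to convert multiple columns of a data frame from character to numeric format?
I have a data frame called DF in which every variable is character.
I would like to do something like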
for (i in names(DF)) {
DF$i <- as.numeric(DF$i)
}
Thanks.
Luc*_*lia 40
You can try:
DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6],
stringsAsFactors = FALSE)
# Check columns classes
sapply(DF, class)
# a b c
# "character" "character" "character"
cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)
# a b c
# "numeric" "numeric" "character"
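The loop in the question does not work as written because `DF$i` looks for a column literally named `i`. Indexing with `[[` uses the value of `i` instead. Below is a minimal base R sketch of that loop, reusing the sample `DF` from this answer; `DF2` is a made-up second data frame that shows the `lapply()` form for the case where every column should be converted.

```r
# Sample data, as in the answer above
DF <- data.frame(a = as.character(0:5),
                 b = paste(0:5, ".1", sep = ""),
                 c = letters[1:6],
                 stringsAsFactors = FALSE)

cols.num <- c("a", "b")

# DF[[i]] uses the value of i; DF$i would look for a column named "i"
for (i in cols.num) {
  DF[[i]] <- as.numeric(DF[[i]])
}

# Hypothetical second data frame where all columns hold numbers as text
DF2 <- data.frame(x = c("1", "2"), y = c("1.5", "2.5"),
                  stringsAsFactors = FALSE)

# DF2[] keeps the data frame structure while replacing every column
DF2[] <- lapply(DF2, as.numeric)

sapply(DF, class)
sapply(DF2, class)
```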
etr*_*dge 18
Use the across() function from dplyr 1.0:
df <- df %>% mutate(across(everything(), ~ as.numeric(.)))
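If only some columns should be converted, across() also accepts a column selection. The sketch below is an illustration on made-up data (the tibble and its column names are assumptions, not part of the original answer): `all_of()` selects columns by name and `where()` selects them by predicate.

```r
library(dplyr)

# Made-up sample data: two numeric-looking columns and one text column
df <- tibble(a = c("1", "2", "3"),
             b = c("1.5", "2.5", "3.5"),
             c = c("x", "y", "z"))

# Convert only the named columns
df_some <- df %>% mutate(across(all_of(c("a", "b")), as.numeric))

# Convert every character column; column c cannot be parsed, so it
# becomes NA and as.numeric() would warn, hence suppressWarnings()
df_all <- suppressWarnings(
  df %>% mutate(across(where(is.character), as.numeric))
)

str(df_some)
str(df_all)
```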
ARo*_*son 13
If you are already using the tidyverse, there are several solutions depending on the exact situation:
library(dplyr)
library(magrittr)
# solution
dataset %<>% mutate_if(is.character,as.numeric)
# to test
df <- data.frame(
x1 = c('1','2','3'),
x2 = c('4','5','6'),
x3 = c('1','a','b'), # vector with alpha characters
stringsAsFactors = FALSE)
# display starting structure
df %>% str()
Convert all character vectors to numeric (this could fail if a column is not numeric):
df %>%
select(-x3) %>% # this removes the alpha column, for when all remaining character columns need to be converted to numeric
mutate_if(is.character,as.numeric) %>%
str()
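Check whether each column can be converted. This can be an anonymous function. It checks whether as.numeric returns any NA. It also checks that the column is a character vector, so factors are ignored. Because you know NAs will be introduced on purpose and are checked right afterwards, it also suppresses the warnings.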
numericcharacters <- function(x) {
!any(is.na(suppressWarnings(as.numeric(x)))) & is.character(x)
}
df %>%
mutate_if(numericcharacters,as.numeric) %>%
str()
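One caveat with the check above: a column that already contains genuine NA values is treated as non-convertible. A possible variation, sketched here in base R, counts only the NAs that conversion itself introduces. The helper name `is_numeric_like` and the sample data are hypothetical.

```r
# Hypothetical helper: TRUE when x is a character vector whose
# non-missing values all parse as numbers
is_numeric_like <- function(x) {
  if (!is.character(x)) return(FALSE)
  converted <- suppressWarnings(as.numeric(x))
  # Only NAs that were not already NA in x count as failures
  !any(is.na(converted) & !is.na(x))
}

# Made-up sample data: x1 has a genuine NA, x3 has non-numeric text
df <- data.frame(x1 = c("1", "2", NA),
                 x2 = c("4", "5", "6"),
                 x3 = c("1", "a", "b"),
                 stringsAsFactors = FALSE)

# Decide per column, then convert only the columns that passed
convertible <- vapply(df, is_numeric_like, logical(1))
df[convertible] <- lapply(df[convertible], as.numeric)

str(df)
```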
If you want to convert specific named columns, then mutate_at is better.
df %>% mutate_at('x1',as.numeric) %>% str()
You can use convert from the hablar package:
library(dplyr)
library(hablar)
# Sample df (stolen from the solution by Luca Braglia)
df <- tibble("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6])
# insert variable names in num()
df %>% convert(num(a, b))
which gives you:
# A tibble: 6 x 3
a b c
<dbl> <dbl> <chr>
1 0. 0.100 a
2 1. 1.10 b
3 2. 2.10 c
4 3. 3.10 d
5 4. 4.10 e
6 5. 5.10 f
Or, if you are lazy, let retype() from hablar guess the right data type:
df %>% retype()
which gives you:
# A tibble: 6 x 3
a b c
<int> <dbl> <chr>
1 0 0.100 a
2 1 1.10 b
3 2 2.10 c
4 3 3.10 d
5 4 4.10 e
6 5 5.10 f
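For comparison, base R can do a similar kind of type guessing with type.convert() from the utils package. This is a swapped-in alternative, not part of hablar, and the sketch assumes a version of R recent enough for type.convert() to have a data frame method.

```r
# Same sample data as above, built as a plain data frame
df <- data.frame(a = as.character(0:5),
                 b = paste(0:5, ".1", sep = ""),
                 c = letters[1:6],
                 stringsAsFactors = FALSE)

# as.is = TRUE keeps non-numeric strings as character instead of factor
df_typed <- type.convert(df, as.is = TRUE)

sapply(df_typed, class)
```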
小智 6
I use this code to convert all columns except the first to numeric:
library(dplyr)
# check structure, row and column number with: glimpse(df)
# convert to numeric e.g. from 2nd column to 10th column
df <- df %>%
mutate_at(c(2:10), as.numeric)
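If the number of columns is not known in advance, a negative index avoids hard-coding `2:10`. A minimal base R sketch on made-up data (the column names are assumptions):

```r
# Made-up sample data: an identifier column followed by numbers stored as text
df <- data.frame(id = c("r1", "r2"),
                 v1 = c("1", "2"),
                 v2 = c("3.5", "4.5"),
                 stringsAsFactors = FALSE)

# df[-1] selects every column except the first
df[-1] <- lapply(df[-1], as.numeric)

str(df)
```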
小智 5
A slight adjustment to the answers by ARobertson and Kenneth Wilson, which worked for me.
Running R 3.6.0, with library(tidyverse) and library(dplyr) in my environment:
library(tidyverse)
library(dplyr)
> df %<>% mutate_if(is.character, as.numeric)
Error in df %<>% mutate_if(is.character, as.numeric) :
could not find function "%<>%"
I did some quick research and found this note in Hadley's "The tidyverse style guide".
The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.
# Good
x <- x %>%
  abs() %>%
  sort()

# Bad
x %<>%
  abs() %>%
  sort()
Solution
Based on that style guide:
df_clean <- df %>% mutate_if(is.character, as.numeric)
Working example
> df_clean <- df %>% mutate_if(is.character, as.numeric)
Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
3: NAs introduced by coercion
4: NAs introduced by coercion
5: NAs introduced by coercion
6: NAs introduced by coercion
7: NAs introduced by coercion
8: NAs introduced by coercion
9: NAs introduced by coercion
10: NAs introduced by coercion
> df_clean
# A tibble: 3,599 x 17
stack datetime volume BQT90 DBT90 DRT90 DLT90 FBT90 RT90 HTML90 RFT90 RLPP90 RAT90 SRVR90 SSL90 TCP90 group
<dbl> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
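The "NAs introduced by coercion" warnings mean that some strings could not be parsed as numbers. To see which values are responsible before converting, something like the following base R sketch may help. The helper `bad_values` and the sample columns are hypothetical and are not taken from the data shown above.

```r
# Hypothetical helper: the distinct strings in x that fail to parse as numbers
bad_values <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  unique(x[is.na(converted) & !is.na(x)])
}

# Made-up sample data with one unparseable entry per column
df <- data.frame(volume = c("10", "n/a", "30"),
                 rt90 = c("1.5", "2.5", ""),
                 stringsAsFactors = FALSE)

# One character vector of offending values per column
res <- lapply(df, bad_values)
res
```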