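I received several CSV files containing numbers in the local German format, i.e. with a comma as the decimal separator and a dot as the thousands separator, e.g. 10.380,45. The values in the CSV files are separated by ";". The files also contain columns of class character, date, date-time, and logical.
The problem with the read.table functions is that you can specify the decimal separator with dec=",", but not the thousands separator. (Please correct me if I'm wrong.)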
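To illustrate the limitation, here is a minimal sketch. The temporary file and its two rows are made up for the example; only read.table and its documented arguments are used.

```r
# Hypothetical two-row sample in German number format, written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c("col_text;col_num", "a;10.380,45", "b;250,36"), tmp)

# dec = "," handles the decimal comma, but there is no argument for the thousands dot
df <- read.table(tmp, header = TRUE, sep = ";", dec = ",", stringsAsFactors = FALSE)

# Because of the value with a thousands dot, the column is not parsed as numeric
class(df$col_num)
```

As far as I can tell, a single value with a thousands dot is enough to keep the whole column from being converted.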
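I know that preprocessing is a workaround, but I want to write my code in a way that others can use it without me.
I found a way to read the CSV file the way I want with read.csv2 by setting up my own classes, as shown in the following example. It is based on "Most elegant way to load csv with point as thousands separator in R".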
# Create test example
df_test_write <- cbind.data.frame(c("a","b","c","d","e","f","g","h","i","j",rep("k",times=200)),
c("5.200,39","250,36","1.000.258,25","3,58","5,55","10.550,00","10.333,00","80,33","20.500.000,00","10,00",rep("3.133,33",times=200)),
c("25.03.2015","28.04.2015","03.05.2016","08.08.2016","08.08.2016","08.08.2016","08.08.2016","08.08.2016","08.08.2016","08.08.2016",rep("08.08.2016",times=200)),
stringsAsFactors=FALSE)
colnames(df_test_write) <- c("col_text","col_num","col_date")
# write test csv
write.csv2(df_test_write,file="Test.csv",quote=FALSE,row.names=FALSE)
#### read with read.csv2 ####
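# First, define your own classes
# Numeric class for German-formatted numbers
setClass("myNum")
# Conversion: drop the thousands dots, then turn the decimal comma into a dot
setAs("character", "myNum", function(from) as.numeric(gsub(",", ".", gsub(".", "", from, fixed = TRUE), fixed = TRUE)))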
# own date class
library(lubridate)
setClass('myDate')
setAs("character","myDate",function(from) dmy(from))
# Read the csv file, in colClasses the columns class can be …
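A self-contained sketch of what such a read could look like, assuming the same myNum/myDate idea as above. The sample file, the variable names tmp and df_read, and the use of base as.Date in place of lubridate::dmy are my assumptions for illustration, not necessarily the exact call used with Test.csv.

```r
library(methods)  # provides setClass/setAs; usually attached already

# Hypothetical sample file with ";" separators and German number/date formats
tmp <- tempfile(fileext = ".csv")
writeLines(c("col_text;col_num;col_date",
             "a;5.200,39;25.03.2015",
             "b;1.000.258,25;03.05.2016"), tmp)

# Numeric class: strip thousands dots, then replace the decimal comma with a dot
setClass("myNum")
setAs("character", "myNum", function(from) {
  as.numeric(gsub(",", ".", gsub(".", "", from, fixed = TRUE), fixed = TRUE))
})

# Date class: base as.Date swapped in for lubridate::dmy to avoid the dependency
setClass("myDate")
setAs("character", "myDate", function(from) as.Date(from, format = "%d.%m.%Y"))

# colClasses assigns a class per column; read.csv2 converts via the setAs methods
df_read <- read.csv2(tmp, stringsAsFactors = FALSE,
                     colClasses = c("character", "myNum", "myDate"))
str(df_read)
```

The point of this design is that the conversion lives inside the read call, so nobody has to preprocess the files by hand.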