Remove thousands separators

spe*_*ndo 4 excel r decimal apply separator

I imported an Excel file and got a data frame like this:

structure(list(A = structure(1:3, .Label = c("1.100", "2.300", 
"5.400"), class = "factor"), B = structure(c(3L, 2L, 1L), .Label = c("1.000.000", 
"500", "7.800"), class = "factor"), C = structure(1:3, .Label = c("200", 
"3.100", "4.500"), class = "factor")), .Names = c("A", "B", "C"
), row.names = c(NA, -3L), class = "data.frame")

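I now want to convert these chars to numeric or even integer. However, the dot character (.) is not a decimal point but a thousands separator (the data uses German number formatting).

How do I convert the data frame correctly?

I tried this: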

df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"))

df3 <- as.data.frame(data.matrix(df2))

However, apply seems to convert each column to a list of factors. Can I prevent apply from doing so?

jub*_*uba 7

You can use this:

sapply(df, function(v) {as.numeric(gsub("\\.","", as.character(v)))})

Which gives:

        A       B    C
[1,] 1100    7800  200
[2,] 2300     500 3100
[3,] 5400 1000000 4500

This gives you a matrix object, but you can wrap it in data.frame() if you want.

Note that the columns in your original data are not characters but factors.
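To illustrate why the as.character() step matters for factor columns, here is a minimal sketch. The vector `f` is a made-up stand-in for column B of the question's data, not something from the original post.

```r
# Hypothetical example vector, mirroring column B of the question's data
f <- factor(c("7.800", "500", "1.000.000"))

# Calling as.numeric() directly on a factor returns the internal level codes,
# not the numbers shown in the labels
codes <- as.numeric(f)

# Going through as.character() first recovers the labels, which can then be
# stripped of dots (fixed = TRUE treats "." literally) and parsed
values <- as.numeric(gsub(".", "", as.character(f), fixed = TRUE))

print(codes)
print(values)
```

The first result holds only the positions of the levels, so skipping as.character() would silently produce wrong numbers rather than an error.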


Edit: Alternatively, instead of wrapping it in data.frame(), you can do this to get the result as a data.frame directly:

# the as.character(.) is just in case it's loaded as a factor
df[] <- lapply(df, function(x) as.numeric(gsub("\\.", "", as.character(x))))
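Since the question also mentions integer, the same idea can be sketched end to end with as.integer(). The helper name `strip_thousands` is made up for illustration, and the data frame is rebuilt by hand to match the question's dput() output.

```r
# Rebuild the question's data frame (factor columns, "." as thousands separator)
df <- data.frame(
  A = c("1.100", "2.300", "5.400"),
  B = c("7.800", "500", "1.000.000"),
  C = c("200", "3.100", "4.500"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: drop every literal "." and parse the rest as an integer
strip_thousands <- function(x) {
  as.integer(gsub(".", "", as.character(x), fixed = TRUE))
}

# Assigning into df[] keeps the data.frame structure and the column names
df[] <- lapply(df, strip_thousands)

str(df)
```

Note that as.integer() returns NA with a warning for values beyond the 32-bit integer range, so as.numeric() is the safer choice if very large numbers can occur.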


spe*_*ndo 2

I think I just found another solution:

It is necessary to use stringsAsFactors = FALSE.

Like this:

df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"), stringsAsFactors = FALSE)

df3 <- as.data.frame(data.matrix(df2))
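One caveat worth checking on your R version: as far as I recall from the data.matrix() documentation, character columns are first converted to factors and then to their integer codes, so the second step may return level codes rather than the cleaned numbers. A hedged variant of the same approach, converting each column explicitly with as.numeric(), could look like this (the data frame is rebuilt by hand to match the question):

```r
# Rebuild the question's data frame with factor columns
df1 <- data.frame(
  A = c("1.100", "2.300", "5.400"),
  B = c("7.800", "500", "1.000.000"),
  C = c("200", "3.100", "4.500"),
  stringsAsFactors = TRUE
)

# apply() works on as.matrix(df1), so the result is a character matrix
m <- apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement = "\\1\\2")

# Keep the cleaned strings as character columns, as in the answer above
df2 <- as.data.frame(m, stringsAsFactors = FALSE)

# Parse each character column explicitly instead of relying on data.matrix()
df3 <- df2
df3[] <- lapply(df2, as.numeric)

str(df3)
```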