R - Convert character columns containing $ and % signs to numeric

Sop*_*010 1 r numeric character dataframe

I have a data frame df with several columns, and I want to clean up some of its pricing columns. The data frame looks like this:

Col1(char)  Col2(char)     Col3(char)     Col4(char)
CST         $ 128,412.00   $ 0.034        +149.628%
FSD         $ 138,232.40   $ 0.023        +124.244%
SDD         $ 112,234.45   $ 0.023        -123.324%

However, I would like the output to look like this:

Col1(char)  Col2(num)   Col3(num)  Col4(num)
CST         128412.00   0.034      1.49628
FSD         138232.40   0.023      1.24244
SDD         112234.45   0.023      -1.23324

How can I convert Col2 through Col4 to numeric columns as elegantly as possible? Thank you!

李哲源*_*李哲源 5

dat <- structure(list(Col1 = c("CST", "FSD", "SDD"), Col2 = c("$ 128,412.00",
"$ 138,232.40", "$ 112,234.45"), Col3 = c("$ 0.034", "$ 0.023",
"$ 0.023"), Col4 = c("+149.628%", "+124.244%", "-123.324%")),
 class = "data.frame", row.names = c(NA, -3L))
#  Col1         Col2    Col3      Col4
#1  CST $ 128,412.00 $ 0.034 +149.628%
#2  FSD $ 138,232.40 $ 0.023 +124.244%
#3  SDD $ 112,234.45 $ 0.023 -123.324%

To convert every column except the first to numeric, you can do the following:

tonum <- function (x) {
  ## delete "$", "," and "%" and convert string to numeric
  num <- as.numeric(gsub("[$,%]", "", x))
  ## watch out for "%", that is, 90% should be 90 / 100 = 0.9
  ## (only x[1] is inspected, so a column is assumed to be all "%" or none)
  if (grepl("%", x[1])) num <- num / 100
  ## return
  num
}

dat[-1] <- lapply(dat[-1], tonum)
dat
#  Col1     Col2  Col3     Col4
#1  CST 128412.0 0.034  1.49628
#2  FSD 138232.4 0.023  1.24244
#3  SDD 112234.4 0.023 -1.23324
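The check on `x[1]` is enough for the data in the question, where a column is either entirely percentages or entirely not. As a rough sketch of what happens otherwise, the block below runs that first version on a made-up mixed vector and compares it with a per-element variant. The names `tonum_first`, `tonum_each` and `mixed` are hypothetical and only serve the illustration.

```r
# First version from the answer: the "%" test looks only at x[1]
tonum_first <- function(x) {
  num <- as.numeric(gsub("[$,%]", "", x))
  if (grepl("%", x[1])) num <- num / 100
  num
}

# Hypothetical variant: scale each element that carries a "%" on its own
tonum_each <- function(x) {
  num <- as.numeric(gsub("[$,%]", "", x))
  pct <- grepl("%", x, fixed = TRUE)
  num[pct] <- num[pct] / 100
  num
}

# Made-up input that mixes a price with two percentages
mixed <- c("$ 1,000.50", "+50%", "-12.5%")

tonum_first(mixed)  # 1000.5, 50, -12.5   (percentages left unscaled)
tonum_each(mixed)   # 1000.5, 0.5, -0.125
```

Because the first element has no "%", `tonum_first()` skips the division for the whole vector, while the per-element mask scales only the entries that need it.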

Remarks:


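I just learned about readr::parse_number() from PaulS's answer. It is an interesting function: basically, it removes everything that cannot be a valid part of a number. As an exercise, I implemented the same logic with a regex, so here is a generalized tonum().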

tonum <- function (x, regex = TRUE) {
  ## drop everything that is not "+/-", "0-9" or "."
  ## then convert string to numeric
  if (regex) {
    num <- as.numeric(stringr::str_remove_all(x, "[^+\\-0-9\\.]*"))
  } else {
    num <- readr::parse_number(x)
  }
  ## watch out for "%", that is, 90% should be 90 / 100 = 0.9
  ind <- grepl("%", x)
  num[ind] <- num[ind] / 100
  ## return
  num
}

Here is a quick test:

x <- unlist(dat[-1], use.names = FALSE)
x <- c(x, "euro 300.95", "RMB 888.66", "£1999.98")
# [1] "$ 128,412.00" "$ 138,232.40" "$ 112,234.45" "$ 0.034"      "$ 0.023"
# [6] "$ 0.023"      "+149.628%"    "+124.244%"    "-123.324%"    "euro 300.95"
#[11] "RMB 888.66"   "£1999.98"

tonum(x, regex = TRUE)
# [1] 128412.00000 138232.40000 112234.45000      0.03400      0.02300
# [6]      0.02300      1.49628      1.24244     -1.23324    300.95000
#[11]    888.66000   1999.98000

tonum(x, regex = FALSE)
# [1] 128412.00000 138232.40000 112234.45000      0.03400      0.02300
# [6]      0.02300      1.49628      1.24244     -1.23324    300.95000
#[11]    888.66000   1999.98000
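For completeness, here is a minimal base-R sketch of the regex branch, applied to the data frame from the question so the resulting column types can be checked. It assumes neither stringr nor readr is available. The name `tonum_base` is hypothetical, and the character class is my own base-R rendering of the pattern above, not something taken from either package.

```r
# Base-R sketch of the regex branch; tonum_base is a hypothetical name
tonum_base <- function(x) {
  # keep only sign characters, digits and the decimal point
  num <- as.numeric(gsub("[^-+0-9.]", "", x))
  # divide by 100 only where the original string carried a "%"
  pct <- grepl("%", x, fixed = TRUE)
  num[pct] <- num[pct] / 100
  num
}

# Same data as in the question
dat <- data.frame(
  Col1 = c("CST", "FSD", "SDD"),
  Col2 = c("$ 128,412.00", "$ 138,232.40", "$ 112,234.45"),
  Col3 = c("$ 0.034", "$ 0.023", "$ 0.023"),
  Col4 = c("+149.628%", "+124.244%", "-123.324%"),
  stringsAsFactors = FALSE
)

# Convert every column except the first, then inspect the classes
dat[-1] <- lapply(dat[-1], tonum_base)
sapply(dat, class)  # Col1 is character, Col2 to Col4 are numeric
```

One limitation of stripping characters this way: signs and dots are kept wherever they occur, so a string such as "1-2" becomes "1-2" again after cleaning and `as.numeric()` turns it into `NA` with a warning.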