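I have a data frame and want to clean the columns that hold prices stored as character strings. I want to remove the $ sign and the , thousands separator and convert those columns to numeric.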
structure(list(Sold.Price = c("", "$177,500", "$180,000", "$180,000",
"$189,000"), Title.to.Land = c("Freehold Strata", "Freehold Strata",
"Freehold Strata", "Freehold Strata", "Freehold Strata"), Price = c("$174,900",
"$177,500", "$180,000", "$180,000", "$189,000"), DOM = c(93L,
34L, 39L, 56L, 2L), List.Date = c("10/4/2019", "12/12/2019",
"12/9/2019", "11/12/2019", "1/9/2020"), MaintFee = c("$2,916.00",
"$373.80", "$331.57", "$320.42", "$1,055.67")), row.names = c(NA,
5L), class = "data.frame")
  Sold.Price   Title.to.Land    Price DOM  List.Date  MaintFee
1            Freehold Strata $174,900  93  10/4/2019 $2,916.00
2   $177,500 Freehold Strata $177,500  34 12/12/2019   $373.80
3   $180,000 Freehold Strata $180,000  39  12/9/2019   $331.57
4   $180,000 Freehold Strata $180,000  56 11/12/2019   $320.42
5   $189,000 Freehold Strata $189,000   2   1/9/2020 $1,055.67
I tried this approach:
combined_csv$Sold.Price <- gsub("\\$", "", combined_csv$Sold.Price)
combined_csv$Sold.Price <- gsub("\\,", "", combined_csv$Sold.Price)
combined_csv$Sold.Price <- as.numeric(combined_csv$Sold.Price)
But this doesn't look very smart. I'd like to do all of this in one line for every price-type column (Sold.Price, MaintFee, etc.). How can I do that?
I think you can use the following solution:
library(dplyr)
df %>%
mutate(across(c(Sold.Price, Price, MaintFee), ~ as.numeric(gsub("[$,]", "", .x))))
  Sold.Price   Title.to.Land  Price DOM  List.Date MaintFee
1         NA Freehold Strata 174900  93  10/4/2019  2916.00
2     177500 Freehold Strata 177500  34 12/12/2019   373.80
3     180000 Freehold Strata 180000  39  12/9/2019   331.57
4     180000 Freehold Strata 180000  56 11/12/2019   320.42
5     189000 Freehold Strata 189000   2   1/9/2020  1055.67
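If you would rather not list the price columns by name, one option is to let dplyr pick them by content. The following is only a sketch under assumptions: the sample data is a trimmed copy of the question's data frame, `is_price_col` is a hypothetical helper introduced here, and a "price column" is assumed to be any character column with at least one value starting with `$`.

```r
library(dplyr)

# Trimmed sample mirroring the question's data (assumption: same column types)
df <- data.frame(
  Sold.Price = c("", "$177,500", "$180,000"),
  Title.to.Land = c("Freehold Strata", "Freehold Strata", "Freehold Strata"),
  DOM = c(93L, 34L, 39L),
  MaintFee = c("$2,916.00", "$373.80", "$331.57"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for character columns with at least one "$..." value
is_price_col <- function(x) {
  is.character(x) && any(startsWith(x, "$"), na.rm = TRUE)
}

# Strip "$" and "," from the detected columns only, then convert to numeric;
# empty strings become NA
cleaned <- df %>%
  mutate(across(where(is_price_col), ~ as.numeric(gsub("[$,]", "", .x))))

str(cleaned)
```

Columns without a `$` value, such as Title.to.Land and DOM, are left untouched.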
Or in base R we can do this:
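# lapply() with df[] <- keeps each column's own type; sapply() would simplify
# the mixed result to a character matrix and undo the numeric conversion
df[] <- lapply(df, function(x) {
  if (is.character(x) && any(grepl("\\$", x))) {
    as.numeric(gsub("[$,]", "", x))
  } else {
    x
  }
})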
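When the price columns are known in advance, base R can also do the cleanup in a single assignment, which is close to the one-liner asked for. This is a minimal sketch, assuming the column names from the question; `price_cols` is a name introduced here for illustration, and the sample data is a trimmed copy of the original.

```r
# Trimmed sample mirroring the question's data
df <- data.frame(
  Sold.Price = c("", "$177,500", "$180,000"),
  Price = c("$174,900", "$177,500", "$180,000"),
  DOM = c(93L, 34L, 39L),
  MaintFee = c("$2,916.00", "$373.80", "$331.57"),
  stringsAsFactors = FALSE
)

# Assumed list of price columns; adjust to match your own data
price_cols <- c("Sold.Price", "Price", "MaintFee")

# Assigning into df[price_cols] replaces only those columns, each with its
# own numeric vector; the remaining columns keep their original types
df[price_cols] <- lapply(df[price_cols], function(x) as.numeric(gsub("[$,]", "", x)))

sapply(df, class)
```

Because the result is assigned back column by column, DOM stays an integer and the empty Sold.Price entry becomes NA.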