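Does anyone know of a function that converts the text representation of a number into an actual number, e.g. "twenty thousand three hundred five" into 20305? I have numbers written out as words in the rows of a data frame and want to convert them to numeric values.
The qdap package can replace numerals with words (e.g. 1001 becomes "one thousand one"), but not the other way around: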
library(qdap)
replace_number("I like 346457 ice cream cones.")
[1] "I like three hundred forty six thousand four hundred fifty seven ice cream cones."
Tho*_*mas 14
Here's a start that should get you up to the hundreds of thousands.
word2num <- function(word){
    # Split the lower-cased phrase into individual words
    wsplit <- strsplit(tolower(word)," ")[[1]]
    one_digits <- list(zero=0, one=1, two=2, three=3, four=4, five=5,
                       six=6, seven=7, eight=8, nine=9)
    teens <- list(eleven=11, twelve=12, thirteen=13, fourteen=14, fifteen=15,
                  sixteen=16, seventeen=17, eighteen=18, nineteen=19)
    ten_digits <- list(ten=10, twenty=20, thirty=30, forty=40, fifty=50,
                       sixty=60, seventy=70, eighty=80, ninety=90)
    doubles <- c(teens,ten_digits)
    out <- 0
    i <- 1
    while(i <= length(wsplit)){
        j <- 1  # number of words consumed in this iteration
        # Look up the value of the current word
        if(i==1 && wsplit[i]=="hundred")
            temp <- 100
        else if(i==1 && wsplit[i]=="thousand")
            temp <- 1000
        else if(wsplit[i] %in% names(one_digits))
            temp <- as.numeric(one_digits[wsplit[i]])
        else if(wsplit[i] %in% names(teens))
            temp <- as.numeric(teens[wsplit[i]])
        else if(wsplit[i] %in% names(ten_digits))
            temp <- (as.numeric(ten_digits[wsplit[i]]))
        # Combine it with the running total, depending on the next word
        if(i < length(wsplit) && wsplit[i+1]=="hundred"){
            if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                out <- out + 100*temp
            else
                out <- 100*(out + temp)
            j <- 2  # also skip the "hundred"
        }
        else if(i < length(wsplit) && wsplit[i+1]=="thousand"){
            if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                out <- out + 1000*temp
            else
                out <- 1000*(out + temp)
            j <- 2  # also skip the "thousand"
        }
        else if(i < length(wsplit) && wsplit[i+1] %in% names(doubles)){
            # Shorthand such as "four fifty seven": the digit counts as hundreds
            temp <- temp*100
            out <- out + temp
        }
        else{
            out <- out + temp
        }
        i <- i + j
    }
    # Return the original phrase together with its numeric value
    return(list(word,out))
}
Results:
> word2num("fifty seven")
[[1]]
[1] "fifty seven"
[[2]]
[1] 57
> word2num("four fifty seven")
[[1]]
[1] "four fifty seven"
[[2]]
[1] 457
> word2num("six thousand four fifty seven")
[[1]]
[1] "six thousand four fifty seven"
[[2]]
[1] 6457
> word2num("forty six thousand four fifty seven")
[[1]]
[1] "forty six thousand four fifty seven"
[[2]]
[1] 46457
> word2num("forty six thousand four hundred fifty seven")
[[1]]
[1] "forty six thousand four hundred fifty seven"
[[2]]
[1] 46457
> word2num("three forty six thousand four hundred fifty seven")
[[1]]
[1] "three forty six thousand four hundred fifty seven"
[[2]]
[1] 346457
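I can already tell you that this won't work for word2num("four hundred thousand fifty"), because it doesn't know how to handle consecutive "hundred" and "thousand" terms, but the algorithm could probably be modified. Anyone should feel free to edit this if they have improvements, or to build on it in their own answer. I just thought this was a fun problem to play with (for a little while).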
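For what it's worth, one way the consecutive "hundred"/"thousand" case might be handled is to keep two accumulators: one for the group currently being read and one for the groups already closed off by "thousand". The sketch below is not the code above and is only a rough illustration; the function name words_to_number, the example data frame, and its column names are made up for this example. It also does not understand shorthand such as "four fifty seven", which word2num accepts.

```r
# Hypothetical helper (not part of the original answer): a sketch of a
# two-accumulator approach to parsing English number words.
words_to_number <- function(phrase) {
  small <- c(zero = 0, one = 1, two = 2, three = 3, four = 4, five = 5,
             six = 6, seven = 7, eight = 8, nine = 9, ten = 10,
             eleven = 11, twelve = 12, thirteen = 13, fourteen = 14,
             fifteen = 15, sixteen = 16, seventeen = 17, eighteen = 18,
             nineteen = 19, twenty = 20, thirty = 30, forty = 40,
             fifty = 50, sixty = 60, seventy = 70, eighty = 80, ninety = 90)
  scales <- c(thousand = 1e3, million = 1e6)

  # Lower-case, turn hyphens into spaces, split on whitespace, drop "and"
  tokens <- strsplit(trimws(gsub("-", " ", tolower(phrase))), "\\s+")[[1]]
  tokens <- tokens[tokens != "and"]

  total <- 0  # groups already closed off by "thousand" or "million"
  group <- 0  # value of the group currently being read
  for (tok in tokens) {
    if (tok %in% names(small)) {
      group <- group + small[[tok]]
    } else if (tok == "hundred") {
      group <- max(group, 1) * 100  # a bare "hundred" counts as 100
    } else if (tok %in% names(scales)) {
      total <- total + max(group, 1) * scales[[tok]]
      group <- 0
    } else {
      stop("unrecognised number word: ", tok)
    }
  }
  total + group
}

# Applying it to a column of an illustrative data frame
df <- data.frame(written = c("fifty seven", "four hundred thousand fifty"),
                 stringsAsFactors = FALSE)
df$value <- vapply(df$written, words_to_number, numeric(1), USE.NAMES = FALSE)
# df$value is now c(57, 400050)
```

Unknown words raise an error rather than being silently skipped, which seemed the safer choice when cleaning a data frame column.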
Edit: Apparently Bill Venables has a package called english that may do this better than the word2num function above.