将写入的数字转换为R中的数字

Hen*_*enk 17 text r qdap

有没有人知道将数字的文本表示转换为实际数字的函数,例如20305年的"二万三千五".我在数据帧行中编写了数字,并希望将它们转换为数字.

在包qdap中,您可以用单词替换数字表示的数字(例如,1001变为一千),但不是相反:

library(qdap)
replace_number("I like 346457 ice cream cones.")
[1] "I like three hundred forty six thousand four hundred fifty seven ice cream cones."
Run Code Online (Sandbox Code Playgroud)

Tho*_*mas 14

这是一个应该让你成千上万的开始.

word2num <- function(word){
    wsplit <- strsplit(tolower(word)," ")[[1]]
    one_digits <- list(zero=0, one=1, two=2, three=3, four=4, five=5,
                       six=6, seven=7, eight=8, nine=9)
    teens <- list(eleven=11, twelve=12, thirteen=13, fourteen=14, fifteen=15,
                  sixteen=16, seventeen=17, eighteen=18, nineteen=19)
    ten_digits <- list(ten=10, twenty=20, thirty=30, forty=40, fifty=50,
                       sixty=60, seventy=70, eighty=80, ninety=90)
    doubles <- c(teens,ten_digits)
    out <- 0
    i <- 1
    while(i <= length(wsplit)){
        j <- 1
        if(i==1 && wsplit[i]=="hundred")
            temp <- 100
        else if(i==1 && wsplit[i]=="thousand")
            temp <- 1000
        else if(wsplit[i] %in% names(one_digits))
            temp <- as.numeric(one_digits[wsplit[i]])
        else if(wsplit[i] %in% names(teens))
            temp <- as.numeric(teens[wsplit[i]])
        else if(wsplit[i] %in% names(ten_digits))
            temp <- (as.numeric(ten_digits[wsplit[i]]))
        if(i < length(wsplit) && wsplit[i+1]=="hundred"){
            if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                out <- out + 100*temp
            else
                out <- 100*(out + temp)
            j <- 2
        }
        else if(i < length(wsplit) && wsplit[i+1]=="thousand"){
            if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                out <- out + 1000*temp
            else
                out <- 1000*(out + temp)
            j <- 2
        }
        else if(i < length(wsplit) && wsplit[i+1] %in% names(doubles)){
            temp <- temp*100
            out <- out + temp
        }
        else{
            out <- out + temp
        }
        i <- i + j
    }
    return(list(word,out))
}
Run Code Online (Sandbox Code Playgroud)

结果:

> word2num("fifty seven")
[[1]]
[1] "fifty seven"

[[2]]
[1] 57

> word2num("four fifty seven")
[[1]]
[1] "four fifty seven"

[[2]]
[1] 457

> word2num("six thousand four fifty seven")
[[1]]
[1] "six thousand four fifty seven"

[[2]]
[1] 6457

> word2num("forty six thousand four fifty seven")
[[1]]
[1] "forty six thousand four fifty seven"

[[2]]
[1] 46457

> word2num("forty six thousand four hundred fifty seven")
[[1]]
[1] "forty six thousand four hundred fifty seven"

[[2]]
[1] 46457

> word2num("three forty six thousand four hundred fifty seven")
[[1]]
[1] "three forty six thousand four hundred fifty seven"

[[2]]
[1] 346457
Run Code Online (Sandbox Code Playgroud)

我已经可以告诉你,这不会起作用word2num("four hundred thousand fifty"),因为它不知道如何处理连续的"百"和"千"项,但算法可能会被修改.任何人都可以随意编辑它,如果他们有改进或在他们自己的答案中建立它们.我只是觉得这是一个有趣的问题(一段时间).

编辑:显然Bill Venables有一个名为english的软件包可能比上面的代码更好.

  • 试图看看英语包在哪里可以做到这一点。它似乎只是相反,但也许我错过了这个? (2认同)