Nel*_*ewR 5 numbers text-extraction r
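I am trying to extract spelled-out numbers from strings, together with the word that follows each number. I managed to do this the laborious way, by writing my own code that lists the spelled-out numbers to search for (here is an example using stringr::sentences):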
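library(stringr)

# Each spelled-out number, padded with spaces, followed by the next word
numbers <- str_c(c(" one ", " two ", " three ", " four ", " five ",
                   " six ", " seven ", " eight ", " nine ", " ten "), "([^ ]+)")
number_match <- str_c(numbers, collapse = "|")
# Keep only the sentences that contain a match, then extract the first match in each
reduced <- str_detect(sentences, number_match)
sent <- sentences[reduced]
str_extract(sent, number_match)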
These are the extracted strings:
[1] " seven books" " two met" " two factors" " three lists" " seven is" " two when" " ten inches." " one war"
[9] " one button" " six minutes." " ten years" " two shares" " two distinct" " five cents" " two pins" " five robins."
[17] " four kinds" " three story" " three inches" " six comes" " three batches" " two leaves."
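Since I cannot know in advance whether I have covered every possible number, I would like to know whether R offers a tool that can recognise spelled-out numbers. I found similar questions, such as one on converting spelled-out numbers to numerals, but unfortunately it is not about R.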
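For illustration, the same hand-written approach can be sketched with word boundaries instead of padded spaces, which also picks up numbers at the start of a sentence and separates the number from the following word. This is only a sketch under my own assumptions: the names `number_words`, `pattern`, and `hits` are made up for this example, and the list of number words is still written by hand, so it does not solve the coverage problem.

```r
library(stringr)

# Hand-written list of number words (an assumption; extend as needed)
number_words <- c("one", "two", "three", "four", "five",
                  "six", "seven", "eight", "nine", "ten")

# \\b marks a word boundary, so no padding spaces are needed;
# the number word is followed by whitespace and the next run of word characters
pattern <- regex(
  str_c("\\b(", str_c(number_words, collapse = "|"), ")\\s+(\\w+)"),
  ignore_case = TRUE
)

# str_match() returns the full match plus one column per capture group
hits <- str_match(sentences, pattern)
hits <- hits[!is.na(hits[, 1]), , drop = FALSE]
colnames(hits) <- c("match", "number", "next_word")
head(hits)
```

Because `\\w+` stops at punctuation, a trailing full stop is not included in the extracted word, unlike with `[^ ]+` above.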
Any help is appreciated.