R: Abbreviating state names in strings

Ale*_*tov 2 regex r abbreviation

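I have strings that contain state names. How can I abbreviate them efficiently? I know about state.abb[grep("New York", state.name)], but that only works when "New York" is the entire string. For example, I have "Walmart in New York". Thanks in advance!

Let's assume this input: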

x = c("Walmart, New York", "Hobby Lobby (California)", "Sold in Sears in Illinois")

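Edit: the desired output would be something like "Walmart, NY", "Hobby Lobby (CA)", "Sold in Sears in IL". As you can see, the state name can appear in the string in several different ways.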

Jos*_*ien 5

Here is a base R approach, using gregexpr(), regmatches(), and regmatches<-():

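abbreviateStateNames <- function(x) {
    # One alternation pattern covering every name in state.name
    pat <- paste(state.name, collapse="|")
    # Locate every state-name match in each string
    m <- gregexpr(pat, x)
    # Map matched full names to their two-letter abbreviations
    ff <- function(x) state.abb[match(x, state.name)]
    # Replace the matches in place
    regmatches(x, m) <- lapply(regmatches(x, m), ff)
    x
}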

x <- c("Hobby Lobby (California)", 
       "Hello New York City, here I come (from Greensboro North Carolina)!")

abbreviateStateNames(x)
# [1] "Hobby Lobby (CA)"                                
# [2] "Hello NY City, here I come (from Greensboro NC)!"

Or, perhaps more naturally, you can accomplish the same thing with the gsubfn package:

library(gsubfn)

pat <- paste(state.name, collapse="|")
gsubfn(pat, function(x) state.abb[match(x, state.name)], x)
# [1] "Hobby Lobby (CA)"                                
# [2] "Hello NY City, here I come (from Greensboro NC)!"
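One caveat the approaches above do not address: a plain alternation of state names also matches inside longer words, so something like "Washingtonian" would become "WAian". The following is a minimal sketch of one way to guard against that, assuming whole-word matches are what you want. The function name abbreviateStateNamesWholeWord is hypothetical, and wrapping the pattern in \\b with perl = TRUE is my assumption rather than part of the original answer.

```r
# Hypothetical variant of abbreviateStateNames(): the alternation is wrapped
# in \\b ... \\b so only whole-word occurrences of a state name are replaced.
abbreviateStateNamesWholeWord <- function(x) {
    # Non-capturing group around all state names, anchored at word boundaries
    pat <- paste0("\\b(?:", paste(state.name, collapse = "|"), ")\\b")
    # perl = TRUE is an assumption here, chosen for PCRE \\b and (?: ) support
    m <- gregexpr(pat, x, perl = TRUE)
    # Swap each matched full name for its two-letter abbreviation
    regmatches(x, m) <- lapply(regmatches(x, m),
                               function(s) state.abb[match(s, state.name)])
    x
}

# The question's input, plus one string with a state name embedded in a word
input <- c("Walmart, New York", "Hobby Lobby (California)",
           "Sold in Sears in Illinois", "A Washingtonian in Ohio")
out <- abbreviateStateNamesWholeWord(input)
# "Washingtonian" is left alone, while "Ohio" becomes "OH"
```

Strings without any state name pass through unchanged, since regmatches() returns an empty character vector for them and nothing is substituted.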

  • You don't see `regmatches<-` every day. Nice one. (2 upvotes)