My data looks like this:
412 U CA, Riverside
413 U British Columbia
414 CREI
415 U Pompeu Fabra
416 Office of the Comptroller of the Currency, US Department of the Treasury
417 Bureau of Economics, US Federal Trade Commission
418 U Carlos III de Madrid
419 U Brescia
420 LUISS Guido Carli
421 U Alicante
422 Harvard Society of Fellows
423 Toulouse School of Economics
424 Decision Economics Inc, Boston, MA
425 ECARES, Free U Brussels
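I need to geocode this data to get coordinates for each institution. To do that, I need to spell out all the state names. At the same time, I don't want acronyms such as "ECARES" to be turned into "ECaliforniaRES".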
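To make the pitfall concrete, here is a minimal sketch of a plain substring replacement in base R. The vector x holds two entries copied from the data above. The replacement also rewrites the "CA" inside "ECARES":

```r
# Two entries copied from the data above
x <- c("U CA, Riverside", "ECARES, Free U Brussels")

# fixed = TRUE treats "CA" as a literal substring, so it is replaced
# wherever it occurs, including inside the acronym "ECARES"
naive <- gsub("CA", "California", x, fixed = TRUE)
naive[2]  # "ECaliforniaRES, Free U Brussels"
```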
My idea was to turn the state.abb and state.name vectors into vectors of regular expressions. With Alabama and California as the two example states, state.abb would look like this:
c("^AL "|" AL "|" AL,"|",AL "| " AL$", "^CA "[....])
and state.name would look like this:
c("^Alabama "|" Alabama "|" Alabama,"|",Alabama "| " Alabama$", "^California "[....])
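I was hoping to use the mgsub function to replace every expression in the modified state.abb vector with the corresponding entry in the modified state.name vector.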
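The mgsub plan amounts to applying paired pattern/replacement substitutions one after another. Below is a minimal base-R sketch of that idea. It assumes the patterns are stored as quoted strings and use word boundaries (\\b) instead of the space/comma alternatives above. The name replace_pairs is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper (not from any package): apply paired
# pattern -> replacement substitutions one after another, mgsub-style
replace_pairs <- function(x, patterns, replacements, ...) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x, ...)
  }
  x
}

# One word-boundary pattern per abbreviation, paired with state.name
abb_patterns <- paste0("\\b", state.abb, "\\b")

out <- replace_pairs(c("U CA, Riverside", "ECARES, Free U Brussels"),
                     abb_patterns, state.name, perl = TRUE)
out  # "U California, Riverside" "ECARES, Free U Brussels"
```

The substitutions run in sequence, so a later pattern could in principle match text inserted by an earlier one. That cannot happen with these patterns, because no full state name contains a standalone two-letter uppercase word.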
However, for some reason it doesn't seem possible to put regular expressions into a vector:
test<-c(^AL, ^AB)
Error: unexpected '^' in "test<-c(^"
I tried escaping the "^" character, but that didn't seem to work either:
test<-c(\^AL, \^AB)
Error: unexpected input in "test<-c(\"
> test<-c(\\^AL, \\^AB)
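Is there a way to put regular expressions into a vector? Alternatively, is there another way to reach my goal, namely replacing every two-letter state abbreviation with the full state name without mangling other acronyms in the process?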
An excerpt of my data:
c("U Lausanne", "Swiss Finance Institute", "U CA, Riverside",
"U British Columbia", "CREI", "U Pompeu Fabra", "Office of the Comptroller of the Currency, US Department of the Treasury",
"Bureau of Economics, US Federal Trade Commission", "U Carlos III de Madrid",
"U Brescia", "LUISS Guido Carli", "U Alicante", "Harvard Society of Fellows",
"Toulouse School of Economics", "Decision Economics Inc, Boston, MA",
"ECARES, Free U Brussels", "Baylor U", "Research Centre for Education",
"the Labour Market, Maastricht U", "U Bonn", "Swarthmore College"
)
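As for the errors above, a regular expression in R is an ordinary character string, so it has to be quoted. Unquoted, ^AL is parsed as R code, which is what triggers "unexpected '^'". Inside a string, a backslash must itself be escaped, so the regex \b is typed "\\b". The following minimal sketch builds the question's per-state patterns as a character vector (abb_pat and ca_pat are illustrative names):

```r
# Regular expressions are plain strings, so a vector of them is just c("...")
test <- c("^AL", "^AB")

# One alternation per state, following the question's idea: the abbreviation
# at the start, between spaces, before a comma, after a comma, or at the end
abb_pat <- paste0("^", state.abb, " | ", state.abb, " | ", state.abb, ",|,",
                  state.abb, " | ", state.abb, "$")
abb_pat[1]  # "^AL | AL | AL,|,AL | AL$"

# The California pattern matches the standalone CA but not the one in ECARES
ca_pat <- abb_pat[state.abb == "CA"]
ca_hit <- grepl(ca_pat, c("U CA, Riverside", "ECARES, Free U Brussels"))
ca_hit  # TRUE FALSE
```

These patterns include the surrounding space or comma. Used directly as replacement targets, they would also consume that punctuation. The word-boundary patterns in the answer below avoid this.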
We can take the state.abb vector and paste it together, collapsing with |:
pat1 <- paste0("\\b(", paste(state.abb, collapse="|"), ")\\b")
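The \\b marks a word boundary. It keeps the pattern from matching abbreviations embedded in longer words, such as the "AL" in "SAL".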
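A quick check of how pat1 behaves on a few entries from the data excerpt follows. This is a minimal sketch. Passing perl = TRUE is an assumption of this sketch, chosen so that \\b is handled by the PCRE engine:

```r
pat1 <- paste0("\\b(", paste(state.abb, collapse = "|"), ")\\b")

x <- c("U CA, Riverside", "ECARES, Free U Brussels",
       "Decision Economics Inc, Boston, MA")

# TRUE only where an abbreviation stands as a whole word
hit <- grepl(pat1, x, perl = TRUE)
hit  # TRUE FALSE TRUE

# regmatches() with regexpr() returns the first match of each element
# that matched; non-matching elements are dropped
found <- regmatches(x, regexpr(pat1, x, perl = TRUE))
found  # "CA" "MA"
```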
Similarly, for state.name we paste ^ and $ on as prefix and suffix to mark the start and the end of the string:
pat2 <- paste0("^(", paste(state.name, collapse="|"), ")$")
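Because of the ^ and $ anchors, pat2 only matches a string that consists entirely of a state name. It does not match names inside longer strings. For the replacement itself, one base-R option is to find every pat1 match with gregexpr and swap each match through a lookup vector named by abbreviation. This is a sketch under those assumptions, not necessarily how the answer continues. The names lookup and x are illustrative:

```r
pat1 <- paste0("\\b(", paste(state.abb, collapse = "|"), ")\\b")
pat2 <- paste0("^(", paste(state.name, collapse = "|"), ")$")

# ^ and $ force the whole string to be exactly one state name
whole <- grepl(pat2, c("California", "New York", "U California, Riverside"))
whole  # TRUE TRUE FALSE

# Lookup vector: names are abbreviations, values are full state names
lookup <- setNames(state.name, state.abb)

x <- c("U Lausanne", "U CA, Riverside", "ECARES, Free U Brussels",
       "Decision Economics Inc, Boston, MA", "Swarthmore College")

# Locate every whole-word abbreviation, then replace each match with the
# corresponding full name; elements without a match are left unchanged
m <- gregexpr(pat1, x, perl = TRUE)
regmatches(x, m) <- lapply(regmatches(x, m), function(ab) unname(lookup[ab]))
x
```

A single pass with a lookup vector avoids running 50 separate substitutions and keeps each abbreviation tied to its own name.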