R: How to geocode a simple address using the Data Science Toolbox

Jos*_*e R 13 maps geocoding r

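I am fed up with Google's geocoding and decided to try an alternative. The Data Science Toolkit (http://www.datasciencetoolkit.org) allows you to geocode an unlimited number of addresses. R has an excellent package that serves as a wrapper for its functions (CRAN: RDSTK). The package has a function, street2coordinates(), that interfaces with the Data Science Toolkit's geocoding utility.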

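However, the RDSTK function street2coordinates() does not work when you try to geocode something as simple as City, Country. In the following example I try to use the function to get the latitude and longitude of the city of Phoenix: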

> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)

The utility on the Data Science Toolkit site itself works perfectly. This is the URL request that gives the answer: http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=Phoenix+Arizona+United+States
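For reference, a request URL of this form can be assembled in base R. This is only a sketch: `build_dstk_url` is a made-up helper name (it is not part of RDSTK), and the endpoint and the `sensor`/`address` parameters are simply taken from the URL above. It percent-encodes the address rather than joining words with `+`.

```r
# Hypothetical helper (not part of RDSTK): build the Google-style geocoder URL
# for a single address, percent-encoding it so spaces and commas are safe.
build_dstk_url <- function(address,
                           base = "http://www.datasciencetoolkit.org/maps/api/geocode/json") {
  paste0(base, "?sensor=false&address=",
         utils::URLencode(address, reserved = TRUE))
}

phoenix_url <- build_dstk_url("Phoenix, Arizona, United States")
phoenix_url
```

The resulting string can be pasted into a browser or passed to any HTTP client.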

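I am interested in geocoding multiple addresses (both complete addresses and city names). I know that the Data Science Toolkit URL works well. How do I interface with the URL and get multiple latitudes and longitudes into a data frame alongside the addresses?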

Here is a sample dataset:

dff <- data.frame(address=c(
  "Birmingham, Alabama, United States",
  "Mobile, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States",
  "Little Rock, Arkansas, United States",
  "Berkeley, California, United States",
  "Duarte, California, United States",
  "Encinitas, California, United States",
  "La Jolla, California, United States",
  "Los Angeles, California, United States",
  "Orange, California, United States",
  "Redwood City, California, United States",
  "Sacramento, California, United States",
  "San Francisco, California, United States",
  "Stanford, California, United States",
  "Hartford, Connecticut, United States",
  "New Haven, Connecticut, United States"
  ))

jlh*_*ard 15

Like this?

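library(httr)
library(rjson)

# Build a JSON array of the address strings: ["addr 1","addr 2",...]
data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url  <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
json     <- fromJSON(content(response,as="text"))
# lapply (rather than sapply) always returns a list, so rbind also works
# when every address is matched; unmatched (NULL) entries are dropped
geocode  <- do.call(rbind,lapply(json,
                                 function(x) c(long=x$longitude,lat=x$latitude)))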
geocode
#                                                long      lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States            -88.10318 30.70114
# La Jolla, California, United States      -117.87645 33.85751
# Duarte, California, United States        -118.29866 33.78659
# Little Rock, Arkansas, United States      -91.20736 33.60892
# Tucson, Arizona, United States           -110.97087 32.21798
# Redwood City, California, United States  -117.88536 35.18713
# New Haven, Connecticut, United States     -72.92751 41.36571
# Berkeley, California, United States      -122.29673 37.86058
# Hartford, Connecticut, United States      -72.76356 41.78516
# Sacramento, California, United States    -121.55541 38.38046
# Encinitas, California, United States     -116.84605 33.01693
# Birmingham, Alabama, United States        -86.80190 33.45641
# Stanford, California, United States      -122.16750 37.42509
# Orange, California, United States        -117.85311 33.78780
# Los Angeles, California, United States   -117.88536 35.18713

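This takes advantage of the POST interface to the street2coordinates API (documented here), which returns all the results in one request rather than requiring multiple GET requests.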
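The result of the POST request comes back keyed by address, in no particular order, and unmatched addresses are missing from the table. A minimal sketch of putting it back into input order with NA for the misses is below. `coords_to_df` is a hypothetical helper name, and it assumes the parsed response is a named list whose elements are either NULL or carry `longitude` and `latitude` fields, as in the code above. The mock list stands in for a real response so the example runs offline.

```r
# Sketch: turn a parsed street2coordinates response into a data frame that
# keeps the input order and marks addresses with no match as NA.
# Assumption: `json` is a named list keyed by address; each element is either
# NULL or a list with `longitude` and `latitude`.
coords_to_df <- function(addresses, json) {
  addresses <- as.character(addresses)
  pick <- function(addr, field) {
    hit <- json[[addr]]
    if (is.null(hit) || is.null(hit[[field]])) NA_real_ else as.numeric(hit[[field]])
  }
  data.frame(
    address = addresses,
    long    = vapply(addresses, pick, numeric(1), field = "longitude", USE.NAMES = FALSE),
    lat     = vapply(addresses, pick, numeric(1), field = "latitude",  USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Mock response (hand-built, not real API output); the Tucson coordinates are
# copied from the output above.
mock <- list(
  "Tucson, Arizona, United States"  = list(longitude = -110.97087, latitude = 32.21798),
  "Phoenix, Arizona, United States" = NULL
)
df <- coords_to_df(c("Phoenix, Arizona, United States",
                     "Tucson, Arizona, United States"), mock)
df
```

With a real response you would call something like `coords_to_df(dff$address, json)`, which makes it easy to see which addresses still need a second attempt.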

EDIT (response to OP's comment)

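The absence of Phoenix seems to be a bug in the street2coordinates API. If you go to the API demo page and try "Phoenix, Arizona, United States", you get a null response. However, as your example shows, their "Google-style Geocoder" does return a result for Phoenix. So here is a solution using repeated GET requests. Note that this runs much more slowly.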

geo.dsk <- function(addr){ # geocode a single address with the Data Science Toolkit
  require(httr)
  require(rjson)
  url      <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  response <- GET(url,query=list(sensor="false",address=addr))
  json <- fromJSON(content(response,as="text"))
  loc  <- json$results[[1]]$geometry$location   # location of the first match
  # c() coerces to character, so long and lat are returned as strings
  return(c(address=addr,long=loc$lng, lat= loc$lat))
}
result <- do.call(rbind,lapply(as.character(dff$address),geo.dsk))
result <- data.frame(result)
result
#                                     address         long        lat
# 1        Birmingham, Alabama, United States   -86.801904  33.456412
# 2            Mobile, Alabama, United States   -88.103184  30.701142
# 3           Phoenix, Arizona, United States -112.0733333 33.4483333
# 4            Tucson, Arizona, United States  -110.970869  32.217975
# 5      Little Rock, Arkansas, United States   -91.207356  33.608922
# 6       Berkeley, California, United States   -122.29673  37.860576
# 7         Duarte, California, United States  -118.298662  33.786594
# 8      Encinitas, California, United States  -116.846046  33.016928
# 9       La Jolla, California, United States  -117.876447  33.857515
# 10   Los Angeles, California, United States  -117.885359  35.187133
# 11        Orange, California, United States  -117.853112  33.787795
# 12  Redwood City, California, United States  -117.885359  35.187133
# 13    Sacramento, California, United States  -121.555406  38.380456
# 14 San Francisco, California, United States  -117.885359  35.187133
# 15      Stanford, California, United States    -122.1675   37.42509
# 16     Hartford, Connecticut, United States   -72.763564   41.78516
# 17    New Haven, Connecticut, United States   -72.927507  41.365709
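One caveat: the function above indexes straight into the first result, so it would likely stop with a subscript error if the geocoder returned an empty `results` list for some address. A defensive sketch is below. `extract_location` is a hypothetical helper name, and the response structure (`results[[1]]$geometry$location$lng` and `$lat`) is assumed from the code above; the mock lists are hand-built rather than real API output.

```r
# Sketch: pull lng/lat out of a parsed Google-style geocoder response,
# returning NA instead of failing when there is no usable result.
# Assumption: structure is results[[1]]$geometry$location$lng / $lat.
extract_location <- function(json) {
  no_match <- c(long = NA_real_, lat = NA_real_)
  results <- json$results
  if (length(results) == 0) return(no_match)
  loc <- results[[1]]$geometry$location
  if (is.null(loc$lng) || is.null(loc$lat)) return(no_match)
  c(long = as.numeric(loc$lng), lat = as.numeric(loc$lat))
}

# Mock parsed responses (hand-built lists, not real API output)
found <- list(results = list(
  list(geometry = list(location = list(lng = -112.0733333, lat = 33.4483333)))
))
not_found <- list(results = list())

hit  <- extract_location(found)
miss <- extract_location(not_found)
```

Because this returns numeric values, the coordinates stay numeric when the rows are bound together, as long as the address is kept in a separate column.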


mvk*_*pel 5

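The ggmap package includes support for geocoding using either Google or the Data Science Toolkit, the latter through its "Google-style geocoder". As noted in the earlier answer, this is quite slow for multiple addresses.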

library(ggmap)
result <- geocode(as.character(dff[[1]]), source = "dsk")
print(cbind(dff, result))
#                                     address        lon      lat
# 1        Birmingham, Alabama, United States  -86.80190 33.45641
# 2            Mobile, Alabama, United States  -88.10318 30.70114
# 3           Phoenix, Arizona, United States -112.07404 33.44838
# 4            Tucson, Arizona, United States -110.97087 32.21798
# 5      Little Rock, Arkansas, United States  -91.20736 33.60892
# 6       Berkeley, California, United States -122.29673 37.86058
# 7         Duarte, California, United States -118.29866 33.78659
# 8      Encinitas, California, United States -116.84605 33.01693
# 9       La Jolla, California, United States -117.87645 33.85751
# 10   Los Angeles, California, United States -117.88536 35.18713
# 11        Orange, California, United States -117.85311 33.78780
# 12  Redwood City, California, United States -117.88536 35.18713
# 13    Sacramento, California, United States -121.55541 38.38046
# 14 San Francisco, California, United States -117.88536 35.18713
# 15      Stanford, California, United States -122.16750 37.42509
# 16     Hartford, Connecticut, United States  -72.76356 41.78516
# 17    New Haven, Connecticut, United States  -72.92751 41.36571