suv*_*pta 3 geocoding r dataframe
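I am trying to get ZIP codes for latitude/longitude coordinates in the New York area.
I tried the reverse geocoding API from Google, but it is limited to 2,500 hits per day, so I could only process my data frame in batches.
Next, I tried library(zipcode), which ships a dataset of ZIP codes, but I could not match its latitude/longitude values to the coordinates in my train dataset, because not all of those coordinates appear in the dataset.
I then thought of using KNN to predict the ZIP code from that dataset, but could not get correct results.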
zipcode_latlon = zipcode[zipcode$state=="NY",c(1,4,5)]
train_latlon = train_data[,c("latitude","longitude")]
zip1 = rep(10007, nrow(train_latlon))
zip1 = as.character(zip1)
train_latlon = cbind(zip1, train_latlon)
colnames(train_latlon) = c("zip","latitude","longitude")
knn_fit = knn(zipcode_latlon, train_latlon,zipcode_latlon$zip, k=1)
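I need to know how to get ZIP codes from lat/long in bulk; any method in R would be fine.
I think you are going about this the wrong way. You can find the ZIP code for a lat/lon coordinate without a geocoder: you just need to download the US ZIP code shapefile and then do a spatial join: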
library(sp)
library(rgdal)
#import zips shapefile and transform CRS
zips <- readOGR("cb_2015_us_zcta510_500k.shp")
zips <- spTransform(zips, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
#here is a sample with three cities in New York State and their coordinates
#(lon is the x coordinate, lat is the y coordinate)
df <- data.frame(
  lon = c(-76.1474, -77.6109, -78.8784),
  lat = c(43.0481, 43.1610, 42.8864),
  city = c("Syracuse", "Rochester", "Buffalo")
)
df
lon lat city
1 -76.1474 43.0481 Syracuse
2 -77.6109 43.1610 Rochester
3 -78.8784 42.8864 Buffalo
#extract only the lon/lat, in x/y order
xy <- df[,c("lon","lat")]
#transform coordinates into a SpatialPointsDataFrame
spdf <- SpatialPointsDataFrame(coords = xy, data = df, proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
#subset only the zipcodes in which points are found
zips_subset <- zips[spdf, ]
#NOTE: the column in zips_subset containing zipcodes is ZCTA5CE10
#use over() to overlay points in polygons and then add that to the original dataframe
df$zip <- over(spdf, zips_subset[,"ZCTA5CE10"])
Voilà! You now have the ZIP code for each point:
df
lon lat city ZCTA5CE10
1 -76.1474 43.0481 Syracuse 13202
2 -77.6109 43.1610 Rochester 14604
3 -78.8784 42.8864 Buffalo 14202
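Note that rgdal was retired and removed from CRAN in 2023, so readOGR() may not be installable on a current R setup. Below is a minimal sketch of the same point-in-polygon join using the sf package; it is an alternative, not the code from the answer above. To keep it runnable without the download, two made-up rectangles (labelled "A" and "B", not real ZCTA boundaries) stand in for the shapefile. With the real file you would presumably read it with st_read() and transform it to EPSG:4326 instead.

```r
library(sf)

# Helper (hypothetical, for illustration only): closed rectangle as an sf polygon
make_box <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Toy stand-in for the ZCTA layer: two made-up rectangles, NOT real boundaries.
# With the real data, something like:
#   zips <- st_transform(st_read("cb_2015_us_zcta510_500k.shp"), 4326)
zips <- st_sf(
  ZCTA5CE10 = c("A", "B"),
  geometry = st_sfc(
    make_box(-77, 42, -76, 44),
    make_box(-79, 42, -77, 44),
    crs = 4326
  )
)

# Same three sample points as above, with x = lon and y = lat
pts <- data.frame(
  lon = c(-76.1474, -77.6109, -78.8784),
  lat = c(43.0481, 43.1610, 42.8864),
  city = c("Syracuse", "Rochester", "Buffalo")
)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326, remove = FALSE)

# Spatial join: attach the label of the polygon each point falls within
joined <- st_join(pts_sf, zips["ZCTA5CE10"], join = st_within)
result <- st_drop_geometry(joined)
```

Here result$ZCTA5CE10 holds the label of the polygon containing each point, and points that fall outside every polygon get NA. The join handles all rows in one call, so there is no per-point request limit as with a geocoding API.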
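As for the KNN attempt in the question: the knn() call there passes the zip column as a feature alongside latitude and longitude (with every test row set to 10007), which likely explains the odd results. Below is a minimal sketch of a nearest-centroid lookup that uses only the two coordinate columns. It assumes class::knn, and the ref table is hypothetical, with made-up centroid coordinates standing in for zipcode_latlon.

```r
library(class)

# Hypothetical reference table standing in for zipcode_latlon;
# the centroid coordinates are made up for illustration
ref <- data.frame(
  zip = c("13202", "14604", "14202"),
  latitude = c(43.04, 43.16, 42.89),
  longitude = c(-76.15, -77.61, -78.88)
)

# Points to label, standing in for train_data
pts <- data.frame(
  latitude = c(43.0481, 43.1610, 42.8864),
  longitude = c(-76.1474, -77.6109, -78.8784)
)

# Use only the coordinate columns as features; the zip goes in cl, not in train/test
pred <- knn(
  train = ref[, c("latitude", "longitude")],
  test = pts[, c("latitude", "longitude")],
  cl = factor(ref$zip),
  k = 1
)
pts$zip <- as.character(pred)
```

This only finds the nearest centroid by Euclidean distance in degrees, which is not guaranteed to be the ZIP code area that actually contains the point, so the spatial join remains the more reliable route.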