如何从点数据近似多边形(邮政编码区域)

sta*_*oob 0 r

我在 R 中有以下数据集 - 该数据集包含地理编码地址(加拿大)及其邮政编码、经度和纬度的列表:https ://www.dropbox.com/scl/fi/9kjoqsppb85ip0tdc5wmr/stackoverflow_example.csv?rlkey= cwjk222jnoz8c9cgbt01p6bep&dl=0

我将这些数据加载到 R 中:

library(httr)
library(data.table)

url <- "https://www.dropbox.com/scl/fi/9kjoqsppb85ip0tdc5wmr/stackoverflow_example.csv?rlkey=cwjk222jnoz8c9cgbt01p6bep&dl=1"

response <- GET(url)

df <- fread(content(response, "text"), sep = ",", quote = "", fill = TRUE)


> head(df)
         "" "latitude" "longitude" "source_id"                   "id" "group_id" "street_no"            "street"       "str_name" "str_type" "str_dir" "unit"     "city" "postal_code"
1: "999976"   44.25845   -76.46308          "" "6baaa1692aaaa7b496aa"    2494328       "104" "POINT ST. MARK DR" "POINT ST. MARK"       "DR"        ""     "" "KINGSTON"     "K7K 6X8"
2: "999977"   44.26391   -76.45090          "" "1f01e7839e59727d95a3"    2508891       "229"      "GREENLEES DR"      "GREENLEES"       "DR"        ""     "" "KINGSTON"     "K7K 6R1"
3: "999978"   44.26262   -76.45164          "" "289d509f161955a7b5ad"    2519600       "443"        "MAUREEN ST"        "MAUREEN"       "ST"        ""     "" "KINGSTON"     "K7K 7M2"
4: "999979"   44.26545   -76.45230          "" "e6516f9c0e173b684627"    2506548       "198"      "GREENLEES DR"      "GREENLEES"       "DR"        ""     "" "KINGSTON"     "K7K 6P7"
5: "999983"   44.23525   -76.49264          "" "2576c9ff82b86638737c"    2508273        "22"         "ELLICE ST"         "ELLICE"       "ST"        ""     "" "KINGSTON"     "K7K 1M5"
6: "999993"   44.26520   -76.45182          "" "03aaecc65f29e1e04fe1"    2507798       "212"      "GREENLEES DR"      "GREENLEES"       "DR"        ""     "" "KINGSTON"     "K7K 6P7"
               "full_addr" "city_pcs" "str_name_pcs" "str_type_pcs" "str_dir_pcs" "csduid"  "csdname" "pruid"         "provider"
1: "104 POINT ST. MARK DR" "KINGSTON"   "PT ST MARK"           "DR"            ""  3510010 "Kingston"      35 "City of Kingston"
2:      "229 GREENLEES DR" "KINGSTON"    "GREENLEES"           "DR"            ""  3510010 "Kingston"      35 "City of Kingston"
3:        "443 MAUREEN ST" "KINGSTON"      "MAUREEN"           "ST"            ""  3510010 "Kingston"      35 "City of Kingston"
4:      "198 GREENLEES DR" "KINGSTON"    "GREENLEES"           "DR"            ""  3510010 "Kingston"      35 "City of Kingston"
5:          "22 ELLICE ST" "KINGSTON"       "ELLICE"           "ST"            ""  3510010 "Kingston"      35 "City of Kingston"
6:      "212 GREENLEES DR" "KINGSTON"    "GREENLEES"           "DR"            ""  3510010 "Kingston"      35 "City of Kingston"
Run Code Online (Sandbox Code Playgroud)

我的问题:使用这些数据,我试图“近似”每个邮政编码的地理边界并将结果可视化。

基于我在这里发布的类似问题(R:制作加拿大邮政编码地图),有人向我建议一种称为“多边形裁剪”的技术可能有助于完成此任务(即参见@User提供的答案:Michael)杜瓦瓶)。

我正在尝试了解“多边形裁剪”如何有助于完成此任务 - 但我不确定如何开始使用此任务。最后,我想要一个类似于 shapefile 的东西,其中包含每个邮政编码的“周长”(以经度和纬度坐标表示)。

有人可以告诉我该怎么做吗?

谢谢!

笔记:

I_O*_*_O 5

沃罗诺伊近似:

通过 Voronoi 分区近似邮政编码区域

(地图图块由 Stamen Design 提供,CC BY 3.0 下。数据由 OpenStreetMap 提供,根据 ODbL。)


从您的 httr 响应开始,方法如下:

  • 创建点数据d(在代码中省略 quote 参数,否则你会得到有趣的列名称):
    d <- data.table::fread(content(response, "text"), sep = ",", fill = TRUE)
Run Code Online (Sandbox Code Playgroud)
  • 为该地区(加拿大东部)设置适当的预测:
    my_proj <- '+proj=aea +lat_1=50 +lat_2=70 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
Run Code Online (Sandbox Code Playgroud)
  • 创建一个空间数据框d
    library(sf)
    library(dplyr)

    the_points <- 
      d |>
      select(longitude, latitude, postal_code) |>
      st_as_sf(coords = c('longitude', 'latitude'), 
               crs = 4326 ## EPSG-code for lon/lat coords
               ) |>
      st_transform(my_proj) |> ## transf. to "Canadian" proj.
      head(100) ## only the first 100 items for demonstration
Run Code Online (Sandbox Code Playgroud)
  • 使用 Voronoi 划分的近似区域:
    the_zones <- 
      the_points |>
      st_union() |> ## !
      st_voronoi() |>
      st_collection_extract('POLYGON') |>
      st_as_sf() |>
      ## create an ID for later join with postal code:
      mutate(id = paste0('zone_', row_number())) |>
      rename(geometry = x)
Run Code Online (Sandbox Code Playgroud)
  • st_intersection使用和获取哪个邮政编码(点数据)属于哪个区域(多边形数据)的查找向量pull
    postal_code_lookup <- 
      st_intersection(the_zones, the_points) |> ## keep only zones around points
      distinct() |>
      as.data.frame() |>
      pull(postal_code, id) ## pulling two cols creates a named vector
    
    the_zones |>
      mutate(postal_code = postal_code_lookup[id])
Run Code Online (Sandbox Code Playgroud)

生成的空间数据帧(用于进一步处理):


    + Simple feature collection with 74 features and 2 fields
    Geometry type: POLYGON
    Dimension:     XY
    Bounding box:  xmin: 1554420 ymin: 683010.2 xmax: 1569265 ymax: 698094.6
    CRS:           +proj=aea +lat_1=50 +lat_2=70 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
    First 10 features:
            id                       geometry postal_code
    1   zone_1 POLYGON ((1554420 697558.1,...     K7K 0C3
    2   zone_2 POLYGON ((1560077 688369.5,...     K7K 1X7
    3   zone_3 POLYGON ((1560089 688314.7,...     K7K 4S4
    4   zone_4 POLYGON ((1554420 688610.3,...     K7K 4W8
    ## ...
Run Code Online (Sandbox Code Playgroud)
  • 可视化:
    library(gglot2)
    library(ggmap)
  
the_zones <- 
  the_zones |>
  mutate(postal_code = postal_code_lookup[id]) |>
  st_transform(4326) ## back to geographic coords

the_background <- get_stamenmap(c(left = -76.46, bottom = 44.25, 
                                  right = -76.42, top = 44.27),
                                zoom = 14
                                )


ggmap(the_background) +
    geom_sf(data = the_zones,  alpha = 0, inherit.aes = FALSE) +
    geom_sf_text(data = the_zones, aes(label = postal_code), inherit.aes = FALSE) +
    geom_sf(data = st_transform(the_points, 4326), inherit.aes = FALSE) +
    coord_sf(xlim = c(-76.46, -76.42), ylim = c(44.255, 44.27)) + 
    theme_minimal()

Run Code Online (Sandbox Code Playgroud)