使用 st_distance 计算两组点之间的所有距离

Question

使用 st_distance 计算两组点之间的所有距离

我有两组点作为sf对象存储在 R 中。点对象 x 包含 204,467 个点，点 y 包含 5,297 个点。

理论上，我想计算从 x 中的所有点到 y 中的所有点的距离。我知道这会创建一个庞大的矩阵，但在我的 i7 桌面上使用该st_distance(x, y, by_element=FALSE)包sf大约需要 40 分钟。

我想要做的是计算从 x 中的所有点到 y 中的所有点的距离，然后我想将其转换为 a data.frame，其中包含相应 x 和 y 对点的所有变量。这是因为我希望使用进行聚合方面的灵活性dplyr，例如，我想找到 y 中距离 x 10、50、100 公里以内的点的数量，以及其中x$year < y$year。

我成功创建了距离矩阵，其中包含大约 1,083,061,699 个单元格。我知道这是一种非常低效的方法，但它在聚合方面提供了灵活性。欢迎其他建议。

下面是创建两个 sf 点对象并测量它们之间距离的代码。接下来，我想将其转换为包含 x 和 y 中所有变量的 data.frame，但这是我无法继续的地方。

如果我建议的工作流程不可行，有人可以提供替代解决方案来测量到预定义半径内所有点的距离，并使用 x 和 y 中的所有变量创建结果的 data.frame 吗？

# Create two sf point objects 
set.seed(123)
library(sf)


pts1 <- st_as_sf(x = data.frame(id=seq(1,204467,1),
                                year=sample(seq(from = 1990, to = 2018, by = 1), size = 204467, replace = TRUE),
                                xcoord=sample(seq(from = -180, to = 180, by = 1), size = 204467, replace = TRUE),
                                ycoord=sample(seq(from = -90, to = 90, by = 1), size = 204467, replace = TRUE)),
                 coords=c("xcoord","ycoord"),crs=4326)

pts2 <- st_as_sf(x = data.frame(id=seq(1,5297,1),
                                year=sample(seq(from = 1990, to = 2018, by = 1), size = 5297, replace = TRUE),
                                xcoord=sample(seq(from = -180, to = 180, by = 1), size = 5297, replace = TRUE),
                                ycoord=sample(seq(from = -90, to = 90, by = 1), size = 5297, replace = TRUE)),
                 coords=c("xcoord","ycoord"),crs=4326)

distmat <- st_distance(pts1,pts2,by_element = FALSE)

Run Code Online (Sandbox Code Playgroud)

Answer 1

dww*_*dww 4

我会考虑以不同的方式处理这个问题。一旦有了distmat矩阵，您就可以进行您描述的计算类型，而无需数据框。您可以使用标准子集来查找哪些点满足您指定的条件。

pts1$year例如，要找到大于pts2$year我们能做的点的组合：

subset_points = outer(pts1$year, pts2$year, `>`)

Run Code Online (Sandbox Code Playgroud)

然后，要找出其中有多少个相距超过 100 公里，我们可以这样做

library(units)
sum(distmat[subset_points] > (100 * as_units('km', 1)))

Run Code Online (Sandbox Code Playgroud)

关于内存使用的注意事项

无论您使用 sf 或 data.frame 对象来解决此问题，您都可能会开始遇到 RAM 限制，因为 data.table 的每个矩阵或列中都有 1e9 个浮点。您可能会考虑将距离矩阵转换为raster. 然后，栅格可以存储在磁盘上而不是内存中，并且您可以利用包中的内存安全功能raster来处理您的问题。

我们如何使用栅格从磁盘工作并节省 RAM

我们可以对非常大的矩阵使用内存安全的光栅操作，例如：

library(raster)

# convert our matrices to rasters, so we can work on them from disk
r = raster(matrix(as.numeric(distmat), length(pts1$id), length(pts2$id)))
s = raster(subset_points)
remove('distmat', 'subset_points')

# now create a raster equal to r, but with zeroes in the cells we wish to exclude from calculation
rs = overlay(r,s,fun=function(x,y){x*y}, filename='out1.tif')     

# find which cells have value greater than x (1e6 in the example)
Big_cells = reclassify(rs, matrix(c(-Inf, 1e6, 0, 1e6, Inf, 1), ncol=3, byrow=TRUE), 'out.tiff', overwrite=T)

# and finally count the cells
N = cellStats(Big_cells, sum)

Run Code Online (Sandbox Code Playgroud)

归档时间：	6 年，12 月前
查看次数：	3223 次
最近记录：	2 年，2 月前