R - Find the nearest neighboring point and the number of neighbors within a given radius, coordinates in lat-long

use*_*942 25 r distance latitude-longitude

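I am trying to figure out how isolated certain points in my data set are. I am using two measures of isolation: the distance to the nearest neighbor and the number of neighboring sites within a given radius. All my coordinates are in latitude and longitude.

This is what my data looks like: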

    pond            lat         long        area    canopy  avg.depth   neighbor    n.lat   n.long  n.distance  n.area  n.canopy    n.depth n.avg.depth radius1500
    A10             41.95928    -72.14605   1500    66      60.61538462                                 
    AA006           41.96431    -72.121     250     0       57.77777778                                 
    Blacksmith      41.95508    -72.123803  361     77      71.3125                                 
    Borrow.Pit.1    41.95601    -72.15419   0       0       41.44444444                                 
    Borrow.Pit.2    41.95571    -72.15413   0       0       37.7                                    
    Borrow.Pit.3    41.95546    -72.15375   0       0       29.22222222                                 
    Boulder         41.918223   -72.14978   1392    98      43.53333333                                 

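I want to put the name of the nearest neighboring pond in the column neighbor, its latitude and longitude in n.lat and n.long, the distance between the two ponds in n.distance, and its area, canopy, and avg.depth in each of the appropriate columns.

Second, I want to put the number of ponds within 1500 m of the target pond into radius1500.

Does anyone know of a function or package that will help me calculate the distances and counts I want? If it's an issue, it won't be hard to enter the other data by hand, but the nearest neighbor's name and distance, plus the number of ponds within 1500 m, are what I really need help with.

Thank you.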

Zby*_*nek 36

The best option is to use the libraries sp and rgeos, which let you construct spatial classes and perform geoprocessing.

library(sp)
library(rgeos)

Read the data and convert it to a spatial object:

mydata <- read.delim('d:/temp/testfile.txt', header=TRUE)

sp.mydata <- mydata
coordinates(sp.mydata) <- ~long+lat

class(sp.mydata)
# [1] "SpatialPointsDataFrame"
# attr(,"package")
# [1] "sp"

Now calculate the pairwise distances between points

d <- gDistance(sp.mydata, byid=TRUE)

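Find the index of the second-shortest distance for each point (the shortest distance is from the point to itself, so use the second shortest)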

min.d <- apply(d, 1, function(x) order(x, decreasing=FALSE)[2])
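The second-shortest trick works because each point's distance to itself is 0 and therefore sorts first. As a minimal sketch of an equivalent approach, the example below masks the diagonal and takes the row-wise minimum directly. The matrix `toy.d` and its values are made up for illustration and are not the pond data.

```r
# Toy symmetric distance matrix for 4 points (made-up values)
toy.d <- matrix(c(0, 5, 2, 9,
                  5, 0, 4, 1,
                  2, 4, 0, 7,
                  9, 1, 7, 0), nrow = 4, byrow = TRUE)

# Approach from the answer: second entry of the ordering of each row
nn.order <- apply(toy.d, 1, function(x) order(x)[2])

# Alternative: hide the self-distance, then take the row minimum
masked <- toy.d
diag(masked) <- Inf
nn.min  <- apply(masked, 1, which.min)  # index of the nearest neighbor
nn.dist <- apply(masked, 1, min)        # distance to the nearest neighbor

nn.order  # 3 4 1 2
nn.min    # 3 4 1 2
nn.dist   # 2 1 2 1
```

Masking the diagonal may be the safer choice when two distinct points share identical coordinates, since a tie at distance 0 no longer depends on how `order()` breaks it.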

Construct a new data frame with the desired variables

newdata <- cbind(mydata, mydata[min.d,], apply(d, 1, function(x) sort(x, decreasing=F)[2]))

colnames(newdata) <- c(colnames(mydata), 'neighbor', 'n.lat', 'n.long', 'n.area', 'n.canopy', 'n.avg.depth', 'distance')

newdata
            pond      lat      long area canopy avg.depth     neighbor    n.lat    n.long n.area n.canopy n.avg.depth
6            A10 41.95928 -72.14605 1500     66  60.61538 Borrow.Pit.3 41.95546 -72.15375      0        0    29.22222
3          AA006 41.96431 -72.12100  250      0  57.77778   Blacksmith 41.95508 -72.12380    361       77    71.31250
2     Blacksmith 41.95508 -72.12380  361     77  71.31250        AA006 41.96431 -72.12100    250        0    57.77778
5   Borrow.Pit.1 41.95601 -72.15419    0      0  41.44444 Borrow.Pit.2 41.95571 -72.15413      0        0    37.70000
4   Borrow.Pit.2 41.95571 -72.15413    0      0  37.70000 Borrow.Pit.1 41.95601 -72.15419      0        0    41.44444
5.1 Borrow.Pit.3 41.95546 -72.15375    0      0  29.22222 Borrow.Pit.2 41.95571 -72.15413      0        0    37.70000
6.1      Boulder 41.91822 -72.14978 1392     98  43.53333 Borrow.Pit.3 41.95546 -72.15375      0        0    29.22222
        distance
6   0.0085954872
3   0.0096462277
2   0.0096462277
5   0.0003059412
4   0.0003059412
5.1 0.0004548626
6.1 0.0374480316

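Edit: If the coordinates are in degrees and you want distances in metric units, use the package geosphere. gDistance treats the coordinates as planar, so the distances above are in degrees; distm returns meters (divide by 1000 for kilometers).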

library(geosphere)

d <- distm(sp.mydata)

# rest is the same

This should give better results if the points are spread across the globe and the coordinates are in degrees.
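To illustrate what a great-circle distance in meters involves, here is a minimal base R sketch of the haversine formula. It is not how geosphere computes its default distance; it assumes a spherical Earth with a radius of 6371 km, and the function name `haversine.matrix` is hypothetical.

```r
# Haversine great-circle distance in meters between all pairs of points.
# Assumes a spherical Earth of radius 6371 km (an approximation).
haversine.matrix <- function(lat, long, radius = 6371000) {
  to.rad <- pi / 180
  phi    <- lat * to.rad    # latitudes in radians
  lambda <- long * to.rad   # longitudes in radians
  dphi    <- outer(phi, phi, "-")        # pairwise latitude differences
  dlambda <- outer(lambda, lambda, "-")  # pairwise longitude differences
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # Clamp at 1 to guard against rounding before taking asin
  2 * radius * asin(pmin(sqrt(a), 1))
}

# Three of the ponds from the question (Borrow.Pit.1 to Borrow.Pit.3)
lat  <- c(41.95601, 41.95571, 41.95546)
long <- c(-72.15419, -72.15413, -72.15375)
d.m  <- haversine.matrix(lat, long)
round(d.m, 1)
```

The result is a symmetric matrix with zeros on the diagonal, so the nearest-neighbor steps above can be applied to it unchanged.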


bzk*_*zki 9

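I'm adding an alternative solution below using the newer sf package, for those who are interested and visiting this page now (as I did).

First, load the data and create the sf object.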

# Using sf
mydata <- structure(
  list(pond = c("A10", "AA006", "Blacksmith", "Borrow.Pit.1", 
                "Borrow.Pit.2", "Borrow.Pit.3", "Boulder"), 
       lat = c(41.95928, 41.96431, 41.95508, 41.95601, 41.95571, 41.95546, 
               41.918223), 
       long = c(-72.14605, -72.121, -72.123803, -72.15419, -72.15413, 
                -72.15375, -72.14978), 
       area = c(1500L, 250L, 361L, 0L, 0L, 0L, 1392L), 
       canopy = c(66L, 0L, 77L, 0L, 0L, 0L, 98L), 
       avg.depth = c(60.61538462, 57.77777778, 71.3125, 41.44444444, 
                     37.7, 29.22222222, 43.53333333)), 
  class = "data.frame", row.names = c(NA, -7L))


library(sf)
data_sf <- st_as_sf(mydata, coords = c("long", "lat"),
                    # Change to your CRS
                    crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
st_is_longlat(data_sf)

When given lat/lon data, sf::st_distance computes the distance matrix in meters using great-circle distance.

dist.mat <- st_distance(data_sf) # Great Circle distance since in lat/lon
# Number within 1.5km: Subtract 1 to exclude the point itself
num.1500 <- apply(dist.mat, 1, function(x) {
  sum(x < 1500) - 1
})

# Calculate nearest distance
nn.dist <- apply(dist.mat, 1, function(x) {
  return(sort(x, partial = 2)[2])
})
# Get index for nearest distance
nn.index <- apply(dist.mat, 1, function(x) { order(x, decreasing=F)[2] })

n.data <- mydata
colnames(n.data)[1] <- "neighbor"
colnames(n.data)[2:ncol(n.data)] <- 
  paste0("n.", colnames(n.data)[2:ncol(n.data)])
mydata2 <- data.frame(mydata,
                      n.data[nn.index, ],
                      n.distance = nn.dist,
                      radius1500 = num.1500)
rownames(mydata2) <- seq(nrow(mydata2))

mydata2
          pond      lat      long area canopy avg.depth     neighbor    n.lat    n.long n.area n.canopy
1          A10 41.95928 -72.14605 1500     66  60.61538 Borrow.Pit.1 41.95601 -72.15419      0        0
2        AA006 41.96431 -72.12100  250      0  57.77778   Blacksmith 41.95508 -72.12380    361       77
3   Blacksmith 41.95508 -72.12380  361     77  71.31250        AA006 41.96431 -72.12100    250        0
4 Borrow.Pit.1 41.95601 -72.15419    0      0  41.44444 Borrow.Pit.2 41.95571 -72.15413      0        0
5 Borrow.Pit.2 41.95571 -72.15413    0      0  37.70000 Borrow.Pit.1 41.95601 -72.15419      0        0
6 Borrow.Pit.3 41.95546 -72.15375    0      0  29.22222 Borrow.Pit.2 41.95571 -72.15413      0        0
7      Boulder 41.91822 -72.14978 1392     98  43.53333 Borrow.Pit.3 41.95546 -72.15375      0        0
  n.avg.depth n.distance radius1500
1    41.44444  766.38426          3
2    71.31250 1051.20527          1
3    57.77778 1051.20527          1
4    37.70000   33.69099          3
5    41.44444   33.69099          3
6    37.70000   41.99576          3
7    29.22222 4149.07406          0
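Counting neighbors within a radius only needs a logical comparison on the distance matrix. Here is a minimal base R sketch using `rowSums()` instead of `apply()`, on a made-up matrix of distances in meters (`toy.m` and its values are hypothetical):

```r
# Made-up pairwise distances in meters for 4 sites
toy.m <- matrix(c(   0,  800, 1600, 3000,
                   800,    0, 1200, 2500,
                  1600, 1200,    0, 1400,
                  3000, 2500, 1400,    0), nrow = 4, byrow = TRUE)

radius <- 1500

# TRUE where a pair is closer than the radius; the diagonal (self) is
# always TRUE because the self-distance is 0, so subtract 1 per row
within   <- toy.m < radius
n.within <- rowSums(within) - 1

n.within  # 1 2 2 1
```

This assumes a plain numeric matrix. With a matrix that carries units, such as the one `st_distance()` returns, you may need to drop the units first, for example with `units::drop_units()`.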

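To get the nearest neighbor after computing the distances, you can use sort() with the partial = 2 argument. Depending on the amount of data, this could be much faster than using order as in the previous solution. The package Rfast is likely even faster, but I avoided including additional packages here. See this related post for discussion and benchmarks of various solutions: https://stackoverflow.com/a/53144760/12265198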
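As a small sketch of the difference, the example below compares a full ordering with a partial sort on a made-up vector of distances (`x` is hypothetical; the leading 0 stands in for the self-distance):

```r
set.seed(1)
x <- c(0, runif(10, min = 10, max = 5000))  # 0 plays the self-distance

# Full ordering, then take the value at the second position
via.order <- x[order(x)[2]]

# Partial sort: only guarantees that position 2 holds the 2nd smallest value
via.partial <- sort(x, partial = 2)[2]

via.order == via.partial  # TRUE
```

Note that `sort(x, partial = 2)` returns values rather than positions, so the index of the nearest neighbor still has to come from `order()` or a similar call.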