R中的双循环操作(以示例为例)

Bil*_* TP 5 performance for-loop r

请看下面的小工作示例:

#### Pseudo data
nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37

#### define a distance function
thedistance <- function(lon1, lat1, lon2, lat2) {
 R <- 6371 # Earth mean radius [km]
 delta.lon <- (lon2 - lon1)
 delta.lat <- (lat2 - lat1)
 a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
 c <- 2 * asin(min(1,sqrt(a)))
 d = R * c
 return(d)
}

ptm <- proc.time()

#### Calculate distances between locations
# Initiate the resulting distance vector
ndistance <- nobs1*nobs2 # The number of distances
mydistance <- vector(mode = "numeric", length = ndistance)

k=1
for (i in 1:nobs1) {
 for (j in 1:nobs2) {
  mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
  k=k+1
 }
}

proc.time() - ptm
Run Code Online (Sandbox Code Playgroud)

计算时间:

  user  system elapsed 
249.85    0.16  251.18
Run Code Online (Sandbox Code Playgroud)

在这里,我的问题是是否仍有加速双循环计算的空间.非常感谢你.

tal*_*lat 5

这是一个选项,可以在我的机器上将运行时间减少到约2秒,因为它的一部分是矢量化的.

下面是与原始解决方案的直接比较.

测试数据:

nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37
Run Code Online (Sandbox Code Playgroud)

原始解决方案

#### define a distance function
thedistance <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- (lon2 - lon1)
  delta.lat <- (lat2 - lat1)
  a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
  c <- 2 * asin(min(1,sqrt(a)))
  d = R * c
  return(d)
}

ptm <- proc.time()

#### Calculate distances between locations
# Initiate the resulting distance vector
ndistance <- nobs1*nobs2 # The number of distances
mydistance <- vector(mode = "numeric", length = ndistance)

k=1
for (i in 1:nobs1) {
  for (j in 1:nobs2) {
    mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
    k=k+1
  }
}

proc.time() - ptm
   User      System     elapsed 
148.243       0.681     148.901 
Run Code Online (Sandbox Code Playgroud)

我的方法:

# modified (vectorized) distance function:
thedistance2 <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- (lon2 - lon1)
  delta.lat <- (lat2 - lat1)
  a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
  c <- 2 * asin(pmin(1,sqrt(a)))   # pmin instead of min
  d = R * c
  return(d)
}

ptm2 <- proc.time()

lst <- vector("list", length = nobs1)

for (i in seq_len(nobs1)) {
    lst[[i]] = thedistance2(mylon1[i],mylat1[i],mylon2,mylat2)
}

res <- unlist(lst)

proc.time() - ptm2
   User      System     elapsed
  1.988       0.331       2.319 
Run Code Online (Sandbox Code Playgroud)

结果是否相等?

all.equal(mydistance, res)
#[1] TRUE
Run Code Online (Sandbox Code Playgroud)