使用 R 对随机生成的横断面进行有效排序

Question

使用 R 对随机生成的横断面进行有效排序

Jak*_*arm 7 automation for-loop r traveling-salesman dplyr

问题

我正在寻找一种方法来有效地对固定对象周围发生的随机选择的采样横断面进行排序。这些横断面一旦生成，就需要以一种在空间上有意义的方式进行排序，以使行进的距离最小化。这将通过确保当前横断面的终点尽可能靠近下一个横断面的起点来实现。此外，没有一个横断面可以重复。

因为有数千条断面需要订购，这是手动完成的一项非常繁琐的任务，我正在尝试使用 R 来自动化此过程。我已经生成了横断面，每个横断面都有一个起点和终点，其位置使用 360 度系统表示（例如，0 是北，90 是东，180 是南，270 是西）。我还生成了一些似乎指示下一个横断面的起点和 ID 的代码，但此代码存在一些问题：(1) 根据所考虑的起点和终点，它可能会产生错误，(2 ) 它没有实现我最终需要它实现的目标，并且 (3) 照原样，代码本身似乎过于复杂，我不禁想知道是否有更直接的方法来做到这一点。

理想情况下，代码将导致横断面被重新排序，以便它们匹配它们应该飞行的顺序，而不是它们最初输入的顺序。

数据

为简单起见，我们假设只有 10 个横断面需要排序。

# Transect ID for the start point
StID <- c(seq(1, 10, 1))

# Location of transect start point, based on a 360-degree circle
StPt <- c(342.1, 189.3, 116.5, 67.9, 72, 208.4, 173.2, 97.8, 168.7, 138.2)

# Transect ID for the end point
EndID <- c(seq(1, 10, 1))

# Location of transect start point, based on a 360-degree circle
EndPt <- c(122.3, 313.9, 198.7, 160.4, 166, 26.7, 312.7, 273.7, 288.8, 287.5)

# Dataframe
df <- cbind.data.frame(StPt, StID, EndPt, EndID)

Run Code Online (Sandbox Code Playgroud)

我试过的

请随意忽略此代码，必须有更好的方法，但它并没有真正达到预期的结果。现在我正在使用嵌套的 for 循环，它很难直观地遵循，但代表了我迄今为止最好的尝试。

# Create two new columns that will be populated using a loop
df$StPt_Next <- NA
df$ID_Next <- NA

# Also create a list to be populated as end and start points are matched
used <- c(df$StPt[1]) #puts the start point of transect #1 into the used vector since we will start with 1 and do not want to have it used again

# Then, for every row in the dataframe...
for (i in seq(1,length(df$EndPt)-1, 1)){ # Selects all rows except the last one as the last transect should have no "next" transect
  # generate some print statements to indicate that the script is indeed running while you wait....
  print(paste("######## ENDPOINT", i, ":", df$EndPt[i], " ########"))
  print(paste("searching for a start point that fits criteria to follow this endpoint",sep=""))
  # sequentially select each end point
  valueEndPt <- df[i,1]
  # and order the index by taking the absolute difference of end and start points and, if this value is greater than 180, also subtract from 360 so all differences are less than 180, then order differences from smallest to largest  
  orderx <- order(ifelse(360-abs(df$StPt-valueEndPt) > 180, 
                         abs(df$StPt-valueEndPt),
                         360-abs(df$StPt-valueEndPt)))
  tmp <- as.data.frame(orderx)
  # specify index value
  index=1
  # for as long as there is an "NA" present in the StPt_Next created before for loop...  
  while (is.na(df$StPt_Next[i])) {
    #select the value of the ordered index in sequential order     
    j=orderx[index]
    # if the start point associated with a given index is present in the list of used values...
    if (df$StPt[j] %in% used){
      # then have R print a statement indicate this is the case...
      print(paste("passing ",df$StPt[j], " as it has already been used",sep=""))
      # and move onto the next index
      index=index+1
      # break statement intended to skip the remainder of the code for values that have already been used      
      next
      # if the start point associated with a given index is not present in the list of used values...      
    } else {
      # then identify the start point value associated with that index ID... 
      valueStPt <- df$StPt[j]
      # and have R print a statement indicating an attempt is being made to use the next value      
      print(paste("trying ",df$StPt[j],sep=""))
      # if the end transect number is different from the start end transect number...
      if (df$EndID[i] != df$StID[j]) { 
        # then put the start point in the new column...
        df$StPt_Next[i] <- df$StPt[j]
        # note which record this start point came from for ease of reference/troubleshooting...
        df$ID_Next[i] <- j
        # have R print a statement that indicates a value for the new column has beed selected...        
        print(paste("using ",df$StPt[j],sep=""))
        # and add that start point to the list of used ones
        used <- c(used,df$StPt[j])
        # otherwise, if the end transect number matches the start end transect number...
      } else {
        # keep NA in this column and try again
        df$StPt_Next[i] <- NA
        # and indicate that this particular matched pair can not be used
        print(paste("cant use ",valueStPt," as the column EndID (related to index in EndPt) and StID (related to index in StPt) values are matching",sep=""))
      }# end if else statement to ensure that start and end points come from different transects
      # and move onto the next index
      index=index+1
    }# end if else statement to determine if a given start point still needs to be used
  }# end while loop to identify if there are still NA's in the new column
}# end for loop

Run Code Online (Sandbox Code Playgroud)

输出

当代码没有产生显式错误时，例如对于提供的示例数据，输出如下：

    StPt StID EndPt EndID StPt_Next ID_Next
1  342.1    1 122.3     1      67.9       4
2  189.3    2 313.9     2     173.2       7
3  116.5    3 198.7     3      97.8       8
4   67.9    4 160.4     4      72.0       5
5   72.0    5 166.0     5     116.5       3
6  208.4    6  26.7     6     189.3       2
7  173.2    7 312.7     7     168.7       9
8   97.8    8 273.7     8     138.2      10
9  168.7    9 288.8     9     208.4       6
10 138.2   10 287.5    10        NA      NA

Run Code Online (Sandbox Code Playgroud)

最后两列由代码生成并添加到原始数据帧中。StPt_Next 具有下一个最近起点的位置，而 ID_Next 指示与该下一个起点位置相关联的断面 ID。ID_Next 列指示应该飞行的顺序横断面如下 1,4,5,3,8,10,NA（又名结束），2,7,9,6 形成自己的循环，回到2.

有两个具体问题我无法解决：

(1)存在形成一个连续的序列链的问题。我认为这与 1 是起始横断线和 10 是最后一个横断线无论如何，但不知道如何在代码中指示倒数第二个横线必须与 10 匹配，以便该序列包括所有 10 条横线在终止于代表最终终点的“NA”之前。

(2) 为了真正自动化这个过程，在修复由于过早引入“NA”作为 ID_next 导致的序列提前终止之后，将创建一个新列，允许基于最有效的横断面重新排序而不是 EndID/StartID 的原始顺序。

预期结果

如果我们假设我们只有 6 个横断面要排序，而忽略由于过早引入“NA”而无法排序的 4 个横断面，这将是预期的结果：

    StPt StID EndPt EndID StPt_Next ID_Next TransNum
1  342.1    1 122.3     1      67.9       4        1
4   67.9    4 160.4     4      72.0       5        2
5   72.0    5 166.0     5     116.5       3        3
3  116.5    3 198.7     3      97.8       8        4
8   97.8    8 273.7     8     138.2      10        5
10 138.2   10 287.5    10        NA      NA        6

Run Code Online (Sandbox Code Playgroud)

编辑：关于代码显式产生的错误消息的说明

如前所述，该代码有一些缺陷。另一个缺陷是，在尝试订购大量横断面时，通常会产生错误。我不完全确定错误是在过程中的哪个步骤产生的，但我猜测它与无法匹配最后几个断面有关，可能是由于不符合“orderx”规定的标准。打印语句说“尝试 NA”而不是数据库中的起点，这会导致此错误：“if (df$EndID[i] != df$StID[j]) { : missing value where TRUE/需要假”。我猜测需要另一个 if-else 语句以某种方式指示“如果剩余的点不符合 orderx 标准，

这是一个会产生错误的更大的数据集：

EndPt <- c(158.7,245.1,187.1,298.2,346.8,317.2,74.5,274.2,153.4,246.7,193.6,302.3,6.8,359.1,235.4,134.5,111.2,240.5,359.2,121.3,224.5,212.6,155.1,353.1,181.7,334,249.3,43.9,38.5,75.7,344.3,45.1,285.7,155.5,183.8,60.6,301,132.1,75.9,112,342.1,302.1,288.1,47.4,331.3,3.4,185.3,62,323.7,188,313.1,171.6,187.6,291.4,19.2,210.3,93.3,24.8,83.1,193.8,112.7,204.3,223.3,210.7,201.2,41.3,79.7,175.4,260.7,279.5,82.4,200.2,254.2,228.9,1.4,299.9,102.7,123.7,172.9,23.2,207.3,320.1,344.6,39.9,223.8,106.6,156.6,45.7,236.3,98.1,337.2,296.1,194,307.1,86.6,65.5,86.6,296.4,94.7,279.9)

StPt <- c(56.3,158.1,82.4,185.5,243.9,195.6,335,167,39.4,151.7,99.8,177.2,246.8,266.1,118.2,358.6,357.9,99.6,209.9,342.8,106.5,86.4,35.7,200.6,65.6,212.5,159.1,297,285.9,300.9,177,245.2,153.1,8.1,76.5,322.4,190.8,35.2,342.6,8.8,244.6,202,176.2,308.3,184.2,267.2,26.6,293.8,167.3,30.5,176,74.3,96.9,186.7,288.2,62.6,331.4,254.7,324.1,73.4,16.4,64,110.9,74.4,69.8,298.8,336.6,58.8,170.1,173.2,330.8,92.6,129.2,124.7,262.3,140.4,321.2,34,79.5,263,66.4,172.8,205.5,288,98.5,335.2,38.7,289.7,112.7,350.7,243.2,185.4,63.9,170.3,326.3,322.9,320.6,199.2,287.1,158.1)

EndID <- c(seq(1, 100, 1))

StID <- c(seq(1, 100, 1))

df <- cbind.data.frame(StPt, StID, EndPt, EndID)

Run Code Online (Sandbox Code Playgroud)

任何建议将不胜感激！

Answer 1

Jak*_*arm 1

感谢大家的建议，@Simon 的解决方案最适合我的操作。@Geoffrey 使用 x,y 坐标的实际方法非常棒，因为它允许绘制横断面和行进顺序。因此，我发布了一个使用他们两人的代码生成的混合解决方案，以及附加的注释和代码来详细说明该过程并获得我想要的实际最终结果。我不确定这是否会对将来的任何人有帮助，但是，由于没有答案提供可以 100% 解决我的问题的解决方案，所以我想我会分享我的想法。

正如其他人所指出的，这是一种旅行推销员问题。它是不对称的，因为从样线“t”的末端到样线“t+1”的起点的距离与从样线“t+1”的末端到样线“t”的起点的距离不同。而且，它是一个“路径”解决方案而不是“循环”解决方案。

#=========================================
# Packages
#=========================================
library(TSP)
library(useful)
library(dplyr)

#=========================================
# Full dataset for testing
#=========================================
EndPt <- c(158.7,245.1,187.1,298.2,346.8,317.2,74.5,274.2,153.4,246.7,193.6,302.3,6.8,359.1,235.4,134.5,111.2,240.5,359.2,121.3,224.5,212.6,155.1,353.1,181.7,334,249.3,43.9,38.5,75.7,344.3,45.1,285.7,155.5,183.8,60.6,301,132.1,75.9,112,342.1,302.1,288.1,47.4,331.3,3.4,185.3,62,323.7,188,313.1,171.6,187.6,291.4,19.2,210.3,93.3,24.8,83.1,193.8,112.7,204.3,223.3,210.7,201.2,41.3,79.7,175.4,260.7,279.5,82.4,200.2,254.2,228.9,1.4,299.9,102.7,123.7,172.9,23.2,207.3,320.1,344.6,39.9,223.8,106.6,156.6,45.7,236.3,98.1,337.2,296.1,194,307.1,86.6,65.5,86.6,296.4,94.7,279.9)

StPt <- c(56.3,158.1,82.4,185.5,243.9,195.6,335,167,39.4,151.7,99.8,177.2,246.8,266.1,118.2,358.6,357.9,99.6,209.9,342.8,106.5,86.4,35.7,200.6,65.6,212.5,159.1,297,285.9,300.9,177,245.2,153.1,8.1,76.5,322.4,190.8,35.2,342.6,8.8,244.6,202,176.2,308.3,184.2,267.2,26.6,293.8,167.3,30.5,176,74.3,96.9,186.7,288.2,62.6,331.4,254.7,324.1,73.4,16.4,64,110.9,74.4,69.8,298.8,336.6,58.8,170.1,173.2,330.8,92.6,129.2,124.7,262.3,140.4,321.2,34,79.5,263,66.4,172.8,205.5,288,98.5,335.2,38.7,289.7,112.7,350.7,243.2,185.4,63.9,170.3,326.3,322.9,320.6,199.2,287.1,158.1)

EndID <- c(seq(1, 100, 1))

StID <- c(seq(1, 100, 1))

df <- cbind.data.frame(StPt, StID, EndPt, EndID)

#=========================================
# Convert polar coordinates to cartesian x,y data 
#=========================================
# Area that the transect occupy in space only used for graphing
planeDim <- 1
# Number of transects
nTransects <- 100

# Convert 360-degree polar coordinates to x,y cartesian coordinates to facilitate calculating a distance matrix based on the Pythagorean theorem
EndX <- as.matrix(pol2cart(planeDim, EndPt, degrees = TRUE)["x"])
EndY <- as.matrix(pol2cart(planeDim, EndPt, degrees = TRUE)["y"])
StX <- as.matrix(pol2cart(planeDim, StPt, degrees = TRUE)["x"])
StY <- as.matrix(pol2cart(planeDim, StPt, degrees = TRUE)["y"])

# Matrix of x,y pairs for the beginning ("b") and end ("e") points of each transect
b <- cbind(c(StX), c(StY))
e <- cbind(c(EndX), c(EndY))

#=========================================
# Function to calculate the distance from all endpoints in the ePts matrix to a single beginning point in bPt
#=========================================
dist <- function(ePts, bPt) {
  # Use the Pythagorean theorem to calculate the hypotenuse (i.e., distance) between every end point in the matrix ePts to the point bPt
  apply(ePts, 1, function(p) sum((p - bPt)^2)^0.5)
}

#=========================================
# Distance matrix
#=========================================
# Apply the "dist" function to all beginning points to create a matrix that has the distance between every start and endpoint
## Note: because this is an asymmetric traveling salesperson problem, the distance matrix is directional, thus, the distances at any position in the matrix must be the distance from the transect shown in the row label and to the transect shown in the column label
distMatrix <- apply(b, 1, FUN = dist, ePts = e)

## Set the distance between the beginning and end of each transect to zero so that there is no "cost" to walking the transect
diag(distMatrix) <- 0

#=========================================
# Solve asymmetric TSP
#=========================================
# This creates an instance of the asymmetric traveling salesperson (ASTP) 
atsp <- as.ATSP(distMatrix)
# This creates an object of Class Tour that travels to all of the points 
## In this case, the repetitive_nn produces the smallest overall and transect-to-transect
tour <- solve_TSP(atsp, method = "repetitive_nn")


#=========================================
# Create a path by cutting the tour at the most "expensive" transition 
#=========================================
# Sort the original data frame to match the order of the solution
dfTour = df[as.numeric(tour),]

# Add the following columns to the original dataframe: 
dfTour = dfTour %>%
  # Assign visit order (1 to 100, ascending) 
  mutate(visit_order = 1:nrow(dfTour)) %>%
  # The ID of the next transect to move to
  mutate(next_StID = lead(StID, order_by = visit_order),
  # The angle of the start point for the next transect
         next_StPt = lead(StPt, order_by = visit_order))

# lead() generates the NA's in the last record for next_StID, next_StPt, replace these by adding that information
dfTour[dfTour$visit_order == nrow(dfTour),'next_StID'] <-
  dfTour[dfTour$visit_order == 1,'StID']

dfTour[dfTour$visit_order == nrow(dfTour),'next_StPt'] <-
  dfTour[dfTour$visit_order == 1,'StPt']

# Function to calculate distance for 360 degrees rather than x,y coordinates
transect_distance <- function(end,start){
  abs_dist = abs(start-end)
  ifelse(360-abs_dist > 180, abs_dist, 360-abs_dist)
}

# Compute distance between end of each transect and start of next using polar coordinates
dfTour = dfTour %>% mutate(dist_between = transect_distance(EndPt, next_StPt))

# Identify the longest transition point for breaking the cycle
min_distance <- sum(dfTour$dist_between) - max(dfTour$dist_between)
path_start <- dfTour[dfTour$dist_between == max(dfTour$dist_between), 'next_StID']
path_end <- dfTour[dfTour$dist_between == max(dfTour$dist_between), 'EndID']

# Make a statement about the least cost path
print(sprintf("minimum cost path = %.2f, starting at node %d, ending at node %d",
              min_distance, path_start, path_end))

# The variable path shows the order in which you should visit the transects
path <- cut_tour(tour, path_start, exclude_cut = F) 

# Arrange df from smallest to largest travel distance
tmp1 <- dfTour %>%
  arrange(dist_between)

# Change dist_between and visit_order to NA for transect with the largest distance to break cycle 
# (e.g., we will not travel this distance, this represents the path endpoint) 
tmp1[length(dfTour$dist_between):length(dfTour$dist_between),8] <- NA
tmp1[length(dfTour$dist_between):length(dfTour$dist_between),5] <- NA

# Set df order back to ascending by visit order
tmp2 <- tmp1 %>%
  arrange(visit_order)

# Detect the break in a sequence of visit_order introduced by the NA (e.g., 1,2,3....5,6) and mark groups before the break with 0 and after the break with 1 in the "cont_per" column 
tmp2$cont_per <- cumsum(!c(TRUE, diff(tmp2$visit_order)==1))
# Sort "cont_per" such that the records following the break become the beginning of the path and the ones following the break represent the middle orders and the point with the NA being assigned the last visit order, and assign a new visit order
tmp3 <- tmp2%>%
  arrange(desc(cont_per))%>%
  mutate(visit_order_FINAL=seq(1, length(tmp2$visit_order), 1))

# Datframe ordered by progression of transects
trans_order <- cbind.data.frame(tmp3[2], tmp3[1], tmp3[4], tmp3[3], tmp3[6], tmp3[7], tmp3[8], tmp3[10])
# Insert NAs for "next" info for final transect
trans_order[nrow(trans_order),'next_StPt'] <- NA 
trans_order[nrow(trans_order), 'next_StID'] <- NA

#=========================================
# View data
#=========================================
head(trans_order)

#=========================================
# Plot
#=========================================
#For fun, we can visualize the transects:
# make an empty graph space
plot(1,1, xlim = c(-planeDim-0.1, planeDim+0.1), ylim = c(-planeDim-0.1, planeDim+0.1), ty = "n")

# plot the beginning of each transect as a green point, the end as a red point,
and a grey line representing the transect
for(i in 1:nrow(e)) {
  xs = c(b[i,1], e[i,1])
  ys = c(b[i,2], e[i,2])
  lines(xs, ys, col = "light grey", lwd = 1, lty = 1)
  points(xs, ys, col = c("green", "red"), pch = 1, cex = 1)
  #text((xs), (ys), i)
}

# Add the path to the visualization
for(i in 1:(length(path)-1)) {
    # This makes a line between the x coordinates for the end point of path i and beginning point of path i+1 
    lines(c(e[path[i],1], b[path[i+1],1]), c(e[path[i],2], b[path[i+1], 2]), lty = 1, lwd=1)
}

Run Code Online (Sandbox Code Playgroud)

这就是最终结果的样子

归档时间：	5 年，9 月前
查看次数：	207 次
最近记录：	5 年，7 月前