Nearest-neighbor join with a distance condition

cin*_*n21 5 python pandas scikit-learn geopandas

In this question I refer to this project:

https://automating-gis-processes.github.io/site/master/notebooks/L3/nearest-neighbor-faster.html

We have two GeoDataFrames:

Buildings:

             name                   geometry
0            None  POINT (24.85584 60.20727)
1     Uimastadion  POINT (24.93045 60.18882)
2            None  POINT (24.95113 60.16994)
3  Hartwall Arena  POINT (24.92918 60.20570)

and bus stops:

     stop_name   stop_lat   stop_lon  stop_id                   geometry
0  Ritarihuone  60.169460  24.956670  1010102  POINT (24.95667 60.16946)
1   Kirkkokatu  60.171270  24.956570  1010103  POINT (24.95657 60.17127)
2   Kirkkokatu  60.170293  24.956721  1010104  POINT (24.95672 60.17029)
3    Vironkatu  60.172580  24.956554  1010105  POINT (24.95655 60.17258)

After applying BallTree from sklearn.neighbors:

from sklearn.neighbors import BallTree
import numpy as np

def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""

    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')

    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)

    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()

    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]

    # Return indices and distances
    return (closest, closest_dist)


def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.

    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """

    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name

    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)

    # Parse coordinates from points and insert them into a numpy array as RADIANS
    # note: this builds (x, y) = (lon, lat) pairs, while sklearn's haversine
    # metric expects (lat, lon); swap geom.x and geom.y for exact distances
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())

    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in radians, converted to meters below)

    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)

    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]

    # Ensure that the index corresponds to the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)

    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius

    return closest_points


closest_stops = nearest_neighbor(buildings, stops, return_dist=True)

For each building index we get the nearest bus stop and its distance:

    stop_name    stop_lat   stop_lon    stop_id                 geometry      distance
0   Muusantori   60.207490  24.857450   1304138 POINT (24.85745 60.20749)   180.521584
1   Eläintarha   60.192490  24.930840   1171120 POINT (24.93084 60.19249)   372.665221
2   Senaatintori 60.169010  24.950460   1020450 POINT (24.95046 60.16901)   119.425777
3   Veturitie    60.206610  24.929680   1174112 POINT (24.92968 60.20661)   106.762619
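One thing worth checking before filtering at 250 m: scikit-learn documents that its haversine metric expects points as [latitude, longitude] in radians, but the code above (like the tutorial) passes (x, y) = (lon, lat). The sketch below is a plain-Python check with the standard haversine formula, using building 0 and the stop Muusantori as printed in the tables (rounded to 5 decimals). The function name haversine_m is a hypothetical helper, not part of any library.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, as in the code above


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees.

    Arguments are in (lat, lon) order, the order sklearn's haversine metric expects.
    """
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    # 2 * asin(sqrt(a)) is the central angle in radians; scale it to meters
    return 2 * math.asin(math.sqrt(a)) * EARTH_RADIUS_M


# building 0 and stop "Muusantori", copied (rounded) from the tables above
building_lat, building_lon = 60.20727, 24.85584
stop_lat, stop_lon = 60.20749, 24.85745

# correct order: latitude first
correct = haversine_m(building_lat, building_lon, stop_lat, stop_lon)
# longitude passed where latitude is expected, as in the code above
swapped = haversine_m(building_lon, building_lat, stop_lon, stop_lat)

print(f"(lat, lon): {correct:.1f} m   (lon, lat): {swapped:.1f} m")
```

With these rounded coordinates the (lat, lon) order gives about 92 m and the swapped order about 180 m, close to the 180.52 m in the table, so the distances in this thread are likely computed with the two coordinates swapped. At a 250 m threshold that difference can change which stops are kept.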

I am looking for a solution that gives, for each building, every bus stop (there can be more than one) that is less than 250 meters away.
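To pin down the expected result, here is a brute-force sketch: compare every building with every stop and keep all stops within 250 m, nearest first. It assumes (lat, lon) order in degrees and uses only a small subset of stops copied from the tables in this thread (the second Senaatintori row comes from the answer's output further down). The names haversine_m and stops_within are hypothetical helpers.

```python
import math

EARTH_RADIUS_M = 6_371_000  # meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; arguments in degrees, (lat, lon) order."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a)) * EARTH_RADIUS_M


def stops_within(buildings, stops, radius_m=250):
    """Map each building position to all (stop_name, distance_m) within radius_m, nearest first."""
    result = {}
    for b_pos, (b_lat, b_lon) in enumerate(buildings):
        hits = []
        for name, s_lat, s_lon in stops:
            d = haversine_m(b_lat, b_lon, s_lat, s_lon)
            if d <= radius_m:
                hits.append((name, d))
        result[b_pos] = sorted(hits, key=lambda hit: hit[1])
    return result


# (lat, lon) in degrees, a small subset copied from the tables in this thread
buildings = [
    (60.20727, 24.85584),
    (60.18882, 24.93045),
    (60.16994, 24.95113),
    (60.20570, 24.92918),
]
stops = [
    ("Ritarihuone", 60.169460, 24.956670),
    ("Kirkkokatu", 60.171270, 24.956570),
    ("Kirkkokatu", 60.170293, 24.956721),
    ("Vironkatu", 60.172580, 24.956554),
    ("Muusantori", 60.20749, 24.85745),
    ("Eläintarha", 60.19249, 24.93084),
    ("Senaatintori", 60.16901, 24.95046),
    ("Senaatintori", 60.16896, 24.94983),
    ("Veturitie", 60.20661, 24.92968),
]

for b_pos, hits in stops_within(buildings, stops).items():
    print(b_pos, [(name, round(d, 1)) for name, d in hits])
```

This costs one distance per (building, stop) pair, so it only suits small data or cross-checking; a spatial index such as BallTree avoids comparing every pair.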


Thanks for your help.


Ben*_*n.T 5

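Here is a way to reuse the BallTree approach from the question, but calling query_radius instead of query. It is not wrapped in a function, but you can easily turn it into one: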

from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd
## here I start with buildings and stops as loaded in the link provided

# search radius in meters, change it as needed
radius_max = 250 # meters
# another parameter, in case you want to do it with the radius of Mars ^^
earth_radius = 6371000  # meters

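# similar to the method with apply in the tutorial 
# to create left_radians and right_radians, but faster
# note: sklearn's haversine metric expects [lat, lon] in radians; like the
# tutorial, this stacks [x, y] = [lon, lat], so swap .x and .y for exact distances
candidates = np.vstack([stops['geometry'].x.to_numpy(), 
                        stops['geometry'].y.to_numpy()]).T*np.pi/180
src_points = np.vstack([buildings['geometry'].x.to_numpy(), 
                        buildings['geometry'].y.to_numpy()]).T*np.pi/180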

# Create tree from the candidate points
tree = BallTree(candidates, leaf_size=15, metric='haversine')
# use query_radius instead
ind_radius, dist_radius = tree.query_radius(src_points, 
                                            r=radius_max/earth_radius, 
                                            return_distance=True)
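With metric='haversine', the tree works on the unit sphere: the radius r given to query_radius and the distances it returns are central angles in radians. That is why radius_max is divided by earth_radius above, and why the distances are multiplied by earth_radius afterwards. The arithmetic is sketched below (radians_to_meters is a hypothetical helper name). Separately, query_radius does not sort each point's neighbors by default; its sort_results=True option (only allowed together with return_distance=True) orders them by distance, whereas in the output further down building 2 lists 174.744 before 119.426.

```python
import math

earth_radius = 6371000  # meters, same value as in the answer
radius_max = 250        # meters

# query_radius with metric='haversine' expects r as a central angle in radians
r = radius_max / earth_radius


def radians_to_meters(angle_rad, earth_radius_m=earth_radius):
    """Convert a haversine distance returned by BallTree (radians) back to meters."""
    return angle_rad * earth_radius_m


print(f"r = {r:.4e} rad = {math.degrees(r):.5f} degrees")
print(f"back to meters: {radians_to_meters(r):.1f}")
```

So 250 m corresponds to roughly 3.9e-5 rad, a very small angle; passing 250 directly as r would match every stop on the planet.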

Now you can manipulate the result to get what you want:

# create a dataframe built with:
# index based on the row position of the building in buildings
# column row_stop is the row position of the stop in stops
# column dist is the distance in meters
closest_dist = pd.concat([pd.Series(ind_radius).explode().rename('row_stop'), 
                          pd.Series(dist_radius).explode().rename('dist')*earth_radius], 
                         axis=1)
print (closest_dist.head())
#  row_stop     dist
#0     1131  180.522
#1      NaN      NaN
#2       64  174.744
#2       61  119.426
#3      532  106.763

# merge the dataframe created above with the original stops data
# to get names, ids, ...; note: the index of stops must be reset because,
# as in closest_dist, row_stop is position based
closest_stop = closest_dist.merge(stops.reset_index(drop=True), 
                                  left_on='row_stop', right_index=True, how='left')
print (closest_stop.head())
#  row_stop     dist     stop_name  stop_lat  stop_lon    stop_id  \
#0     1131  180.522    Muusantori  60.20749  24.85745  1304138.0   
#1      NaN      NaN           NaN       NaN       NaN        NaN   
#2       64  174.744  Senaatintori  60.16896  24.94983  1020455.0   
#2       61  119.426  Senaatintori  60.16901  24.95046  1020450.0   
#3      532  106.763     Veturitie  60.20661  24.92968  1174112.0   
#
#                    geometry  
#0  POINT (24.85745 60.20749)  
#1                       None  
#2  POINT (24.94983 60.16896)  
#2  POINT (24.95046 60.16901)  
#3  POINT (24.92968 60.20661) 
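The reshaping step above relies on query_radius returning, for each building, one array of stop row positions and one matching array of distances, with lengths that differ per building (empty for building 1). pd.Series(...).explode() turns these ragged arrays into one row per (building, stop) pair, repeating the building's row position as the index and giving NaN for an empty array. The plain-Python sketch below reproduces that reshaping only to make the structure explicit; explode_pairs is a hypothetical name, and the input values are reconstructed from the printed closest_dist.

```python
def explode_pairs(ind_radius, dist_radius, earth_radius=6371000):
    """Flatten per-building results into (building_row, stop_row, dist_m) tuples.

    A building with no stop in range yields one (building_row, None, None) row,
    mirroring the NaN row that Series.explode produces for an empty list-like.
    """
    rows = []
    for b_row, (stop_rows, dists) in enumerate(zip(ind_radius, dist_radius)):
        if len(stop_rows) == 0:
            rows.append((b_row, None, None))
            continue
        for s_row, d in zip(stop_rows, dists):
            rows.append((b_row, s_row, d * earth_radius))  # radians -> meters
    return rows


# stand-in for query_radius output: one sequence per building, distances in radians
R = 6371000
ind_radius = [[1131], [], [64, 61], [532]]
dist_radius = [[180.522 / R], [], [174.744 / R, 119.426 / R], [106.763 / R]]

for row in explode_pairs(ind_radius, dist_radius):
    print(row)
```

Keeping the building's row position as the key is what lets the later merge and join line the rows back up with stops and buildings.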

Finally, join the result back to the buildings:

# join buildings (with a reset index) with closest_stop,
# because the index of closest_stop is position based
final_df = buildings.reset_index(drop=True).join(closest_stop, rsuffix='_stop')
print (final_df.head(10))
#              name                   geometry row_stop     dist     stop_name  \
# 0            None  POINT (24.85584 60.20727)     1131  180.522    Muusantori   
# 1     Uimastadion  POINT (24.93045 60.18882)      NaN      NaN           NaN   
# 2            None  POINT (24.95113 60.16994)       64  174.744  Senaatintori   
# 2            None  POINT (24.95113 60.16994)       61  119.426  Senaatintori   
# 3  Hartwall Arena  POINT (24.92918 60.20570)      532  106.763     Veturitie   

#    stop_lat  stop_lon    stop_id              geometry_stop  
# 0  60.20749  24.85745  1304138.0  POINT (24.85745 60.20749)  
# 1       NaN       NaN        NaN                       None  
# 2  60.16896  24.94983  1020455.0  POINT (24.94983 60.16896)  
# 2  60.16901  24.95046  1020450.0  POINT (24.95046 60.16901)  
# 3  60.20661  24.92968  1174112.0  POINT (24.92968 60.20661)  