Filter and merge points from two dataframes that lie within a given distance in GeoPandas

ahb*_*bon 0 python geometry pandas geopandas

Given two GeoPandas dataframes as follows:

df1:

     id  sMiddleLng  sMiddleLat  p1_sum  p2_sum  \
0  325782  109.255034   34.691754     0.0     0.0   
1   84867  107.957177   33.958289     0.0     0.0   
2   13101  107.835338   33.739493     0.0     0.0   
3   92771  109.464280   33.980666     0.0     0.0   
4   86609  108.253830   33.963262     0.0     0.0   

                            geometry  
0  POINT (109.255033915 34.69175367)  
1  POINT (107.957177305 33.95828929)  
2    POINT (107.8353377 33.73949313)  
3   POINT (109.46428019 33.98066616)  
4  POINT (108.253830245 33.96326193)  

df2:

     fnid  sMiddleLng  sMiddleLat  p1_sum  p2_sum  \
0  361104  102.677887   36.686408     0.0     0.0   
1  276307  103.268356   36.425372     0.0     0.0   
2  334778  103.242125   36.605224     0.0     0.0   
3  205223  104.186869   36.206637     0.0     0.0   
4  167892  104.387566   36.091905     0.0     0.0   

                                 geometry  
0  POINT (102.67788654685 36.68640780045)  
1  POINT (103.26835590025 36.42537187675)  
2   POINT (103.2421246007 36.60522388845)  
3    POINT (104.1868687253 36.2066370049)  
4   POINT (104.38756565315 36.0919047206)  

How can I find and merge all the points from another similar GeoDataFrame df2, based on id and geometry in df1, whose distance to the points in df1 is less than 10 km? Thanks.

A function for calculating the distance:

from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

bex*_*exi 5

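My suggestion follows this logic:

  1. Reproject both GeoDataFrames to a projection that uses metres as units, e.g. Web Mercator (note that Web Mercator distances are only approximate away from the equator)
  2. Temporarily create a 10 km buffer around the points in one of the datasets
  3. Use sjoin to find/merge the overlapping points

This can be implemented as follows: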

import geopandas as gpd

# Assuming your data uses WGS84. Only run the next two lines if the CRS has not been initialised.
df1 = df1.set_crs("EPSG:4326")
df2 = df2.set_crs("EPSG:4326")

# Now convert the GeoDataFrames to Web Mercator
df1 = df1.to_crs("EPSG:3857")
df2 = df2.to_crs("EPSG:3857")

# Create a buffer with a radius of 10000 projected units (nominally metres) around each point in df2
df2.geometry = df2.geometry.buffer(10000)

# Join the two GeoDataFrames and convert back to the original projection.
# GeoPandas 0.10+ uses predicate=...; older releases call this argument op=...
df3 = gpd.sjoin(df1, df2, how='left', predicate='intersects', lsuffix='df1', rsuffix='df2')
df3 = df3.to_crs("EPSG:4326")  # or whatever was used originally
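One caveat about the buffer radius: EPSG:3857 stretches distances by roughly 1/cos(latitude), so a 10000-unit buffer covers noticeably less than 10 km on the ground at the latitudes in the question. The following is a minimal sketch of that correction, assuming the spherical Mercator scale factor; the helper names mercator_scale and buffer_radius_3857 are made up for illustration and are not part of GeoPandas.

```python
from math import cos, radians

def mercator_scale(lat_deg):
    """Approximate Web Mercator scale factor (projected units per ground metre)."""
    return 1.0 / cos(radians(lat_deg))

def buffer_radius_3857(ground_m, lat_deg):
    """Buffer radius in EPSG:3857 units that covers roughly ground_m metres (hypothetical helper)."""
    return ground_m * mercator_scale(lat_deg)

lat = 36.4  # assumption: roughly the latitude of the df2 sample points

# Ground distance actually covered by a 10000-unit buffer at this latitude
covered_m = 10000 / mercator_scale(lat)
# Buffer radius needed to cover about 10 km on the ground
needed_units = buffer_radius_3857(10000, lat)

print(round(covered_m), round(needed_units))
```

Passing a radius scaled like this to buffer() keeps the rest of the approach unchanged. Reprojecting to a CRS suited to the region (for example a UTM zone) instead of Web Mercator largely avoids the correction.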

You now have a GeoDataFrame with the information on the joined points at hand. For the given data, there are no points in df1 within 10 km of a point in df2.
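That result can be cross-checked without any projection by applying the haversine function from the question to every pair of sample rows. This is only a brute-force sketch: the coordinates are copied from the printed samples, and the list names are chosen for illustration.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points (decimal degrees)."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371  # mean Earth radius in km

# (id, lon, lat) rows copied from the df1 sample in the question
df1_rows = [
    (325782, 109.255034, 34.691754),
    (84867, 107.957177, 33.958289),
    (13101, 107.835338, 33.739493),
    (92771, 109.464280, 33.980666),
    (86609, 108.253830, 33.963262),
]
# (fnid, lon, lat) rows copied from the df2 sample in the question
df2_rows = [
    (361104, 102.677887, 36.686408),
    (276307, 103.268356, 36.425372),
    (334778, 103.242125, 36.605224),
    (205223, 104.186869, 36.206637),
    (167892, 104.387566, 36.091905),
]

# Distance for every (df1, df2) pair, then keep pairs closer than 10 km
pairs = [
    (i, f, haversine(lon1, lat1, lon2, lat2))
    for i, lon1, lat1 in df1_rows
    for f, lon2, lat2 in df2_rows
]
within_10km = [p for p in pairs if p[2] < 10]
closest_km = min(p[2] for p in pairs)

print(within_10km)  # prints []
```

The closest pair in the samples is several hundred kilometres apart, so an empty join is expected. The pairwise loop grows with the product of the two frame sizes, so it is only practical as a check on small samples.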

Also, I am not entirely sure in what form you want to merge the data, so adapt this to your needs.

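  • Hi, the error message indicates that your dfs are regular pandas DataFrames. Convert them to geopandas GeoDataFrames like this: "df1 = gpd.GeoDataFrame(df1)", where "gpd" is the geopandas package. Note that the Points in the "geometry" column must be shapely Point objects (sometimes, when you import data from elsewhere, they are just text strings). Let me know if you need hints on how to convert them. (2 upvotes)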
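As a hedged illustration of the conversion mentioned in the comment, the sketch below builds a GeoDataFrame from a plain pandas DataFrame in two ways: by parsing WKT text strings, and by using the longitude/latitude columns. The two sample rows are copied from the question; WGS84 (EPSG:4326) is an assumption, as in the answer.

```python
import pandas as pd
import geopandas as gpd
from shapely import wkt

# Two rows from the df1 sample, as a plain DataFrame whose geometry column holds text
df1 = pd.DataFrame({
    "id": [325782, 84867],
    "sMiddleLng": [109.255034, 107.957177],
    "sMiddleLat": [34.691754, 33.958289],
    "geometry": [
        "POINT (109.255033915 34.69175367)",
        "POINT (107.957177305 33.95828929)",
    ],
})

# Option 1: parse the WKT strings into shapely Point objects
geoms = df1["geometry"].apply(wkt.loads)
gdf1 = gpd.GeoDataFrame(df1.drop(columns="geometry"), geometry=geoms, crs="EPSG:4326")

# Option 2: build the points from the longitude/latitude columns
gdf1_xy = gpd.GeoDataFrame(
    df1.drop(columns="geometry"),
    geometry=gpd.points_from_xy(df1["sMiddleLng"], df1["sMiddleLat"]),
    crs="EPSG:4326",
)

print(type(gdf1).__name__, gdf1.geometry.iloc[0].geom_type)
```

Either result can then be passed to to_crs and sjoin as in the answer.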