Ell*_*Rob 5 pandas geopandas pandas-groupby
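I have a geopandas dataframe containing a series of shapely POINT geometries. Another column holds a list of IDs specifying which unique polygon each point belongs to. Simplified input code: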
import pandas as pd
from shapely.geometry import Point, LineString, Polygon
from geopandas import GeoDataFrame
data = [[1,10,10],[1,15,20],[1,20,10],[2,30,30],[2,35,40],[2,40,30]]
df_poly = pd.DataFrame(data, columns = ['poly_ID','lon', 'lat'])
geometry = [Point(xy) for xy in zip(df_poly.lon, df_poly.lat)]
geodf_poly = GeoDataFrame(df_poly, geometry=geometry)
geodf_poly.head()
I want to group by poly_ID so that the geometry is converted from POINT to POLYGON. The output would look roughly like this:
poly_ID geometry
1 POLYGON ((10 10, 15 20, 20 10))
2 POLYGON ((30 30, 35 40, 40 30))
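I imagine this is simple, but I can't get it to work. I found the following code, which converts the points to an open polyline, but I can't work out how to get a polygon. Can anyone suggest how to adapt it?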
geodf_poly = geodf_poly.groupby(['poly_ID'])['geometry'].apply(lambda x: LineString(x.tolist()))
Simply replacing LineString with Polygon results in TypeError: object of type 'Point' has no len()
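Doing what you ask in pandas is a bit tricky, because your output needs the text "POLYGON" with the numbers inside the parentheses.
See whether one of the following options works for you: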
from itertools import chain
df_poly.groupby('poly_ID').agg(list).apply(lambda x: tuple(chain.from_iterable(zip(x['lon'], x['lat']))), axis=1).reset_index(name='geometry')
Output
poly_ID geometry
0 1 (10, 10, 15, 20, 20, 10)
1 2 (30, 30, 35, 40, 40, 30)
Or
from itertools import chain
df_new = df_poly.groupby('poly_ID').agg(list).apply(lambda x: tuple(chain.from_iterable(zip(x['lon'], x['lat']))), axis=1).reset_index(name='geometry')
df_new['geometry'] = df_new.apply(lambda x: 'POLYGON (' + str(x['geometry']) + ')', axis=1)
df_new
Output
poly_ID geometry
0 1 POLYGON ((10, 10, 15, 20, 20, 10))
1 2 POLYGON ((30, 30, 35, 40, 40, 30))
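The string shown here is close to, but not quite, WKT: WKT separates x and y with a space, separates vertices with commas, and repeats the first vertex at the end of a polygon ring. As a minimal sketch, assuming the `df_poly` frame from the question, the strings could be built like this (the helper name `ring_wkt` and the result name `df_wkt` are made up for illustration):

```python
import pandas as pd

data = [[1, 10, 10], [1, 15, 20], [1, 20, 10], [2, 30, 30], [2, 35, 40], [2, 40, 30]]
df_poly = pd.DataFrame(data, columns=['poly_ID', 'lon', 'lat'])


def ring_wkt(group):
    """Hypothetical helper: format one group's points as a closed WKT polygon."""
    coords = list(zip(group['lon'], group['lat']))
    coords.append(coords[0])  # WKT rings repeat the first vertex at the end
    body = ', '.join(f'{x} {y}' for x, y in coords)
    return f'POLYGON (({body}))'


# One row per poly_ID; points are used in the order they appear in df_poly
rows = [(pid, ring_wkt(group)) for pid, group in df_poly.groupby('poly_ID')]
df_wkt = pd.DataFrame(rows, columns=['poly_ID', 'geometry'])
print(df_wkt)
```

The `geometry` column is still plain text at this point, but a string in this form is the kind of input that `shapely.wkt.loads` parses.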
Note: the geometry column is a string, and I'm not sure whether you can feed it directly into Shapely.
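If real shapely Polygon objects are wanted rather than strings, one possible route is to pass coordinate tuples to `Polygon` instead of `Point` objects, since the TypeError in the question suggests the constructor was trying to treat each `Point` as a coordinate sequence. The following is only a sketch under that assumption: it reuses the question's `geodf_poly`, the name `geodf_polygons` is made up for illustration, and it relies on each group having at least three points listed in ring order.

```python
import pandas as pd
from shapely.geometry import Point, Polygon
from geopandas import GeoDataFrame

data = [[1, 10, 10], [1, 15, 20], [1, 20, 10], [2, 30, 30], [2, 35, 40], [2, 40, 30]]
df_poly = pd.DataFrame(data, columns=['poly_ID', 'lon', 'lat'])
geometry = [Point(xy) for xy in zip(df_poly.lon, df_poly.lat)]
geodf_poly = GeoDataFrame(df_poly, geometry=geometry)

# Same groupby as the LineString version, but hand Polygon plain (x, y)
# tuples extracted from each Point rather than the Point objects themselves.
polys = geodf_poly.groupby('poly_ID')['geometry'].apply(
    lambda pts: Polygon([(p.x, p.y) for p in pts])
)

# Back to a GeoDataFrame with one polygon per poly_ID
geodf_polygons = GeoDataFrame(polys.reset_index(), geometry='geometry')
print(geodf_polygons)
```

Shapely closes the ring itself, so the first vertex does not need to be repeated in the input list. Newer Shapely releases may also accept a sequence of `Point` objects directly, in which case the original one-line swap might work unchanged; extracting the coordinates explicitly avoids depending on that.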