nen*_*007 3 postgresql performance spatial postgis postgresql-10 postgresql-performance
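After giving up on MySQL, I tried Elasticsearch, and now I want to see whether I can use PostgreSQL/PostGIS, since that would let me use only PostgreSQL.
I need to fetch records from a table by distance (it does not have to be exact) and sort them by distance. The table has 10 million records.
Since my query is slower on PostgreSQL than it was on MySQL, I assume I must be doing something wrong.
What can I do better?
Table: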
id | hash_id | town | geo_pt2
geo_pt2 is geography
Index:
CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)
Query:
SELECT hash_id, town
, ST_Distance(t.x, geo_pt2) AS dist
FROM member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE ST_DWithin(t.x, geo_pt2, 250000)
ORDER BY dist
limit 100 offset 1000;
Explain:
Limit (cost=9.08..9.08 rows=1 width=53)
-> Sort (cost=9.07..9.08 rows=1 width=53)
Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.06 rows=1 width=53)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
I am using PostgreSQL 10 on a modern server with high IOPS (NVMe), and the query takes 35 seconds.
After applying @Evan Carroll's suggestion, performance is better:
EXPLAIN ANALYZE SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms
It gets slower if the user pages toward the end:
EXPLAIN ANALYZE SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms
First, use EXPLAIN ANALYZE (not just EXPLAIN) and show the output of \d on the table (in psql). As a first point,
ST_GeographyFromText('POINT(47.4667 8.3167)')
should be written as ST_MakePoint(47.4667, 8.3167)::geography
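For reference, a minimal sketch of how the diagnostics requested above could be gathered in psql. It reuses the table and query from the question; the BUFFERS option is an optional extra that was not part of the original request.

```sql
-- psql meta-command: show the table definition, column types and indexes.
\d member_profile

-- Execute the query for real and report actual row counts and timings.
-- BUFFERS (optional, an addition here) also reports shared-buffer hits and reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY dist
LIMIT 100 OFFSET 1000;
```

Note that EXPLAIN ANALYZE actually runs the statement, so on the slow variant it takes as long as the query itself.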
Your problem is this pattern,
SELECT ST_Distance( ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER BY dist
LIMIT 100 OFFSET 1000;
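Each time you do this, you have to calculate the distance for at least 1100 rows. That said, it shouldn't be that slow. It is probably slow as you have it because, in order to sort, it calculates ST_Distance for all rows. We can stop that by using KNN with the <-> operator. MySQL does not support KNN.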
SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;
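As a style note, I personally prefer OFFSET/FETCH (the standardized form of LIMIT/OFFSET).
I'm not sure the following will work. However, it may be worth a try (keep us updated).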
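To illustrate the style point only, here is a small sketch of the two spellings side by side. Ordering by id is an assumption made purely to keep the example short; both statements should return the same rows in PostgreSQL.

```sql
-- PostgreSQL-specific spelling.
SELECT hash_id, town
FROM member_profile
ORDER BY id
LIMIT 100 OFFSET 1000;

-- SQL-standard spelling of the same request.
SELECT hash_id, town
FROM member_profile
ORDER BY id
OFFSET 1000 ROWS
FETCH NEXT 100 ROWS ONLY;
```

Either way the server still has to produce and discard the first 1000 rows, so the choice is cosmetic and does not change the paging cost.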
SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
, ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;
So the first time you run it, you save the last value of myknn, and the second time you run it, you pass that value back in as OLD_VALUE in this clause,
AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
So each time you run it, you save a new point to continue from, and use FETCH NEXT x ROWS ONLY.
myknn and dist may be the same for you; if so, you can drop one of them.
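One way to replay OLD_VALUE without editing the query text each time is a prepared statement. This is only a sketch of the idea above, with the same untested caveat: the statement name next_page and the example values are hypothetical, and -1 is used for the first page simply because no distance is negative.

```sql
-- $1 plays the role of OLD_VALUE: the myknn value of the last row already shown.
PREPARE next_page(double precision) AS
SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
     , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND (ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2) > $1
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;

-- First page: any value below every possible distance.
EXECUTE next_page(-1);

-- Next page: pass the myknn of the last row returned (hypothetical value).
EXECUTE next_page(1234.5);
```

Because the comparison is a strict `>`, rows whose distance is exactly equal to the last value are skipped, so this suits cases where exact ties do not matter. Whether it is actually faster than a large OFFSET is worth checking with EXPLAIN ANALYZE, since the extra condition may be applied as a filter rather than as an index condition.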