PostgreSQL geospatial query is slow

nen*_*007 3 postgresql performance spatial postgis postgresql-10 postgresql-performance

After giving up on MySQL I tried Elasticsearch, and now I want to see whether I can use PostgreSQL/PostGIS instead, because that would let me run on PostgreSQL alone.

I need to fetch records from a table by distance (it need not be exact) and sort them by distance. The table has 10 million records.

Since my query is slower on PostgreSQL than it was on MySQL, I assume I am doing something wrong.

What can I do better?

Table:

id | hash_id | town | geo_pt2 

geo_pt2 is geography

Index:

CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)

Query:

SELECT hash_id, town
     , ST_Distance(t.x, geo_pt2) AS dist
FROM   member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE  ST_DWithin(t.x, geo_pt2, 250000)
ORDER  BY dist
limit 100 offset 1000;

Explain:

Limit  (cost=9.08..9.08 rows=1 width=53)
  ->  Sort  (cost=9.07..9.08 rows=1 width=53)
        Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
        ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.06 rows=1 width=53)
              Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
              Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))

I am running PostgreSQL 10 on a modern server with high-IOPS storage (NVMe), and the query takes 35 seconds.

After applying the suggestion from @Evan Carroll, performance is better:

EXPLAIN ANALYZE SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
  ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
        Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms

If the user pages toward the end of the results, it gets slow:

EXPLAIN ANALYZE SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
  ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
        Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms

Eva*_*oll 5

First, use EXPLAIN ANALYZE (not just EXPLAIN) and show the output of \d for the table (in psql). As a first point,

ST_GeographyFromText('POINT(47.4667 8.3167)')

should be written as ST_MakePoint(47.4667, 8.3167)::geography

Your problem is this pattern:

SELECT ST_Distance( ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER  BY dist
LIMIT 100 OFFSET 1000;

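Every time you do this, the distance has to be computed for at least 1100 rows. That by itself should not be slow. It is slow the way you have written it because ST_Distance has to be computed for all the matching rows before they can be sorted. We can avoid that by using KNN with the <-> operator, which lets the index return rows in distance order. MySQL does not support KNN.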

SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;

As a style note, I personally prefer OFFSET/FETCH (the standardized form) over LIMIT/OFFSET.

Pagination

I am not sure this will work, but it may be worth a try (keep us updated).

SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
     , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;

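So the first time you run it, you save the myknn value of the last row returned. The second time you run it, you pass that value back in as OLD_VALUE in this clause: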

AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE

So each time you run it, you save the new point to continue from, and use FETCH NEXT x ROWS ONLY to get the next page.

myknn and dist are probably the same for you; if so, you can drop one of them.
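One way to replay the saved value without editing the query text each time is a prepared statement. This is only a sketch under assumptions: the statement name next_page and the parameter $1 (standing in for OLD_VALUE) are made up for illustration, and it has not been run against this table, so it is worth checking with EXPLAIN ANALYZE whether the extra distance condition actually helps.

```sql
-- Hypothetical prepared statement: next_page and $1 are illustrative names,
-- $1 plays the role of OLD_VALUE (the last myknn seen on the previous page).
PREPARE next_page(double precision) AS
SELECT hash_id, town
     , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND  ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > $1
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH  NEXT 100 ROWS ONLY;

-- First page: any value below zero keeps every row, including distance 0.
EXECUTE next_page(-1);

-- Later pages: pass the myknn of the last row from the previous page
-- (12345.6 is a placeholder, not a real result).
EXECUTE next_page(12345.6);

-- Release the statement when done with it.
DEALLOCATE next_page;
```

Because the comparison is a strict >, rows whose distance is exactly equal to the saved value are skipped, so rows tied at a page boundary can be missed. If that matters, a tie-breaker column such as id would have to be added to both the ORDER BY and the continuation condition.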