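I am trying to speed up and simplify an SQL query against an imported OpenStreetMap (OSM) database. The database is stored on a PostgreSQL 9.2.4 server.
This OSM import has two tables of particular interest, planet_osm_rels and planet_osm_ways. The first table holds the relations for country borders, which I can extract by querying the hstore column tags_hstore. The members attribute of the result then contains a text array with a bunch of information, including which ways are part of the relation. The ID of a given way is prefixed with w to mark it as a way ID, e.g. w23412. To get the actual nodes of a way, I need to query the planet_osm_ways table with the ID I obtained, minus the w, of course.
To summarize, I have the following table structure: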
Table "public.planet_osm_rels"
Column | Type | Modifiers
-------------+----------+-----------
id | bigint | not null
way_off | smallint |
rel_off | smallint |
parts | bigint[] |
members | text[] |
tags | text[] |
pending | boolean | not null
tags_hstore | hstore |
Indexes:
"planet_osm_rels_pkey" PRIMARY KEY, btree (id)
"planet_osm_rels_idx" btree (id) WHERE pending
"planet_osm_rels_parts" gin (parts) WITH (fastupdate=off)
"planet_osm_rels_tags_hstore_idx" gin (tags_hstore)
Table "public.planet_osm_ways"
Column | Type | Modifiers
-------------+----------+-----------
id | bigint | not null
nodes | bigint[] | not null
tags | text[] |
pending | boolean | not null
tags_hstore | hstore |
Indexes:
"planet_osm_ways_pkey" PRIMARY KEY, btree (id)
"planet_osm_ways_idx" btree (id) WHERE pending
"planet_osm_ways_nodes" gin (nodes) WITH (fastupdate=off)
I came up with the following query:
SELECT nodes
FROM planet_osm_ways
WHERE id IN (
SELECT trim(leading 'w' from unnest)::int
FROM (
SELECT unnest(members)
FROM planet_osm_rels
WHERE (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", "name:en"=>"Germany"'))
AS unnest
WHERE unnest LIKE 'w%');
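Strangely, the query is quite slow. I know that I could eliminate the members column by providing a link table and adding more indexes. However, I would also like to optimize the query itself and remove at least one subquery, because the query plan is very complex: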
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=299957.16..300008.23 rows=90957940 width=8)
-> HashAggregate (cost=299957.15..299957.16 rows=1 width=32)
-> Subquery Scan on unnest (cost=0.00..299954.76 rows=956 width=32)
Filter: (unnest.unnest ~~ 'w%'::text)
-> Seq Scan on planet_osm_rels (cost=0.00..297563.51 rows=191300 width=180)
Filter: ((tags)::hstore @> '"type"=>"boundary", "name:en"=>"Germany", "admin_level"=>"2"'::hstore)
-> Index Only Scan using planet_osm_ways_pkey on planet_osm_ways (cost=0.01..51.06 rows=1 width=8)
Index Cond: (id = (ltrim(unnest.unnest, 'w'::text))::integer)
(8 rows)
And with EXPLAIN ANALYZE:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=299957.16..299980.93 rows=39090200 width=1147) (actual time=18680.342..36216.571 rows=1266 loops=1)
-> HashAggregate (cost=299957.15..299957.16 rows=1 width=32) (actual time=18606.686..18608.105 rows=1259 loops=1)
-> Subquery Scan on unnest (cost=0.00..299954.76 rows=956 width=32) (actual time=468.391..18606.233 rows=1259 loops=1)
Filter: (unnest.unnest ~~ 'w%'::text)
Rows Removed by Filter: 1283
-> Seq Scan on planet_osm_rels (cost=0.00..297563.51 rows=191300 width=180) (actual time=468.376..18605.288 rows=2542 loops=1)
Filter: ((tags)::hstore @> '"type"=>"boundary", "name:en"=>"Germany", "admin_level"=>"2"'::hstore)
Rows Removed by Filter: 1912651
-> Index Scan using planet_osm_line_pkey on planet_osm_line (cost=0.01..23.73 rows=3 width=1155) (actual time=13.926..13.978 rows=1 loops=1259)
Index Cond: (osm_id = (ltrim(unnest.unnest, 'w'::text))::bigint)
Total runtime: 36217.277 ms
The number of rows returned is not large enough to explain the long run time:
count
-------
1266
I cannot use SELECT unnest(members) AS unnested .... WHERE unnested LIKE 'w%', because the WHERE clause does not know about the "unnested" alias. Is there a better way to do this?
IN queries against large sets are notoriously slow. Using a JOIN instead is usually faster:
SELECT nodes
FROM planet_osm_ways
JOIN (
SELECT ltrim(member, 'w')::bigint AS id
FROM (
SELECT unnest(members) AS member
FROM planet_osm_rels
WHERE (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
) u
WHERE member LIKE 'w%'
) x USING (id);
But that is not the most important problem here. I wonder why the GIN index planet_osm_rels_tags_hstore_idx is not being used. Are you selecting a large enough portion of the table planet_osm_rels to justify a sequential scan?
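One way to investigate the index question is sketched below. It assumes the Germany predicate from the question and uses only standard PostgreSQL commands (ANALYZE, the enable_seqscan planner setting, EXPLAIN ANALYZE). It may also be worth noting that the posted plans filter on (tags)::hstore rather than on tags_hstore; if that is what actually ran, an index on tags_hstore could not serve that predicate.

```sql
-- Refresh planner statistics first; stale statistics can lead to poor row estimates.
ANALYZE planet_osm_rels;

-- For this session only, discourage sequential scans to see whether
-- the planner is able to use the GIN index for this predicate at all.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT id
FROM   planet_osm_rels
WHERE  tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", "name:en"=>"Germany"';

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

If the plan still shows a sequential scan with enable_seqscan off, the predicate probably does not match the indexed column or expression. If it switches to a bitmap scan on planet_osm_rels_tags_hstore_idx and runs faster, the row estimates are the more likely culprit.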
Oh, and id is of type bigint. So cast to bigint instead of int to reduce friction.
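On the question's point that the WHERE clause cannot see the alias of an unnest() call in the SELECT list: a set-returning function in the FROM clause does expose a column that WHERE can reference. The sketch below assumes PostgreSQL 9.3 or later, because it relies on LATERAL, so it would not run on the 9.2.4 server from the question; the aliases r, m, member and w are chosen here for illustration.

```sql
-- Requires PostgreSQL 9.3+ (LATERAL); not available on 9.2.4.
SELECT w.nodes
FROM   planet_osm_rels r
CROSS  JOIN LATERAL unnest(r.members) AS m(member)  -- one row per array element
JOIN   planet_osm_ways w ON w.id = ltrim(m.member, 'w')::bigint
WHERE  r.tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", "name:en"=>"Germany"'
AND    m.member LIKE 'w%';  -- the unnested value is visible here
```

This removes both subquery levels, at the cost of requiring a newer server version.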
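If you can extract the way IDs and store them redundantly in a separate column way_ids bigint[] in the table planet_osm_rels, your query becomes much simpler and faster, with fewer subquery levels: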
SELECT nodes
FROM planet_osm_ways
JOIN (
SELECT unnest(way_ids) AS id
FROM planet_osm_rels
WHERE (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
) u USING (id);
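A minimal sketch of how such a column could be created and filled, assuming the same 'w' prefix filter the question uses. The column name way_ids comes from the suggestion above; the aliases r and m are illustrative, and keeping the column in sync with later imports or updates is left open.

```sql
-- Add the redundant column (name taken from the suggestion above).
ALTER TABLE planet_osm_rels ADD COLUMN way_ids bigint[];

-- Fill it from members: keep only elements starting with 'w'
-- and strip the prefix, mirroring the filter in the question.
UPDATE planet_osm_rels r
SET    way_ids = ARRAY(
          SELECT ltrim(m, 'w')::bigint
          FROM   unnest(r.members) AS m
          WHERE  m LIKE 'w%'
       );
```

The correlated ARRAY(...) subquery avoids LATERAL, so it should also work on 9.2. On a large table the UPDATE rewrites every row, so it is probably best run once after the import.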