How to simplify a nested SELECT using PostgreSQL arrays?

6 postgresql

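I am trying to speed up and simplify an SQL query against an imported OpenStreetMap (OSM) database. The database is stored on a PostgreSQL 9.2.4 server.

This OSM import has two specific tables, planet_osm_rels and planet_osm_ways. The first table contains the relations for country boundaries, which I can extract by querying the hstore column tags_hstore. The members column of each resulting row contains a text array that gives me a bunch of information, including which ways are part of the relation. The ID of a way is prefixed with w to mark it as a way ID, e.g. w23412. To get the actual nodes of a way, I need to query the planet_osm_ways table using the IDs I obtained, minus the w, of course.

In summary, I have the following table structure: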

   Table "public.planet_osm_rels"
   Column    |   Type   | Modifiers 
-------------+----------+-----------
 id          | bigint   | not null
 way_off     | smallint | 
 rel_off     | smallint | 
 parts       | bigint[] | 
 members     | text[]   | 
 tags        | text[]   | 
 pending     | boolean  | not null
 tags_hstore | hstore   | 
Indexes:
    "planet_osm_rels_pkey" PRIMARY KEY, btree (id)
    "planet_osm_rels_idx" btree (id) WHERE pending
    "planet_osm_rels_parts" gin (parts) WITH (fastupdate=off)
    "planet_osm_rels_tags_hstore_idx" gin (tags_hstore)

   Table "public.planet_osm_ways"
   Column    |   Type   | Modifiers 
-------------+----------+-----------
 id          | bigint   | not null
 nodes       | bigint[] | not null
 tags        | text[]   | 
 pending     | boolean  | not null
 tags_hstore | hstore   | 
Indexes:
    "planet_osm_ways_pkey" PRIMARY KEY, btree (id)
    "planet_osm_ways_idx" btree (id) WHERE pending
    "planet_osm_ways_nodes" gin (nodes) WITH (fastupdate=off)

I came up with the following query:

SELECT  nodes
FROM    planet_osm_ways
WHERE   id IN (
      SELECT    trim(leading 'w' from unnest)::int
      FROM (
        SELECT  unnest(members)
        FROM    planet_osm_rels
        WHERE   (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", "name:en"=>"Germany"'))
      AS        unnest
      WHERE     unnest LIKE 'w%');

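Oddly enough, the query is slow. I know I could eliminate the members column by providing a link table, joining against it, and adding more indexes. However, I would also like to optimize the query itself and get rid of at least one subquery, because the query plan is quite complex: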

                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=299957.16..300008.23 rows=90957940 width=8)
   ->  HashAggregate  (cost=299957.15..299957.16 rows=1 width=32)
         ->  Subquery Scan on unnest  (cost=0.00..299954.76 rows=956 width=32)
               Filter: (unnest.unnest ~~ 'w%'::text)
               ->  Seq Scan on planet_osm_rels  (cost=0.00..297563.51 rows=191300 width=180)
                     Filter: ((tags)::hstore @> '"type"=>"boundary", "name:en"=>"Germany", "admin_level"=>"2"'::hstore)
   ->  Index Only Scan using planet_osm_ways_pkey on planet_osm_ways  (cost=0.01..51.06 rows=1 width=8)
         Index Cond: (id = (ltrim(unnest.unnest, 'w'::text))::integer)
(8 rows)

EXPLAIN ANALYZE

                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=299957.16..299980.93 rows=39090200 width=1147) (actual time=18680.342..36216.571 rows=1266 loops=1)
   ->  HashAggregate  (cost=299957.15..299957.16 rows=1 width=32) (actual time=18606.686..18608.105 rows=1259 loops=1)
         ->  Subquery Scan on unnest  (cost=0.00..299954.76 rows=956 width=32) (actual time=468.391..18606.233 rows=1259 loops=1)
               Filter: (unnest.unnest ~~ 'w%'::text)
               Rows Removed by Filter: 1283
               ->  Seq Scan on planet_osm_rels  (cost=0.00..297563.51 rows=191300 width=180) (actual time=468.376..18605.288 rows=2542 loops=1)
                     Filter: ((tags)::hstore @> '"type"=>"boundary", "name:en"=>"Germany", "admin_level"=>"2"'::hstore)
                     Rows Removed by Filter: 1912651
   ->  Index Scan using planet_osm_line_pkey on planet_osm_line  (cost=0.01..23.73 rows=3 width=1155) (actual time=13.926..13.978 rows=1 loops=1259)
         Index Cond: (osm_id = (ltrim(unnest.unnest, 'w'::text))::bigint)
 Total runtime: 36217.277 ms

The number of rows returned is not large enough to explain the long run time:

 count 
-------
  1266

I cannot use SELECT unnest(members) AS unnested .... WHERE unnested LIKE 'w%' because the WHERE clause does not know the "unnested" alias. Is there a better way to do this?

Erw*_*ter 6

IN with a big set is notoriously slow. Using a JOIN instead is generally faster:

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT ltrim(member, 'w')::bigint AS id
   FROM  (
      SELECT unnest(members) AS member
      FROM   planet_osm_rels
      WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
      ) u
   WHERE member LIKE 'w%'
   ) x USING (id);

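But that is not the most important problem here. I wonder why the GIN index planet_osm_rels_tags_hstore_idx is not being used. Are you selecting a large enough portion of the table planet_osm_rels to justify a sequential scan?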
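One thing worth checking: the Filter line in both posted plans is on (tags)::hstore, a cast of the text[] column, not on tags_hstore as written in the query text. A predicate on that cast expression cannot use an index built on the tags_hstore column. A minimal sketch for verifying index usage, assuming the query from the question is the one actually being run:

```sql
-- Check which access path the planner picks for the hstore predicate alone.
-- The literal is copied from the query in the question.
EXPLAIN
SELECT id
FROM   planet_osm_rels
WHERE  tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", "name:en"=>"Germany"';
```

If the resulting plan shows a scan on planet_osm_rels_tags_hstore_idx, the index itself is usable and the slow plan probably came from a query that filtered on tags::hstore instead.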

Oh, and id is of type bigint. So cast to bigint instead of int to reduce friction.
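Besides matching the column type, the bigint cast avoids overflow: int tops out at 2147483647, so any way ID above that would make the int cast fail. A small sketch, using a made-up ID chosen only to exceed that limit:

```sql
-- 'w3000000000' is a hypothetical member value, not taken from the data.
SELECT ltrim('w3000000000', 'w')::bigint AS id;

-- The same value cast to int raises an "out of range" error:
-- SELECT ltrim('w3000000000', 'w')::int AS id;
```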

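If you could extract the way IDs and store them redundantly in a separate column way_ids bigint[] in the table, your query would become much simpler and faster, with one less subquery level: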

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT unnest(way_ids) AS id
   FROM   planet_osm_rels
   WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
   ) u USING (id);
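The way_ids column does not exist in the imported schema, so it has to be added and filled first. A rough sketch of one way to do that, assuming every element of members that starts with w is a way ID followed only by digits (an element such as a role name beginning with w would make the cast fail):

```sql
-- way_ids is the hypothetical redundant column suggested above.
ALTER TABLE planet_osm_rels ADD COLUMN way_ids bigint[];

-- Fill it from members: keep the 'w…' elements and strip the prefix.
UPDATE planet_osm_rels r
SET    way_ids = ARRAY(
   SELECT ltrim(m, 'w')::bigint
   FROM   unnest(r.members) AS m
   WHERE  m LIKE 'w%'
   );
```

The column would also have to be kept in sync whenever members changes, for example after each import update; how best to do that depends on how the database is maintained.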