How to make a Postgres search across multiple columns and multiple tables more efficient

dan*_*l9x 3 postgresql performance index index-tuning postgresql-10

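I have a Shipment table that contains some basic data about a shipment, and a ShipmentItem table that contains additional attributes about that shipment along with a foreign key to Shipment's primary key. Shipment to ShipmentItem is a OneToMany relationship.

We need to include a text search option that takes a given input string and searches 2+ columns of Shipment, in addition to the name column for three specific types of ShipmentItem. This is my current query: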

select *
from Shipment shipment
where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
  and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
  and (
        shipment.identifierkeyvalues = '12345'
        or shipment.carrierReferenceNumber = '12345'
        or shipment.uuid in (
            select shipmentItem.resultId
            from ShipmentItem shipmentItem
            where (
                shipmentItem.type in (
                                      'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
                )
            )
            and shipmentItem.name = '12345'
            and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
            and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
       )
    )
limit 25

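The problem I'm finding is that having the subquery as one of the conditions combined with `or` causes a major performance problem, even though the subquery by itself returns quickly by making use of the type_name_deliveryRequestedDate index on that table. Although we have multiple indexes on the main table (identifierKeyValues, carrierReferenceNumber, and even an index covering all three Shipment columns being queried), the planner will only use the deliveryRequestedDate index, which is extremely inefficient because the date range for this query is so large.

Converting this to a JOIN seems to result in the same behavior, and I'm not sure what the best approach is at this point. We have a Java Persistence API layer on top of this query, so we would like to avoid any major changes to the data model if possible. Any ideas would be greatly appreciated!

Explain plan: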

Limit  (cost=110.61..209.98 rows=25 width=1370) (actual time=119503.030..124034.809 rows=1 loops=1)
      ->  Index Scan using shipment_deliveryrequesteddate_idx on shipment shipment  (cost=110.61..890840.18 rows=224084 width=1370) (actual time=119503.027..124034.805 rows=1 loops=1)
            Index Cond: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
            Filter: ((identifierkeyvalues = '12345'::text) OR (carrierreferencenumber = '12345'::text) OR (hashed SubPlan 1))
            Rows Removed by Filter: 496784
            SubPlan 1
              ->  Index Scan using "type_name_deliveryRequestedDate" on resultitem shipmentitem  (cost=0.56..110.11 rows=24 width=16) (actual time=10.706..16.416 rows=1 loops=1)
                    Index Cond: ((type = ANY ('{poNumber,deliveryNoteNumber,salesOrderNumber}'::text[])) AND (name = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
    Planning time: 3.175 ms
    Execution time: 124035.006 ms

EXPLAIN plan with the subquery removed. Why does it use completely different indexes?

Limit  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.052..0.053 rows=0 loops=1)
  ->  Bitmap Heap Scan on shipment shipment  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.051..0.051 rows=0 loops=1)
        Recheck Cond: (((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) OR (carrierreferencenumber = '12345'::text))
        Filter: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
        Rows Removed by Filter: 2
        Heap Blocks: exact=2
        ->  BitmapOr  (cost=9.51..9.51 rows=66 width=0) (actual time=0.041..0.041 rows=0 loops=1)
              ->  Bitmap Index Scan on shipment_identifierkeyvalues_idx  (cost=0.00..4.61 rows=4 width=0) (actual time=0.023..0.024 rows=0 loops=1)
                    Index Cond: ((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
              ->  Bitmap Index Scan on shipment_carrierreferencenumber_idx  (cost=0.00..4.90 rows=62 width=0) (actual time=0.016..0.016 rows=2 loops=1)
                    Index Cond: (carrierreferencenumber = '12345'::text)
Planning time: 1.668 ms
Execution time: 0.116 ms

jja*_*nes 5

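It can't use a BitmapOr across scans on different tables, or at least it has not been coded to be able to do that. Maybe it could be done if someone put in the work: it would have to look up the UUIDs in the other table, then convert them to tids on the shipment table and stuff them into the bitmap. As it stands, the BitmapOr plan is not available for this query.

The best option is probably to write it as a UNION ALL of two different queries, one that hits only the single table and one that hits both tables.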
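
As a rough sketch of that rewrite, using the table names, columns, and literals from the question: the first branch touches only Shipment, so it can get the BitmapOr plan shown in the second EXPLAIN output, and the second branch is driven by the ShipmentItem subquery. The extra `is not true` filter in the second branch is my addition, not part of the suggestion above; it keeps a shipment that matches both branches from being returned twice. This also assumes there is an index on shipment.uuid (for example, its primary key) so the second branch can look up the matching shipments directly.

```sql
-- Sketch only; verify the resulting plan with EXPLAIN (ANALYZE) on real data.
(
    -- Branch 1: conditions on Shipment alone.
    select shipment.*
    from Shipment shipment
    where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
      and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
      and (
            shipment.identifierkeyvalues = '12345'
            or shipment.carrierReferenceNumber = '12345'
      )
)
union all
(
    -- Branch 2: shipments found only through ShipmentItem.
    select shipment.*
    from Shipment shipment
    where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
      and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
      and shipment.uuid in (
            select shipmentItem.resultId
            from ShipmentItem shipmentItem
            where shipmentItem.type in (
                      'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
                  )
              and shipmentItem.name = '12345'
              and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
              and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
      )
      -- Added assumption: skip rows branch 1 already returned.
      -- "is not true" also keeps rows where either column is NULL.
      and (
            shipment.identifierkeyvalues = '12345'
            or shipment.carrierReferenceNumber = '12345'
      ) is not true
)
limit 25;
```

Plain UNION would also remove duplicates, but it has to compare every selected column, which is more work for wide rows and fails if any column's type has no equality operator (json, for instance). Since there is a JPA layer on top, something like this would most likely have to be issued as a native query.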