PostgreSQL query has a high cost

Question

Tom*_*Tom 1 postgresql postgresql-12 postgresql-performance

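I have a table with more than 10,000,000 records, and I am writing a query that returns about 4436 records.

My impression is that the cost of this query is very high for the number of records it fetches.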

Index Scan using idx_name on task  (cost=0.28..142102.57 rows=3470 width=34) (actual time=14.690..22.894 rows=4436 loops=1)
"  Index Cond: ((situation = ANY ('{0,1,2,3,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::integer[])) AND (deadline < CURRENT_TIMESTAMP))"
Planning Time: 1.335 ms
JIT:
  Functions: 5
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.654 ms, Inlining 0.000 ms, Optimization 1.214 ms, Emission 13.163 ms, Total 16.030 ms
Execution Time: 24.758 ms

Is this cost level acceptable, or does the index need to be improved?

Index:

CREATE INDEX idx_name ON task (situation, deadline, approved)
WHERE
deadline IS NOT NULL AND
situation <> ALL ('{4,5}'::integer[]) AND
approved = 'N';

My query:

SELECT
        task.deadline,
        task.id
    FROM
        task
    WHERE
        task.deadline IS NOT NULL
        AND task.situation IN ('0', '1', '2', '3', '6' ,'7' ,'8','9','10','11','12','13','14','15','16','17','18','19','20')
        AND task.situation NOT IN ('4', '5')
        AND task.deadline < CURRENT_TIMESTAMP
        AND task.approved = 'N';

Answer 1

mus*_*cio 7

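As hinted in the comments, you shouldn't be looking at the query cost but at its actual run time (or, if the run time is acceptable, you shouldn't be bothered by either of them). The estimated plan cost indicates nothing other than the relative amount of various resources that Postgres estimates executing this particular plan would take, compared with other possible plans.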

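Looking at the absolute cost value tells you absolutely nothing; comparing it with the costs of other plans tells you which plan Postgres considers more efficient, based on the information the optimizer has.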

See also this Q&A.

Looking straight into the horse's mouth, one may see this:

/*-------------------------------------------------------------------------
 *
 * costsize.c
 *    Routines to compute (and set) relation sizes and path costs
 *
 * Path costs are measured in arbitrary units established by these basic
 * parameters:
 *
 *  seq_page_cost         Cost of a sequential page fetch
 *  random_page_cost      Cost of a non-sequential page fetch
 *  cpu_tuple_cost        Cost of typical CPU time to process a tuple
 *  cpu_index_tuple_cost  Cost of typical CPU time to process an index tuple
 *  cpu_operator_cost     Cost of CPU time to execute an operator or function
 *  parallel_tuple_cost   Cost of CPU time to pass a tuple from worker to leader backend
 *  parallel_setup_cost   Cost of setting up shared memory for parallelism
 ...

And going further down the same road:

# - Planner Cost Constants -

#seq_page_cost = 1.0            # measured on an arbitrary scale
#random_page_cost = 4.0         # same scale as above
#cpu_tuple_cost = 0.01          # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025     # same scale as above

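You'll notice that they emphasize "arbitrary units" on an "arbitrary scale"; all they want to establish is that reading N pages randomly is four times as resource-intensive ("expensive") as reading that many pages sequentially, or that evaluating a predicate against an index entry costs half as much as evaluating it against a table row. When the costs of the whole plan tree are added up, the value you get can only be compared with the cost of another tree; it can only be higher or lower, not high or low.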
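
To make the "arbitrary units" concrete, here is a minimal sketch of how these constants combine in the simplest case, a sequential scan with one filter condition, following the arithmetic described in the "Using EXPLAIN" chapter of the PostgreSQL documentation. The table size (pages and rows) is hypothetical, and the real planner code in costsize.c covers far more cases than this; the point is only that the result is a weighted sum in made-up units.

```python
# Default planner cost constants from postgresql.conf (arbitrary scale).
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages, rows, filter_operators=0):
    """Simplified estimated total cost of a sequential scan.

    pages and rows describe a hypothetical table; filter_operators is the
    number of operators evaluated per row in the WHERE clause.
    """
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + filter_operators * CPU_OPERATOR_COST)
    return io_cost + cpu_cost


# Only the ratios between the constants carry meaning.
random_vs_seq = RANDOM_PAGE_COST / SEQ_PAGE_COST
index_vs_table_tuple = CPU_INDEX_TUPLE_COST / CPU_TUPLE_COST

# Hypothetical table: 100,000 pages holding 10,000,000 rows, one filter operator.
cost = seq_scan_cost(pages=100_000, rows=10_000_000, filter_operators=1)

print(f"random page vs sequential page: {random_vs_seq:.1f}x")
print(f"index tuple vs table tuple:     {index_vs_table_tuple:.1f}x")
print(f"estimated seq scan cost:        {cost:.2f}")
```

Doubling every constant would double every plan's cost without changing which plan wins, which is why the absolute number says nothing on its own.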

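Thanks to jjanes for mentioning JIT in the comments. In your case, the estimated query cost happens to exceed the JIT trigger threshold, which defaults to 100000, and as jjanes astutely pointed out, "2/3 of [your query's] time seems to be spent on JIT", which is probably counterproductive. You may want to evaluate whether having JIT enabled is useful in your environment.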
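
The figures in the question's plan can be checked directly. The sketch below hard-codes values copied from the EXPLAIN ANALYZE output in the question and compares them with the default JIT thresholds (jit_above_cost = 100000, jit_inline_above_cost = 500000, jit_optimize_above_cost = 500000). It assumes those defaults were not changed on the asker's server, and that the reported JIT time is counted within the execution time.

```python
# Values copied from the EXPLAIN ANALYZE output in the question.
estimated_total_cost = 142102.57
jit_total_ms = 16.030
execution_ms = 24.758

# PostgreSQL defaults; assumed unchanged on the server in question.
JIT_ABOVE_COST = 100_000
JIT_INLINE_ABOVE_COST = 500_000
JIT_OPTIMIZE_ABOVE_COST = 500_000

# JIT kicks in when the plan's estimated total cost exceeds the threshold.
jit_used = estimated_total_cost > JIT_ABOVE_COST
inlining = estimated_total_cost > JIT_INLINE_ABOVE_COST
optimization = estimated_total_cost > JIT_OPTIMIZE_ABOVE_COST

# Share of the measured execution time reported as JIT work.
jit_share = jit_total_ms / execution_ms

print(f"JIT used: {jit_used}, inlining: {inlining}, optimization: {optimization}")
print(f"JIT share of execution time: {jit_share:.0%}")
```

This is consistent with the plan's "Inlining false, Optimization false" line and with the roughly two-thirds figure quoted above. To test whether JIT pays off, one option is to run `SET jit = off;` in a session and compare the EXPLAIN ANALYZE timings, or to raise jit_above_cost so that queries of this size stay below the threshold.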