Mar*_*icz 5 sql postgresql database-performance
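I've run into a PostgreSQL (version 9.4) performance puzzle. I have a function (prevd) declared as STABLE (see below). When I run this function on a constant in a where clause, it is called multiple times instead of once. If I understand the Postgres documentation correctly, the query should be optimized to call prevd only once.
A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement
Why doesn't it optimize the calls to prevd in this case? I'm not expecting prevd to be called once for all subsequent queries that use prevd with the same argument (as if it were IMMUTABLE). I'm expecting Postgres to create a plan for my query with just one call to prevd('2015-12-12').
Please find the code below:
Schema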
create table somedata(d date, number double precision);
create table dates(d date);
insert into dates
select generate_series::date
from generate_series('2015-01-01'::date, '2015-12-31'::date, '1 day');
insert into somedata
select '2015-01-01'::date + (random() * 365 + 1)::integer, random()
from generate_series(1, 100000);
create or replace function prevd(date_ date)
returns date
language sql
stable
as $$
select max(d) from dates where d < date_;
$$;
Slow query
select avg(number) from somedata where d=prevd('2015-12-12');
Poor query plan for the query above
Aggregate (cost=28092.74..28092.75 rows=1 width=8) (actual time=3532.638..3532.638 rows=1 loops=1)
Output: avg(number)
-> Seq Scan on public.somedata (cost=0.00..28091.43 rows=525 width=8) (actual time=10.210..3532.576 rows=282 loops=1)
Output: d, number
Filter: (somedata.d = prevd('2015-12-12'::date))
Rows Removed by Filter: 99718
Planning time: 1.144 ms
Execution time: 3532.688 ms
(8 rows)
Performance
The query above runs in about 3.5 s on my machine. After changing prevd to IMMUTABLE, it drops to 0.035 s.
I started writing this as a comment, but it got a bit long, so I'm expanding it into an answer.
As discussed in this previous answer, Postgres does not promise to always optimise based on STABLE or IMMUTABLE annotations, only that it can sometimes do so. It does this by planning the query differently by taking advantage of certain assumptions. This part of the previous answer is directly analogous to your case:
This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num query re-writing will happen for immutable but not for merely stable functions.
If you change the function to IMMUTABLE and look at the query plan, you will see that the rewriting it does is really rather radical:
Seq Scan on public.somedata (cost=0.00..1791.00 rows=272 width=12) (actual time=0.036..14.549 rows=270 loops=1)
Output: d, number
Filter: (somedata.d = '2015-12-11'::date)
Buffers: shared read=541 written=14
Total runtime: 14.589 ms
It actually runs the function while planning the query, and substitutes the value before the query is even executed. With a STABLE function, this optimisation would clearly not be appropriate - the data might change between planning and executing the query.
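If you want to reproduce that comparison yourself, something along these lines should do it. This is only a sketch against the schema from the question: it flips the volatility marker with ALTER FUNCTION, looks at the plan, and then restores STABLE. Note that leaving prevd marked IMMUTABLE would not really be correct, since it reads from the dates table and its result can change when that table changes.

```sql
-- For the experiment only: declare prevd IMMUTABLE.
-- (Strictly speaking this is a false promise, because prevd reads the dates table.)
alter function prevd(date) immutable;

-- The Filter line should now show a date constant instead of the prevd(...) call.
explain (analyze, verbose, buffers)
select avg(number) from somedata where d = prevd('2015-12-12');

-- Restore the original, honest declaration.
alter function prevd(date) stable;
```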
In the comments, someone mentioned that this query produces an optimised plan:
select avg(number) from somedata where d=(select prevd(date '2015-12-12'));
This is fast, but note that the plan looks nothing like the one for the IMMUTABLE version:
Aggregate (cost=1791.69..1791.70 rows=1 width=8) (actual time=14.670..14.670 rows=1 loops=1)
Output: avg(number)
Buffers: shared read=541 written=21
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Output: '2015-12-11'::date
-> Seq Scan on public.somedata (cost=0.00..1791.00 rows=273 width=8) (actual time=0.026..14.589 rows=270 loops=1)
Output: d, number
Filter: (somedata.d = $0)
Buffers: shared read=541 written=21
Total runtime: 14.707 ms
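By putting it into a sub-query, you are moving the function call from the WHERE clause to the SELECT clause. More importantly, the sub-query can always be executed once and used by the rest of the query, so the function is run once in a separate node of the plan.
To confirm this, we can take the SQL out of the function entirely: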
select avg(number) from somedata where d=(select max(d) from dates where d < '2015-12-12');
This gives a rather longer plan with very similar performance:
Aggregate (cost=1799.12..1799.13 rows=1 width=8) (actual time=14.174..14.174 rows=1 loops=1)
Output: avg(somedata.number)
Buffers: shared read=543 written=19
InitPlan 1 (returns $0)
-> Aggregate (cost=7.43..7.44 rows=1 width=4) (actual time=0.150..0.150 rows=1 loops=1)
Output: max(dates.d)
Buffers: shared read=2
-> Seq Scan on public.dates (cost=0.00..6.56 rows=347 width=4) (actual time=0.015..0.103 rows=345 loops=1)
Output: dates.d
Filter: (dates.d < '2015-12-12'::date)
Buffers: shared read=2
-> Seq Scan on public.somedata (cost=0.00..1791.00 rows=273 width=8) (actual time=0.190..14.098 rows=270 loops=1)
Output: somedata.d, somedata.number
Filter: (somedata.d = $0)
Buffers: shared read=543 written=19
Total runtime: 14.232 ms
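The important thing to note is that the inner Aggregate (the max(d)) is executed once, on a separate node from the main Seq Scan (which is checking the where clause). In this position, even a VOLATILE function can be optimised in the same way.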
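A quick way to see the VOLATILE case is with the built-in random(), using the somedata table from the question. This is just an illustrative sketch, not something from the original tests: the first query calls random() directly in the filter, the second wraps it in an uncorrelated scalar sub-query.

```sql
-- random() is VOLATILE; used directly, it appears inside the Seq Scan filter
-- and is evaluated again for every row.
explain (verbose)
select count(*) from somedata where number < random();

-- Wrapped in an uncorrelated scalar sub-query, it should be planned as an
-- InitPlan, and the filter compares against its single result ($0).
explain (verbose)
select count(*) from somedata where number < (select random());
```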
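In short, while you know that the query you've produced can be optimised by executing the function only once, it doesn't match any of the patterns which Postgres's query planner knows how to rewrite, so it uses a naive plan which runs the function multiple times.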
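If you'd rather count the calls than infer them from timings, one option is a traced copy of the function. The sketch below assumes a hypothetical helper, prevd_traced, which is not part of the original post: it has the same logic as prevd but is written in plpgsql so it can raise a notice on every call. The first query is restricted to 5 rows purely to keep the output short.

```sql
-- Hypothetical traced variant of prevd (illustration only, not from the question).
create or replace function prevd_traced(date_ date)
returns date
language plpgsql
stable
as $$
begin
    -- Announce every invocation so the calls can be counted in the client output.
    raise notice 'prevd_traced(%) called', date_;
    return (select max(d) from dates where d < date_);
end;
$$;

-- Direct call in the WHERE clause, over 5 rows only.
select avg(number)
from (select * from somedata limit 5) s
where d = prevd_traced('2015-12-12');

-- The same call wrapped in a scalar sub-query.
select avg(number)
from somedata
where d = (select prevd_traced('2015-12-12'));
```

I'd expect roughly one notice per scanned row from the first query (the planner may add one more while estimating selectivity) and a single notice from the second, but I haven't pinned down the exact counts, so treat that as something to check rather than a guarantee.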
[Note: all tests were performed on Postgres 9.1, because that's what I happened to have to hand.]