Can't aggregate filter expressions use indexes?

Question

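One cool thing about filter expressions is that you can run several different filters and aggregations in a single query. The "where" part becomes part of each aggregate instead of being one "where" clause for the whole query.

For example: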

SELECT count('id') FILTER (WHERE account_type=1) as regular,
       count('id') FILTER (WHERE account_type=2) as gold,
       count('id') FILTER (WHERE account_type=3) as platinum
FROM clients;

(From the Django documentation)

Either this is a bug in PostgreSQL 9.5, or I'm doing something wrong, or it is simply a limitation of PostgreSQL.

Consider these two queries:

select count(*)
from main_search
where created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
and created < '2017-10-13T00:00:00.081739+00:00'::timestamptz
and parent_id is null;

select
count('id') filter (
where created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
and created < '2017-10-13T00:00:00.081739+00:00'::timestamptz
and parent_id is null) as count
from main_search;

(The main_search table has a combined btree index on created and parent_id is null.)

Here is the output:

 count
-------
  9682
(1 row)

 count
-------
  9682
(1 row)

If I stick explain analyze in front of each query, this is the output:

    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1174.04..1174.05 rows=1 width=0) (actual time=5.077..5.077 rows=1 loops=1)
   ->  Index Scan using main_search_created_parent_id_null_idx on main_search  (cost=0.43..1152.69 rows=8540 width=0) (actual time=0.026..4.384 rows=9682 loops=1)
         Index Cond: ((created >= '2017-10-11 20:00:00.081739-04'::timestamp with time zone) AND (created < '2017-10-12 20:00:00.081739-04'::timestamp with time zone))
 Planning time: 0.826 ms
 Execution time: 5.227 ms
(5 rows)

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=178054.93..178054.94 rows=1 width=12) (actual time=1589.006..1589.007 rows=1 loops=1)
   ->  Seq Scan on main_search  (cost=0.00..146459.39 rows=4212739 width=12) (actual time=0.051..882.099 rows=4212818 loops=1)
 Planning time: 0.051 ms
 Execution time: 1589.070 ms
(4 rows)

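Note! The SELECT statement with the filter expression always uses a Seq Scan instead of an Index Scan :<

I also tried this against a different PostgreSQL 9.5 table in another database. At first I thought the "Seq Scan" happened because the table had too few rows, but both tables are large enough that the index should kick in.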

Answer 1

Liv*_*ius 1

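You have misunderstood the use case. FILTER only affects the aggregate, which is computed over the ALREADY PRODUCED data set. It does not filter rows.

Consider a modified example: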

SELECT count(*) FILTER (WHERE account_type=1) as regular,
       count(*) FILTER (WHERE account_type=2) as gold,
       count(*) FILTER (WHERE account_type=3) as platinum,
       count(*) 
FROM clients;

So what should the where clause look like?

WHERE
(WHERE account_type=3)
or
(WHERE account_type=2)
or
(WHERE account_type=1)
or 1=1 ???

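Now consider more complex combinations of filtered and unfiltered columns. Deriving a WHERE clause from them would be a nightmare for the optimizer.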
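
Since the planner does not derive a WHERE clause from the filters, one option is to write it by hand: put the condition shared by all aggregates in WHERE, where an index can be considered, and keep FILTER only for splitting the rows that remain. The following is a minimal sketch against the main_search table from the question. The noon split and the morning/afternoon aliases are hypothetical, chosen only for illustration, and whether the planner actually picks the index still depends on the index definition and the table statistics.

```sql
-- WHERE narrows the rows first; this is the part the planner can match
-- against an index on created.
-- Each FILTER then counts a subset of the rows that WHERE let through.
SELECT count(*) FILTER (WHERE created <  '2017-10-12T12:00:00+00:00'::timestamptz) AS morning,   -- hypothetical split
       count(*) FILTER (WHERE created >= '2017-10-12T12:00:00+00:00'::timestamptz) AS afternoon  -- hypothetical split
FROM main_search
WHERE created >= '2017-10-12T00:00:00+00:00'::timestamptz
  AND created <  '2017-10-13T00:00:00+00:00'::timestamptz
  AND parent_id IS NULL;
```

Running this with explain analyze in front should show whether the scan node changes from Seq Scan to an index scan on your data.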

When you think about FILTER, think of it as just a shortcut for a longer construct such as CASE:

SELECT SUM(CASE WHEN account_type=1 THEN 1 ELSE 0 END) as regular,
       SUM(CASE WHEN account_type=2 THEN 1 ELSE 0 END) as gold,
       SUM(CASE WHEN account_type=3 THEN 1 ELSE 0 END) as platinum
FROM clients;
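
To see the two forms side by side, here is a small self-contained sketch that needs no real table. The inline VALUES list standing in for clients is made up for illustration.

```sql
-- Four made-up rows: two with account_type 1, one each with 2 and 3.
-- Both columns count the rows with account_type = 1, so both return 2.
SELECT count(*) FILTER (WHERE account_type = 1)           AS regular_filter,
       sum(CASE WHEN account_type = 1 THEN 1 ELSE 0 END)  AS regular_case
FROM (VALUES (1), (1), (2), (3)) AS clients(account_type);
```

The two forms are not identical in every corner case: over an empty input, count returns 0 while sum returns NULL. Either way, every row of the input is read before the aggregate sees it, which is why neither form leads to an index scan on its own.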
