Query optimization or missing indexes?

Question

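I have this query that aggregates some data from t2 into t1. I'm doing this to optimize the application I'm working on, so that it sends fewer queries to the database. I chose the approach below to make sure I don't have to update t1 twice.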

The big question: which indexes might I be missing here, and can the query be optimized further?

update t1
set
  col1 = t2.col1_count,
  col2 = t2.col2_sum,
  col3 = t2.col3_sum
from  (
  select
    b.user_id, b.t1_id,
    coalesce(count(b.id), 0) as col1_count,
    sum(case when b.col5 = true then b.col2 else 0 end) as col2_sum,
    sum(case when b.col5 = false then b.col3 else 0 end) as col3_sum
  from t1 a 
    left join t2 b on b.t1_id = a.id
  where
    b.user_id = 1
  group by b.user_id, b.t1_id
) as t2
where 
  t2.t1_id = t1.id;

Edit: adding the requested information

These are my current indexes:

create index ix_t1_user_id on t1(user_id);
create unique index ux_t2_t1_id_t3_id on t2(t1_id, t3_id);
create index ix_t2_user_id on t2(user_id);
create index ix_t2_t1_id on t2(t1_id);

EXPLAIN ANALYZE gives me the following:

Update on t1  (cost=2725.40..2737.42 rows=1 width=138) (actual time=1.428..1.428 rows=0 loops=1)
  ->  Nested Loop  (cost=2725.40..2737.42 rows=1 width=138) (actual time=0.646..1.148 rows=166 loops=1)
        ->  Subquery Scan on t2  (cost=2725.40..2725.42 rows=1 width=84) (actual time=0.642..0.729 rows=166 loops=1)
              ->  HashAggregate  (cost=2725.40..2725.41 rows=1 width=17) (actual time=0.639..0.685 rows=166 loops=1)
                    ->  Nested Loop  (cost=5.81..2725.39 rows=1 width=17) (actual time=0.034..0.536 rows=197 loops=1)
                          ->  Bitmap Heap Scan on t2 b  (cost=5.81..414.29 rows=193 width=13) (actual time=0.024..0.050 rows=197 loops=1)
                                Recheck Cond: (user_id = 1)
                                ->  Bitmap Index Scan on ix_t2_user_id  (cost=0.00..5.76 rows=193 width=0) (actual time=0.017..0.017 rows=197 loops=1)
                                      Index Cond: (user_id = 1)
                          ->  Index Scan using t1_pkey on t1 a  (cost=0.00..11.96 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=197)
                                Index Cond: (id = b.t1_id)
                                Filter: (user_id = 1)
        ->  Index Scan using t1_pkey on t1  (cost=0.00..11.98 rows=1 width=58) (actual time=0.002..0.002 rows=1 loops=166)
              Index Cond: (id = t2.t1_id)
Total runtime: 1.490 ms

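A caution when reproducing a plan like this for an UPDATE: EXPLAIN ANALYZE actually executes the statement, so the rows really get modified. A minimal sketch of the usual guard, wrapping it in a transaction and rolling back. The table layout and data are hypothetical (the real definitions were not posted); run it in a fresh session, since the temporary table shadows any permanent t1.

```sql
-- Hypothetical minimal t1; columns and values are assumptions.
CREATE TEMP TABLE t1 (
  id int PRIMARY KEY, user_id int NOT NULL,
  col1 bigint NOT NULL DEFAULT 0, col2 bigint NOT NULL DEFAULT 0, col3 bigint NOT NULL DEFAULT 0
);
INSERT INTO t1 VALUES (1, 1, 0, 0, 0), (2, 1, 0, 0, 0), (3, 1, 9, 9, 9);

-- EXPLAIN ANALYZE really runs the UPDATE, so undo it afterwards.
BEGIN;
EXPLAIN ANALYZE
UPDATE t1 SET col1 = col1 + 1 WHERE user_id = 1;
ROLLBACK;

SELECT id, col1 FROM t1 ORDER BY id;  -- col1 is still 0, 0, 9
```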

Answer 1

Erw*_*ter 5

Simplify the query

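- Remove the useless user_id from the subquery: it is fixed to 1 by the WHERE clause, so selecting and grouping by it adds nothing.
- Remove the coalesce() around count(). Quoting the manual on aggregate functions:

  "It should be noted that except for count, these functions return a null value when no rows are selected."

  In other words, count() never returns NULL.
- Remove the redundant LEFT JOIN from the subquery. Your condition WHERE b.user_id = 1 discards the NULL-extended rows anyway, so it behaves like a plain join, and t1 is joined again in the outer UPDATE. (Update: keep a LEFT JOIN, with the user_id condition moved into the ON clause, if you want to set the columns to 0 where no row is found in t2.)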
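Both points can be checked without any tables. The first query aggregates zero rows; the second shows a LEFT JOIN collapsing into an inner join once the WHERE clause filters on the right-hand table. The VALUES lists are made-up stand-ins for t1 and t2.

```sql
-- 1) Aggregates over zero rows: count() yields 0, sum() yields NULL.
SELECT count(*) AS count_star   -- 0
      ,count(x) AS count_x      -- 0
      ,sum(x)   AS sum_x        -- NULL
FROM  (VALUES (1)) AS v(x)
WHERE  false;

-- 2) A WHERE condition on the right-hand table of a LEFT JOIN removes the
--    NULL-extended rows, so the join behaves like an inner join.
SELECT a.id, b.t1_id            -- one row: 1 | 1 (a.id = 2 is gone)
FROM  (VALUES (1), (2)) AS a(id)
LEFT   JOIN (VALUES (1, 1)) AS b(t1_id, user_id) ON b.t1_id = a.id
WHERE  b.user_id = 1;
```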

UPDATE t1
SET    col1 = t2.col1_count
      ,col2 = t2.col2_sum
      ,col3 = t2.col3_sum
FROM  (
   SELECT t1_id
         ,count(*) AS col1_count  -- same as count(id) if id is NOT NULL, and a bit faster
         ,sum(CASE WHEN col5 = true  THEN col2 ELSE 0 END) AS col2_sum -- might be simpler, depending on
         ,sum(CASE WHEN col5 = false THEN col3 ELSE 0 END) AS col3_sum -- the table definition, which is missing
   FROM   t2
   WHERE  user_id = 1
   GROUP  BY t1_id
   ) t2
WHERE  t2.t1_id = t1.id;

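To try the simplified query end to end, here is a sketch on a hypothetical minimal schema; column types and sample rows are assumptions, since the real table definitions were not posted. Run it in a fresh session: the temporary tables shadow any permanent t1/t2.

```sql
-- Hypothetical minimal schema and data.
CREATE TEMP TABLE t1 (
  id int PRIMARY KEY, user_id int NOT NULL,
  col1 bigint NOT NULL DEFAULT 0, col2 bigint NOT NULL DEFAULT 0, col3 bigint NOT NULL DEFAULT 0
);
CREATE TEMP TABLE t2 (
  id serial PRIMARY KEY, t1_id int NOT NULL, t3_id int NOT NULL, user_id int NOT NULL,
  col2 int NOT NULL, col3 int NOT NULL, col5 boolean NOT NULL
);
INSERT INTO t1 VALUES (1, 1, 0, 0, 0), (2, 1, 0, 0, 0), (3, 1, 9, 9, 9);
INSERT INTO t2 (t1_id, t3_id, user_id, col2, col3, col5) VALUES
   (1, 10, 1, 5, 7, true)
  ,(1, 11, 1, 3, 4, false)
  ,(2, 10, 1, 2, 9, true);

-- The simplified UPDATE.
UPDATE t1
SET    col1 = t2.col1_count
      ,col2 = t2.col2_sum
      ,col3 = t2.col3_sum
FROM  (
   SELECT t1_id
         ,count(*) AS col1_count
         ,sum(CASE WHEN col5 = true  THEN col2 ELSE 0 END) AS col2_sum
         ,sum(CASE WHEN col5 = false THEN col3 ELSE 0 END) AS col3_sum
   FROM   t2
   WHERE  user_id = 1
   GROUP  BY t1_id
   ) t2
WHERE  t2.t1_id = t1.id;

SELECT id, col1, col2, col3 FROM t1 ORDER BY id;
-- id 1: 2 | 5 | 4
-- id 2: 1 | 2 | 0
-- id 3: 9 | 9 | 9   (no rows in t2, so untouched)
```

Row 3 has no rows in t2 and keeps its old values, which is the case the reset statement handles.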

To reset rows in t1 that have no match in t2:

UPDATE t1
SET    col1 = 0, col2 = 0, col3 = 0
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.t1_id = t1.id);

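If the UPDATE and the reset run as two separate statements, putting both in one transaction means other sessions never see a half-done state. A sketch on a hypothetical schema and sample data (types and values are assumptions; run in a fresh session, as the temporary tables shadow any permanent t1/t2):

```sql
-- Hypothetical minimal schema and data.
CREATE TEMP TABLE t1 (
  id int PRIMARY KEY, user_id int NOT NULL,
  col1 bigint NOT NULL DEFAULT 0, col2 bigint NOT NULL DEFAULT 0, col3 bigint NOT NULL DEFAULT 0
);
CREATE TEMP TABLE t2 (
  id serial PRIMARY KEY, t1_id int NOT NULL, t3_id int NOT NULL, user_id int NOT NULL,
  col2 int NOT NULL, col3 int NOT NULL, col5 boolean NOT NULL
);
INSERT INTO t1 VALUES (1, 1, 0, 0, 0), (2, 1, 0, 0, 0), (3, 1, 9, 9, 9);
INSERT INTO t2 (t1_id, t3_id, user_id, col2, col3, col5) VALUES
   (1, 10, 1, 5, 7, true)
  ,(1, 11, 1, 3, 4, false)
  ,(2, 10, 1, 2, 9, true);

BEGIN;
-- Step 1: write the aggregates for rows that have matches in t2.
UPDATE t1
SET    col1 = t2.col1_count
      ,col2 = t2.col2_sum
      ,col3 = t2.col3_sum
FROM  (
   SELECT t1_id
         ,count(*) AS col1_count
         ,sum(CASE WHEN col5 = true  THEN col2 ELSE 0 END) AS col2_sum
         ,sum(CASE WHEN col5 = false THEN col3 ELSE 0 END) AS col3_sum
   FROM   t2
   WHERE  user_id = 1
   GROUP  BY t1_id
   ) t2
WHERE  t2.t1_id = t1.id;

-- Step 2: reset rows that have no match in t2 at all.
UPDATE t1
SET    col1 = 0, col2 = 0, col3 = 0
WHERE  NOT EXISTS (SELECT 1 FROM t2 WHERE t2.t1_id = t1.id);
COMMIT;

SELECT id, col1, col2, col3 FROM t1 ORDER BY id;
-- id 1: 2 | 5 | 4
-- id 2: 1 | 2 | 0
-- id 3: 0 | 0 | 0
```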

To do both in a single statement, a version with a LEFT JOIN in the subquery, like yours, may be faster, depending on your data distribution.
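A sketch of that single-statement variant, assuming t1.user_id identifies the same user as t2.user_id (the index ix_t1_user_id suggests t1 has that column). The user_id filter on t2 has to sit in the ON clause; in WHERE it would discard the unmatched rows again. For an unmatched t1 row, the single NULL-extended row gives count(b.t1_id) = 0, and both CASE expressions fall through to ELSE 0, so the sums come out as 0 rather than NULL without any coalesce. Unlike the NOT EXISTS statement, this only resets user 1's rows. The subquery alias is x to avoid confusion with the table t2; schema and data are hypothetical, as before (fresh session).

```sql
-- Hypothetical minimal schema and data.
CREATE TEMP TABLE t1 (
  id int PRIMARY KEY, user_id int NOT NULL,
  col1 bigint NOT NULL DEFAULT 0, col2 bigint NOT NULL DEFAULT 0, col3 bigint NOT NULL DEFAULT 0
);
CREATE TEMP TABLE t2 (
  id serial PRIMARY KEY, t1_id int NOT NULL, t3_id int NOT NULL, user_id int NOT NULL,
  col2 int NOT NULL, col3 int NOT NULL, col5 boolean NOT NULL
);
INSERT INTO t1 VALUES (1, 1, 0, 0, 0), (2, 1, 0, 0, 0), (3, 1, 9, 9, 9);
INSERT INTO t2 (t1_id, t3_id, user_id, col2, col3, col5) VALUES
   (1, 10, 1, 5, 7, true)
  ,(1, 11, 1, 3, 4, false)
  ,(2, 10, 1, 2, 9, true);

-- One statement: aggregates for matched rows, zeros for unmatched ones.
UPDATE t1
SET    col1 = x.col1_count
      ,col2 = x.col2_sum
      ,col3 = x.col3_sum
FROM  (
   SELECT a.id
         ,count(b.t1_id) AS col1_count  -- NULL-extended rows are not counted
         ,sum(CASE WHEN b.col5 = true  THEN b.col2 ELSE 0 END) AS col2_sum
         ,sum(CASE WHEN b.col5 = false THEN b.col3 ELSE 0 END) AS col3_sum
   FROM   t1 a
   LEFT   JOIN t2 b ON b.t1_id = a.id
                   AND b.user_id = 1    -- filter in ON, not WHERE
   WHERE  a.user_id = 1
   GROUP  BY a.id
   ) x
WHERE  x.id = t1.id;

SELECT id, col1, col2, col3 FROM t1 ORDER BY id;
-- id 1: 2 | 5 | 4
-- id 2: 1 | 2 | 0
-- id 3: 0 | 0 | 0
```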

Avoid empty updates

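If there is a chance that values in t1 are already up to date, add conditions to the WHERE clause to prevent empty updates (this applies to both queries; for the reset query, compare against 0):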

...
AND (col1 IS DISTINCT FROM t2.col1_count OR -- again: might be simpler, depending on
     col2 IS DISTINCT FROM t2.col2_sum   OR -- the table definition, which is missing
     col3 IS DISTINCT FROM t2.col3_sum)

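The complete statement with the guard, as a sketch on hypothetical sample data (types and values are assumptions; fresh session). Only rows whose values actually change are written, so running it a second time without any change to t2 updates nothing.

```sql
-- Hypothetical minimal schema and data.
CREATE TEMP TABLE t1 (
  id int PRIMARY KEY, user_id int NOT NULL,
  col1 bigint NOT NULL DEFAULT 0, col2 bigint NOT NULL DEFAULT 0, col3 bigint NOT NULL DEFAULT 0
);
CREATE TEMP TABLE t2 (
  id serial PRIMARY KEY, t1_id int NOT NULL, t3_id int NOT NULL, user_id int NOT NULL,
  col2 int NOT NULL, col3 int NOT NULL, col5 boolean NOT NULL
);
INSERT INTO t1 VALUES (1, 1, 0, 0, 0), (2, 1, 0, 0, 0), (3, 1, 9, 9, 9);
INSERT INTO t2 (t1_id, t3_id, user_id, col2, col3, col5) VALUES
   (1, 10, 1, 5, 7, true)
  ,(1, 11, 1, 3, 4, false)
  ,(2, 10, 1, 2, 9, true);

-- psql reports UPDATE 2 on the first run and UPDATE 0 if you run it again.
UPDATE t1
SET    col1 = t2.col1_count
      ,col2 = t2.col2_sum
      ,col3 = t2.col3_sum
FROM  (
   SELECT t1_id
         ,count(*) AS col1_count
         ,sum(CASE WHEN col5 = true  THEN col2 ELSE 0 END) AS col2_sum
         ,sum(CASE WHEN col5 = false THEN col3 ELSE 0 END) AS col3_sum
   FROM   t2
   WHERE  user_id = 1
   GROUP  BY t1_id
   ) t2
WHERE  t2.t1_id = t1.id
AND   (t1.col1 IS DISTINCT FROM t2.col1_count OR  -- skip rows that
       t1.col2 IS DISTINCT FROM t2.col2_sum   OR  -- would not change
       t1.col3 IS DISTINCT FROM t2.col3_sum);
```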

For columns defined NOT NULL, you can use <> instead of IS DISTINCT FROM.
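Why <> is only safe for NOT NULL columns: if a column currently holds NULL, col1 <> t2.col1_count evaluates to NULL, which a WHERE clause treats like false, so the row would be skipped although it needs the update. IS DISTINCT FROM treats NULL as an ordinary comparable value. A quick check:

```sql
SELECT NULL::int <> 1                  AS ne_result        -- NULL: WHERE would skip the row
      ,NULL::int IS DISTINCT FROM 1    AS distinct_result  -- true
      ,NULL::int IS DISTINCT FROM NULL AS both_null;       -- false
```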

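This can make a big difference, because updates are expensive: PostgreSQL writes a new row version for every updated row, even if no value actually changes.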
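One way to see that an UPDATE which changes nothing still costs something: PostgreSQL writes a new row version, which shows up as a new physical location (ctid). A sketch using a throwaway temporary table, ctid_demo, which is hypothetical and exists only for this demonstration. The exact ctid values depend on storage, so the ones in the comments are only examples.

```sql
-- Hypothetical throwaway table, only for this demonstration.
CREATE TEMP TABLE ctid_demo (id int PRIMARY KEY, col1 bigint NOT NULL);
INSERT INTO ctid_demo VALUES (1, 0);

SELECT ctid FROM ctid_demo WHERE id = 1;         -- e.g. (0,1)
UPDATE ctid_demo SET col1 = col1 WHERE id = 1;   -- no value changes ...
SELECT ctid FROM ctid_demo WHERE id = 1;         -- ... yet a new row version exists, e.g. (0,2)
```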

Indexes

Besides the primary key on t1.id, the only index you need for this is:

CREATE INDEX ix_t2_user_id ON t2(user_id);

Archived: 12 years, 9 months ago
Views: 1528
Last activity: 12 years, 9 months ago