Why does this SUM() function take so long in PostgreSQL?

Question

yeg*_*256 2 sql postgresql postgresql-9.2

This is my query:

SELECT SUM(amount) FROM bill WHERE name = 'peter'

The table has 800K+ rows. EXPLAIN ANALYZE says:

Aggregate  (cost=288570.06..288570.07 rows=1 width=4) (actual time=537213.327..537213.328 rows=1 loops=1)
  ->  Seq Scan on bill  (cost=0.00..288320.94 rows=498251 width=4) (actual time=48385.201..535941.041 rows=800947 loops=1)
        Filter: ((name)::text = 'peter'::text)
        Rows Removed by Filter: 8
Total runtime: 537213.381 ms

All rows are affected, which is correct. But why does it take so long? A similar query without the WHERE clause runs much faster:

EXPLAIN ANALYZE SELECT SUM(amount) FROM bill

Aggregate  (cost=137523.31..137523.31 rows=1 width=4) (actual time=2198.663..2198.664 rows=1 loops=1)
  ->  Index Only Scan using idx_amount on bill  (cost=0.00..137274.17 rows=498268 width=4) (actual time=0.032..1223.512 rows=800955 loops=1)
        Heap Fetches: 533399
Total runtime: 2198.717 ms

I have an index on amount and an index on name. Am I missing an index?

PS. I managed to solve the problem by adding a new index ON bill(name, amount). I don't understand why it helped, so let's leave the question open for now...

Answer 1

kri*_*sku 6

Since you are searching for a specific name, you should have an index with name as its first column, e.g. CREATE INDEX IX_bill_name ON bill( name ).

But Postgres can still opt to do a full table scan if it estimates that your index is not selective enough, i.e. if it thinks it is faster to just scan all rows and pick the matching ones than to consult an index and then jump around in the table to gather the matching rows. Postgres uses a cost-based estimation technique that weights random disk reads as more expensive than sequential reads.
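To see the trade-off the planner is weighing, one option is to look at the two page-cost settings and then discourage sequential scans for a single transaction, so the plan it rejected and its estimated cost become visible. This is a diagnostic sketch only, not a fix. It reuses the query from the question, and note that EXPLAIN ANALYZE really executes the query, so the forced plan may itself be slow.

```sql
-- Planner cost constants. The defaults are 1.0 and 4.0 respectively,
-- which is why a random page read counts as more expensive.
SHOW seq_page_cost;
SHOW random_page_cost;

-- Diagnostic only: discourage sequential scans inside one transaction
-- to reveal the alternative plan and its estimated cost.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT SUM(amount) FROM bill WHERE name = 'peter';
ROLLBACK;  -- SET LOCAL is discarded when the transaction ends
```

Comparing the cost figures of the two plans shows which one the planner considered cheaper, and the actual times show whether that estimate was right.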

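For an index to actually be used in your situation, only a small fraction of the rows should match what you are searching for. Roughly 10% or less is a common rule of thumb, though there is no fixed cutoff; it depends on the planner's cost estimates. Since nearly all of your rows have name = 'peter' (the filter removed only 8 rows), a full table scan is actually the cheaper plan.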
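A quick way to check how selective the filter is would be to compare the planner's statistics with the real data. The sketch below assumes the table and column names from the question. The row estimates in the posted plans (about 498K) are well below the actual counts (about 800K), which suggests the statistics may be stale, so it refreshes them first.

```sql
-- Refresh statistics so the estimates reflect the current data.
ANALYZE bill;

-- What the planner believes: most common values of name and their frequencies.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'bill' AND attname = 'name';

-- What is actually there: exact fraction of matching rows.
-- This scans the table and assumes it is not empty.
SELECT SUM(CASE WHEN name = 'peter' THEN 1 ELSE 0 END)::numeric / COUNT(*)
       AS matching_fraction
FROM bill;
```

If the fraction is close to 1, as it appears to be here, an index on name alone cannot save the planner from visiting almost every row.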

The reason the SUM without filtering runs faster has to do with the overall width of the table. With a WHERE clause, Postgres has to sequentially read all rows in the table so it can discard those that do not match the filter. Without a WHERE clause, Postgres can instead read all the amounts from the index. Because the index on amount contains only the amounts and pointers to the corresponding rows, but no other data from the table, there is simply less data to wade through. Based on the big difference in performance, I guess you have quite a lot of other fields in your table.
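The same reasoning would explain why the index on (name, amount) mentioned in the question helped: it holds both columns the query touches, so PostgreSQL 9.2 and later can answer the filtered SUM with an index-only scan instead of reading the wide table. A sketch for checking this follows. The index name idx_bill_name_amount is hypothetical, while bill and idx_amount come from the question.

```sql
-- Compare the size of the table (heap) with the index on amount.
SELECT pg_size_pretty(pg_relation_size('bill'))       AS table_size,
       pg_size_pretty(pg_relation_size('idx_amount')) AS index_size;

-- Index covering both the filter column and the summed column.
-- The index name is hypothetical.
CREATE INDEX idx_bill_name_amount ON bill (name, amount);

-- An index-only scan skips the heap only for pages marked all-visible.
-- VACUUM updates the visibility map, which can lower "Heap Fetches".
VACUUM ANALYZE bill;

EXPLAIN ANALYZE SELECT SUM(amount) FROM bill WHERE name = 'peter';
```

If the plan shows an Index Only Scan on the new index with few heap fetches, the query is being served almost entirely from the much smaller index.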
