MySQL vs PostgreSQL: benchmarking COUNT(*) execution speed

Question


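I am benchmarking databases to find the best fit for my project, and I found that count(*) is very slow in PostgreSQL. I can't tell whether this is normal PostgreSQL behavior or whether I am doing something wrong.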

I have a table with ~200M records. MySQL table definition:

CREATE TABLE t1 (
  id int(11) NOT NULL AUTO_INCREMENT,
  t2_id int(11) NOT NULL,
....  
  PRIMARY KEY (id),
  KEY index_t2 (t2_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


Query (returns ~30M):

SELECT COUNT(*) FROM t1 WHERE t2_id = 7;


Timing:

25,797 ms MySQL (v5.7.11)

1,222,168 ms PostgreSQL (v9.5)

Explain:

MySQL:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: index_t2
          key: index_t2
      key_len: 4
          ref: const
         rows: 59438630
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)


PostgreSQL:

Aggregate  (cost=4469365.02..4469365.03 rows=1 width=0)
 ->  Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=0)
       Recheck Cond: (t2_id = 7)
       ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0)
             Index Cond: (t2_id = 7)


Server: AWS RDS (db.r3.xlarge), vCPU: 4, memory: 30 GB

Update (2016-09-20):

> explain (analyze, buffers) SELECT COUNT(*) FROM t1 WHERE t2_id = 7;

QUERY PLAN                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4469365.02..4469365.03 rows=1 width=4) (actual time=1213456.539..1213456.539 rows=1 loops=1)
   Buffers: shared read=2734808
   ->  Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=4) (actual time=64015.828..1205542.421 rows=31383566 loops=1)
         Recheck Cond: (t2_id = 7)
         Rows Removed by Index Recheck: 108582028
         Heap Blocks: exact=19929 lossy=2606242
         Buffers: shared read=2734808
         ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0) (actual time=64009.598..64009.598 rows=31383566 loops=1)
               Index Cond: (t2_id = 7)
               Buffers: shared read=108637
 Planning time: 0.080 ms
 Execution time: 1213456.891 ms
(12 rows)

Time: 1213484.579 ms


Update (2016-09-21):

> explain (analyze, buffers) SELECT t2_id FROM t1 WHERE t2_id = 7;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=114) (actual time=59954.834..1234070.436 rows=31383566 loops=1)
   Recheck Cond: (t2_id = 7)
   Rows Removed by Index Recheck: 108582028
   Heap Blocks: exact=19929 lossy=2606242
   Buffers: shared hit=4824 read=2729984
   ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0) (actual time=59948.598..59948.598 rows=31383566 loops=1)
         Index Cond: (t2_id = 7)
         Buffers: shared hit=4824 read=103813
 Planning time: 0.086 ms
 Execution time: 1239826.408 ms
(10 rows)

Time: 1239827.053 ms


Answer 1

3ma*_*uek 5

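The two RDBMSs count rows differently. In InnoDB, we get the following behavior by default:

To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool.

For Postgres, you may want to see whether an index-only scan (closer to the InnoDB behavior) helps here. More information here. Given the number of rows and the poor cardinality of that value (nearly 15% of the table, according to the statistics), I can't guarantee it will work, but you can try: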

SELECT COUNT(t2_id) FROM t1 WHERE t2_id = 7;
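
One way to check whether PostgreSQL can answer this count from the index alone is sketched below. It assumes the table and index from the question (t1, index_t2). Index-only scans need an up-to-date visibility map, which VACUUM maintains. Whether the planner actually picks an index-only scan for this data is not guaranteed.

```sql
-- Assumption: run against the table from the question (t1, indexed on t2_id).
-- VACUUM updates the visibility map that index-only scans depend on;
-- ANALYZE refreshes the planner statistics in the same pass.
VACUUM (ANALYZE) t1;

-- Re-run the count and inspect the plan. An "Index Only Scan" node
-- means the heap was skipped for pages marked all-visible.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM t1 WHERE t2_id = 7;
```

If the plan shows an Index Only Scan with a low Heap Fetches value, the count is served mostly from the index. A high value means most rows still required a heap visit.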

