我有一个 PostgreSQL 9.3 表,其中包含一些数字和一些附加数据:
CREATE TABLE mytable (
myid BIGINT,
somedata BYTEA
)
Run Code Online (Sandbox Code Playgroud)
该表目前有大约 10M 条记录,占用 1GB 磁盘空间。myid
不连续。
我想计算 100000 个连续数字的每个块中有多少行:
SELECT myid/100000 AS block, count(*) AS total FROM mytable GROUP BY myid/100000;
Run Code Online (Sandbox Code Playgroud)
这将返回大约 3500 行。
我注意到某个索引的存在显着加快了这个查询,即使查询计划根本没有提到它。没有索引的查询计划:
db=> EXPLAIN (ANALYZE TRUE, VERBOSE TRUE) SELECT myid/100000 AS block, count(*) AS total FROM mytable GROUP BY myid/100000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=1636639.92..1709958.65 rows=496942 width=8) (actual time=6783.763..8888.841 rows=3460 loops=1)
Output: ((myid / 100000)), count(*)
-> Sort (cost=1636639.92..1659008.91 rows=8947594 width=8) (actual time=6783.752..8005.831 …
Run Code Online (Sandbox Code Playgroud) 我试图在 Postgres 9.4 中加速这个查询:
SELECT "groupingsFrameHash", COUNT(*) AS nb
FROM "public"."zrac_c1e350bb-a7fc-4f6b-9f49-92dfd1873876"
GROUP BY "groupingsFrameHash"
HAVING COUNT(*) > 1
ORDER BY nb DESC LIMIT 10
Run Code Online (Sandbox Code Playgroud)
我在 上有一个索引"groupingsFrameHash"
。我不需要精确的结果,模糊近似就足够了。
这是查询计划:
Limit (cost=17207.03..17207.05 rows=10 width=25) (actual time=740.056..740.058 rows=10 loops=1)
-> Sort (cost=17207.03..17318.19 rows=44463 width=25) (actual time=740.054..740.055 rows=10 loops=1)
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 25kB
-> GroupAggregate (cost=14725.95..16246.20 rows=44463 width=25) (actual time=615.109..734.740 rows=25977 loops=1)
Group Key: "groupingsFrameHash"
Filter: (count(*) > 1)
Rows Removed by Filter: 24259
-> Sort (cost=14725.95..14967.07 rows=96446 …
Run Code Online (Sandbox Code Playgroud)