Kyl*_*yle 5 postgresql performance datatypes postgresql-9.4 postgresql-performance
我正在对 PostgreSQL 9.4rc1 上的新数据库设计进行一些性能测试,我看到一些使用窗口函数的查询速度很慢。这是我的表设置:
CREATE TABLE player_stat (
player_id VARCHAR(200) NOT NULL,
stat_id BIGINT NOT NULL,
value BIGINT NOT NULL DEFAULT 0,
last_updated TIMESTAMP WITH TIME ZONE NOT NULL,
last_active TIMESTAMP WITH TIME ZONE DEFAULT NULL,
CONSTRAINT player_stat_pk PRIMARY KEY (player_id, stat_id),
CONSTRAINT player_stat_fk1 FOREIGN KEY(stat_id) REFERENCES stat (id)
);
CREATE INDEX player_stat_stat_value_player_desc
ON player_stat (stat_id, value DESC, player_id ASC);
Run Code Online (Sandbox Code Playgroud)
我在这个表中插入了 3000 万行,分为 3 个统计数据:
INSERT INTO player_stat (player_id, stat_id, value, last_updated) SELECT x.id, 1, x.v, now() FROM (SELECT generate_series(1,10000000) as id, trunc(random() * (1900-1200) + 1200) as v) AS x;
INSERT INTO player_stat (player_id, stat_id, value, last_updated) SELECT x.id, 2, x.v, now() FROM (SELECT generate_series(1,10000000) as id, trunc(random() * (1900-1200) + 1200) as v) AS x;
INSERT INTO player_stat (player_id, stat_id, value, last_updated) SELECT x.id, 3, x.v, now() FROM (SELECT generate_series(1,10000000) as id, trunc(random() * (1900-1200) + 1200) as v) AS x;
Run Code Online (Sandbox Code Playgroud)
然后我尝试根据给定的统计数据(EDIT)对球员进行排名:
SELECT * FROM
( SELECT player_id
, rank() OVER (ORDER BY value DESC, player_id ASC) as rank
FROM player_stat
WHERE stat_id = 1
) as t
WHERE rank <= 20
ORDER BY rank ASC;
Run Code Online (Sandbox Code Playgroud)
此查询大约需要 5.5 秒才能返回。在其上运行解释会返回以下内容:
"Sort (cost=1167612.28..1176082.26 rows=3387993 width=15) (actual time=9726.132..9726.135 rows=20 loops=1)"
" Sort Key: t.rank"
" Sort Method: quicksort Memory: 25kB"
" -> Subquery Scan on t (cost=0.56..684349.57 rows=3387993 width=15) (actual time=0.080..9726.116 rows=20 loops=1)"
" Filter: (t.rank <= 20)"
" Rows Removed by Filter: 9999980"
" -> WindowAgg (cost=0.56..557299.83 rows=10163979 width=15) (actual time=0.077..8351.124 rows=10000000 loops=1)"
" -> Index Only Scan using player_stat_stat_value_player_desc on player_stat (cost=0.56..379430.20 rows=10163979 width=15) (actual time=0.054..2319.007 rows=10000000 loops=1)"
" Index Cond: (stat_id = 1)"
" Heap Fetches: 0"
"Planning time: 0.187 ms"
"Execution time: 9726.172 ms"
Run Code Online (Sandbox Code Playgroud)
有什么办法可以加快速度吗?花费的时间似乎随着表中玩家的数量呈线性增长。
有什么办法可以加快速度吗?
是的。不要varchar
为integer
数字使用列。使用integer
或者bigint
如果您刻录那么多 ID - 表和索引中要小得多,处理速度也更快。由于您在测试中对1000 万行进行排名,这将产生重大影响。
player_id VARCHAR(200) NOT NULL,
player_id int NOT NULL,
或者uuid
如果你必须(我怀疑):
您的查询对 1000 万行进行排名。这将需要一些时间,即使直接从索引读取并且没有排序步骤也是如此。
附注:如果您将大量插入行第一,并添加索引和PK约束(和FK约束)后,这将是多快,再加上你不膨胀完美指标,但不运行REINDEX
或VACUUM FULL
。ANALYZE
不过,在测试性能之前,请确保已在桌面上运行。
..但是,在这里出去玩,可能正在寻找什么。
该EXPLAIN
输出表明您筛选前20行:(t.rank <= 20)
。您提出的查询没有显示。实际匹配您的EXPLAIN
输出的查询将是:
SELECT * FROM (
SELECT player_id
, rank() OVER (ORDER BY value DESC, player_id ASC) AS rank
FROM player_stat
WHERE stat_id = 1
) t
WHERE t.rank <= 20;
Run Code Online (Sandbox Code Playgroud)
可以显着改进:
SELECT row_number() OVER (ORDER BY value DESC, player_id ASC) AS rank
, player_id
FROM player_stat
WHERE stat_id = 1
ORDER BY value DESC, player_id
LIMIT 20;
Run Code Online (Sandbox Code Playgroud)
性能的重要部分是LIMIT
与ORDER BY
匹配索引相结合的子句:现在查询从顶部到索引正好读取 20 行,在原始版本中它必须读取 10000000。我们只使用player_id
and value
,所以我们仍然可以进行仅索引扫描。剩下的就是花生了。
这完全是由于查询中的事件序列SELECT
:窗口函数在 LIMIT
. 仅当排序顺序一致时,我们才不必考虑其余适用的 10000000 行。
我们可以使用,LIMIT 20
因为前 20 个排名保证跨越不超过 20 行。PK on(player_id, stat_id)
保证player_id
每个唯一,stat_id
并且因为它包含在 中ORDER BY
,所以每个等级只分配一次 - 这也意味着我们可以使用稍微便宜的row_number()
代替。
归档时间: |
|
查看次数: |
1789 次 |
最近记录: |