I have a PostgreSQL table like this:
CREATE TABLE t (
a BIGSERIAL NOT NULL, -- 8 b
b SMALLINT, -- 2 b
c SMALLINT, -- 2 b
d REAL, -- 4 b
e REAL, -- 4 b
f REAL, -- 4 b
g INTEGER, -- 4 b
h REAL, -- 4 b
i REAL, -- 4 b
j SMALLINT, -- 2 b
k INTEGER, -- 4 b
l INTEGER, -- 4 b
m REAL, -- 4 b
CONSTRAINT a_pkey PRIMARY KEY (a)
);
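The columns above add up to 50 bytes per row. My experience is that I need another 40% to 50% for system overhead, even without any user-created indexes. So, about 75 bytes per row. I will have many, many rows in the table, potentially upward of 145 billion, so the table is going to be pushing 13-14 terabytes. What tricks, if any, could I use to compact this table? My possible ideas are below...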
Convert …
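The per-row arithmetic in the question above can be checked with a small sketch. It is only a rough estimate and rests on assumptions not stated in the question: a 64-bit PostgreSQL build (8-byte maximum alignment), no NULL values (so no NULL bitmap), a 24-byte padded tuple header, and a 4-byte line pointer per row. Page headers, fill factor, and the primary-key index are ignored. The helper names (`pad`, `data_bytes`, `row_bytes`) are made up for illustration.

```python
# Rough per-row size estimate for table t.
# ASSUMPTIONS: 64-bit build (MAXALIGN = 8), no NULLs, page overhead and
# indexes ignored. For these four types, size equals required alignment.
SIZE = {"bigint": 8, "integer": 4, "real": 4, "smallint": 2}
TUPLE_HEADER = 24   # 23-byte heap tuple header, padded to 8
ITEM_POINTER = 4    # per-row line pointer stored in the page
MAXALIGN = 8

COLUMNS = [  # (name, type) in declaration order
    ("a", "bigint"), ("b", "smallint"), ("c", "smallint"), ("d", "real"),
    ("e", "real"), ("f", "real"), ("g", "integer"), ("h", "real"),
    ("i", "real"), ("j", "smallint"), ("k", "integer"), ("l", "integer"),
    ("m", "real"),
]


def pad(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment


def data_bytes(columns):
    """Bytes used by the column data, including alignment padding."""
    offset = 0
    for _, typ in columns:
        size = SIZE[typ]
        offset = pad(offset, size) + size
    return offset


def row_bytes(columns):
    """Header + data, padded to MAXALIGN, plus the line pointer."""
    return pad(TUPLE_HEADER + data_bytes(columns), MAXALIGN) + ITEM_POINTER


declared = row_bytes(COLUMNS)
# Widest-first ordering, a common trick to avoid alignment padding.
reordered = row_bytes(sorted(COLUMNS, key=lambda c: -SIZE[c[1]]))

rows = 145_000_000_000
print("declared order :", declared, "bytes/row")
print("widest first   :", reordered, "bytes/row")
print("heap estimate  :", round(rows * declared / 1e12, 1), "TB")
```

Under these assumptions the declared order wastes only 2 bytes of padding (before `k`), and both orderings come out at 84 bytes per row, so column reordering alone would not shrink this particular table.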
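Introducing an ORDER BY clause in a query increases the total time, because the database has to do extra work to sort the result set:
What I don't understand is why adding just one column from a joined table produces such a different performance.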
EXPLAIN ANALYZE
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE (p.active = true AND p.site_id = 1 )
ORDER BY d.domain, p.ordering, p.name
Sort (cost=3909.83..3952.21 rows=16954 width=1086) (actual time=1120.618..1143.922 rows=16946 loops=1)
  Sort Key: django_site.domain, product_product.ordering, product_product.name
  Sort Method: quicksort Memory: 25517kB
  -> Nested Loop (cost=0.00..2718.86 rows=16954 width=1086) (actual time=0.053..87.396 rows=16946 loops=1)
        -> Seq Scan on django_site (cost=0.00..1.01 rows=1 width=24) (actual time=0.010..0.012 rows=1 loops=1)
              Filter: (id = 1) …
I created the following table and index:
CREATE TABLE cdc_auth_user
(
cdc_auth_user_id bigint NOT NULL DEFAULT nextval('cdc_auth_user_id_seq'::regclass),
cdc_timestamp timestamp without time zone DEFAULT ('now'::text)::timestamp without time zone,
cdc_operation text,
id integer,
username character varying(30)
);
CREATE INDEX idx_cdc_auth_user_cdc_timestamp
ON cdc_auth_user
USING btree (cdc_timestamp);
However, when I run a SELECT that filters on the timestamp field, the index is ignored and my query takes about 10 seconds to return:
EXPLAIN SELECT *
FROM cdc_auth_user
WHERE cdc_timestamp BETWEEN '1900/02/24 12:12:34.818'
AND '2012/02/24 12:17:45.963';
Seq Scan on cdc_auth_user (cost=0.00..1089.05 rows=30003 width=126)
Filter: ((cdc_timestamp >= '1900-02-24 12:12:34.818'::timestamp without time zone) AND (cdc_timestamp <= '2012-02-24 12:17:45.963'::timestamp without time zone))