Tag: postgresql-performance

How to speed up ORDER BY sorting when using a GIN index in PostgreSQL?

I have a table like this:

CREATE TABLE products (
  id serial PRIMARY KEY, 
  category_ids integer[],
  published boolean NOT NULL,
  score integer NOT NULL,
  title varchar NOT NULL);

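A product can belong to multiple categories. The category_ids column contains the list of ids of all the product's categories.

A typical query looks like this (always searching for a single category):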

SELECT * FROM products WHERE published
  AND category_ids @> ARRAY[23465]
ORDER BY score DESC, title
LIMIT 20 OFFSET 8000;

To speed this up, I use the following index:

CREATE INDEX idx_test1 ON products
  USING GIN (category_ids gin__int_ops) WHERE published;

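This helps a lot, unless there are too many products in one category. It quickly filters the products belonging to that category, but the sort then has to be done the hard way (without an index).

With the btree_gin extension installed, I can build a multicolumn GIN index like this: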

CREATE INDEX idx_test2 ON products USING GIN (
  category_ids gin__int_ops, score, title) WHERE published; …

postgresql performance index postgresql-performance

Yar*_*hiy

2020 01-08

14 votes, 2 answers, 10k views

Why does this LEFT JOIN perform so much worse than LEFT JOIN LATERAL?

I have the following tables (taken from the Sakila database):

film: film_id is pkey
actor: actor_id is pkey
film_actor: film_id and actor_id are fkeys to film/actor

I am selecting a particular film. For this film, I also want all actors participating in that film. I have two queries for this: one with a LEFT JOIN and one with a LEFT JOIN LATERAL.

select film.film_id, film.title, a.actors
from   film
left join
  (         
       select     film_actor.film_id, array_agg(first_name) as actors
       from       actor
       inner …

postgresql performance join execution-plan postgresql-10 postgresql-performance

Jel*_*rns

2020 01-08

14 votes, 1 answer, 4050 views

How do completely empty columns in a large table affect performance?

I have a table in a Postgres database with 400 million rows and 18 columns:

id serial NOT NULL,
a integer,
b integer,
c integer,
d smallint,
e timestamp without time zone,
f smallint,
g timestamp without time zone,
h integer,
i timestamp without time zone,
j integer,
k character varying(32),
l integer,
m smallint,
n smallint,
o character varying(36),
p character varying(100),
q character varying(100)

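Columns e, k, and n are all NULL. They do not store any values and are completely useless at this point. They were part of the original design but were never removed.

Edit: most of the other columns are non-NULL.

Questions:

How can I calculate the impact this has on storage? Is it equal to the size of the column * the number of rows?
Would dropping these empty columns noticeably improve performance for this table? Would the page cache be able to hold more rows?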
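
One way to approach the first question is to measure rather than estimate. The sketch below assumes a hypothetical table name `big_table` (the question does not name the table) and relies on the built-in functions `pg_relation_size`, `pg_total_relation_size`, and `pg_column_size`. The 1000-row sample size is an arbitrary choice.

```sql
-- Assumption: the table is called big_table (hypothetical name).
-- On-disk size of the heap alone, and of the table including indexes and TOAST.
SELECT pg_size_pretty(pg_relation_size('big_table'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('big_table')) AS total_size;

-- Average size in bytes of a whole-row value, from an arbitrary sample of 1000 rows.
SELECT avg(pg_column_size(t)) AS avg_row_bytes
FROM  (SELECT * FROM big_table LIMIT 1000) AS t;
```

In PostgreSQL a NULL value occupies no space in the data area of a row; it is recorded as one bit in the row's null bitmap. The estimate "column size * number of rows" would therefore generally overstate the effect.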

postgresql performance database-design storage disk-space postgresql-performance

ebi*_*ebi

2020 01-08

13 votes, 1 answer, 4824 views

Postgres performs poorly on ORDER BY "id" DESC LIMIT 1

I have an items table with the following schema (in Postgres v9.3.5):

  Column   | Type   |                         Modifiers                  | Storage  
-----------+--------+----------------------------------------------------+----------
 id        | bigint | not null default nextval('items_id_seq'::regclass) | plain    
 data      | text   | not null                                           | extended 
 object_id | bigint | not null                                           | plain    
Indexes:
    "items_pkey" PRIMARY KEY, btree (id)
    "items_object_id_idx" btree (object_id)
Has OIDs: no

When I run this query, it hangs for a long time:

SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;

After VACUUM ANALYZE, query execution improved a lot, but it is still not ideal.

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" …
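
For this query shape (an equality filter on one column, ORDER BY on another, LIMIT 1), one direction that is often suggested is a multicolumn B-tree index covering both columns, which can be scanned backward to return the highest id for a given object_id. A minimal sketch; the index name is hypothetical, and whether the planner actually picks the index depends on the data distribution and statistics.

```sql
-- Hypothetical index name; the columns come from the table definition above.
CREATE INDEX items_object_id_id_idx ON items (object_id, id);

-- Re-check the plan afterwards.
EXPLAIN ANALYZE
SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
```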

performance postgresql-9.3 slow-log postgresql-performance

hap*_*set

2020 01-08

13 votes, 2 answers, 20k views

Faster queries with pattern matching on multiple text fields

I have a Postgres table with more than 20M tuples:

first_name | last_name | email
-------------------------------------------
bat        | man       | batman@wayne.com
arya       | vidal     | foo@email.com
max        | joe       | bar@email.com

To filter the records I am using:

SELECT *
  FROM people
WHERE (first_name || '' || last_name) ILIKE '%bat%man%' OR 
    first_name ILIKE '%bat%man%'  OR  
    last_name ILIKE '%bat%man%'   OR
    email ILIKE '%bat%man%'
    LIMIT 25 OFFSET 0

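Even with indexes, the search takes almost a minute to return results.
There are indexes on (first_name || '' || last_name), first_name, last_name, and email.

What can I do to improve the performance of this query?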
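
A plain B-tree index cannot serve a pattern with a leading wildcard such as '%bat%man%'. One option often suggested for this kind of search is a trigram index from the pg_trgm extension. The sketch below assumes the extension is available on the server; the index names are hypothetical, and the expression index reuses the concatenation exactly as written in the query.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index names. The gin_trgm_ops operator class supports
-- LIKE / ILIKE patterns, including ones that start with a wildcard.
CREATE INDEX people_email_trgm_idx ON people USING gin (email gin_trgm_ops);
CREATE INDEX people_name_trgm_idx  ON people
  USING gin ((first_name || '' || last_name) gin_trgm_ops);
```

Similar indexes on first_name and last_name would be needed for the other two predicates; EXPLAIN ANALYZE on the original query shows whether the planner uses them.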

postgresql performance index full-text-search postgresql-performance

Vic*_*tor

2020 01-08

12 votes, 1 answer, 10k views

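Can PostgreSQL use multiple partial indexes in a single query?

I have read that PostgreSQL can generally use multiple indexes, but in the specific case of a query that spans two partial indexes, will it use both indexes at once? If so, are they loaded sequentially or together?

For example, if this query spans two partial indexes by column_1, how would the partial indexes be used, and how would the index data be loaded and discarded: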

SELECT 1 FROM sample_table WHERE column_1 > 50 AND column_2 < 50000
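
Whether two indexes are combined for a query can be observed in its execution plan. A sketch, assuming two hypothetical partial indexes (the question does not show its index definitions): when the planner combines them, the plan contains a BitmapAnd node over two Bitmap Index Scan nodes.

```sql
-- Hypothetical partial indexes; the predicates are chosen to match the query's WHERE clause.
CREATE INDEX sample_col1_part_idx ON sample_table (column_1) WHERE column_1 > 50;
CREATE INDEX sample_col2_part_idx ON sample_table (column_2) WHERE column_2 < 50000;

-- Inspect the chosen plan and the buffers it touched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1 FROM sample_table WHERE column_1 > 50 AND column_2 < 50000;
```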

postgresql performance index-tuning postgresql-performance

Jim*_*Bob

2020 01-08

11 votes, 1 answer, 7694 views

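How to use an index to speed up sorting in Postgres

I am using Postgres 9.4.

The messages table has the following schema: a message belongs to a feed_id and has a posted_at; a message can also have a parent message (in the case of a reply).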

                    Table "public.messages"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 message_id                   | character varying(255)      | not null
 feed_id                      | integer                     |
 parent_id                    | character varying(255)      |
 posted_at                    | timestamp without time zone |
 share_count                  | integer                     |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (message_id)
    "index_messages_on_feed_id_posted_at" btree (feed_id, posted_at DESC NULLS LAST)

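I want to return all messages sorted by share_count, but for each parent_id I only want to return one message. That is, if multiple messages have the same parent_id, only the latest one (by posted_at) is returned. parent_id can be null, and all messages with a null parent_id should be returned.

The query I use is: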

WITH filtered_messages AS (SELECT * 
                           FROM messages
                           WHERE feed_id …

postgresql performance index sorting postgresql-9.4 postgresql-performance

Zha*_*eng

2020 01-08

10 votes, 1 answer, 20k views

pgAdmin is very slow on any remote operation

I run this query from my local pgAdmin, connected remotely to our development server:

select * from users order by random() limit 1;

It hangs for 17 seconds and then shows

Total query runtime: 148 ms. 
1 row retrieved.

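It also hangs on any action, even right-clicking a table.

After that I connected via RDP and ran the same query in the same pgAdmin version there, which showed the result immediately with query time: 32 ms.

Then I ran the query from my local pgAdmin again: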

Total query runtime: 337 ms.
1 row retrieved.

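My ping time to the server is 130 ms. The connection speed should be sufficient, since I can upload files via FTP very quickly.

The same query run with local psql completes within a few seconds, including connection time.

The same query in my local pgAdmin against my local copy of the database also completes immediately.

The pgAdmin version is 1.20.0. I also checked the latest, 1.22, and it behaves the same.

What can I do to speed up pgAdmin?

Note that psql works fine; I do not see the same delay there.

pgAdmin log for the 17-second query run: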

2016-02-06 16:18:03 INFO   : queueing : select …

postgresql performance pgadmin postgresql-9.4 postgresql-performance

Vla*_*lad

2020 01-08

10 votes, 1 answer, 8195 views

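Impact of CLUSTER on performance

I am trying to optimize my Postgres 9.2 database to speed up queries with date restrictions.

I have a timestamp column, but mostly I query for a single day, so I created an index on the timestamp cast to date: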

CREATE INDEX foo_my_timestamp_idx
ON foo
USING btree
((my_timestamp::date) DESC);

Now, to improve performance, I CLUSTER the foo table using the index above:

CLUSTER foo USING foo_my_timestamp_idx;

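According to the manual on SQL-CLUSTER, the table is

physically reordered based on the index information

I am wondering whether this will affect the performance of other queries that use the table's PK (say, id_foo). Are there any downsides?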
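
How closely a column's values follow the physical row order can be inspected through the correlation column of the pg_stats view after an ANALYZE. A sketch using the names from the question (id_foo is the PK name the question gives as an example, my_timestamp the column behind the index):

```sql
ANALYZE foo;

-- correlation near 1 or -1: column values follow the physical row order;
-- near 0: values are scattered across the table.
SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'foo'
AND    attname IN ('id_foo', 'my_timestamp');
```

Comparing the values before and after CLUSTER gives a rough picture of how the reordering affected the PK column. CLUSTER is a one-time operation: rows written later are not kept in index order.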

postgresql performance storage index-tuning postgresql-9.2 postgresql-performance

cou*_*cat

2020 01-08

10 votes, 1 answer, 3483 views

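When should we use table partitioning in PostgreSQL?

Background: we have two fairly large tables in our database, one with 80 million records and the other with 160 million. We are seeing performance problems and are considering table partitioning for both tables.

My question is: is there a number of records beyond which we should partition to maintain good performance? I know there is no "one size fits all" answer, but there may be a general recommendation such as "past X million records, you should partition the table". There is plenty of guidance on how to partition, but none on "when".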

postgresql partitioning postgresql-performance

Gra*_*ick

lucky-day

10 votes, 1 answer, 7626 views
