Related questions and solutions (0)

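Is a composite index also good for queries on the first field?

Let's say I have a table with fields A and B. I make regular queries on A+B, so I created a composite index on (A,B). Would queries on only A also be fully optimized by the composite index?

Additionally, I created an index on A, but Postgres still uses only the composite index for queries on A. If the previous answer is positive, I guess it doesn't really matter, but why does it select the composite index by default if the single A index is available?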
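
One way to check this on your own data is to compare plans with EXPLAIN. The sketch below assumes a hypothetical table `t` with integer columns `a` and `b` and made-up index names; none of these come from the question. Which index the planner picks depends on statistics and index sizes, so the plans are something to inspect, not a fixed result.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE t (a integer, b integer);

-- Fill with sample rows so the planner has something to estimate.
INSERT INTO t (a, b)
SELECT i % 1000, i
FROM generate_series(1, 100000) AS i;

CREATE INDEX t_a_b_idx ON t (a, b);  -- composite index, a is the leading column
CREATE INDEX t_a_idx   ON t (a);     -- single-column index on a

ANALYZE t;

-- A B-tree index on (a, b) can be searched by its leading column alone,
-- so either index is a candidate for this query.
EXPLAIN SELECT * FROM t WHERE a = 42;

-- With a condition on b only, there is no leading-column condition
-- to narrow the search in t_a_b_idx; compare this plan with the one above.
EXPLAIN SELECT * FROM t WHERE b = 42;
```

Dropping one of the two indexes and rerunning the first EXPLAIN shows how the plan and its cost estimate change.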

postgresql performance index database-design index-tuning

104 votes, 1 answer, 40k views

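Slow index scans in large table

Update 2020-08-04:

Since this answer is apparently still being viewed regularly, I wanted to give an update on the situation. We're currently using PG 11 with table partitioning on timestamp, and we're easily handling a few billion rows in the tables. Index-only scans are a lifesaver; it wouldn't be possible without them.

Using PostgreSQL 9.2, I have trouble with slow queries on a relatively large table (200+ million rows). I'm not trying anything crazy, just adding up historic values. Below is the query and the query plan output.

My table layout: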

                                   Table "public.energy_energyentry"
  Column   |           Type           |                            Modifiers
-----------+--------------------------+-----------------------------------------------------------------
 id        | integer                  | not null default nextval('energy_energyentry_id_seq'::regclass)
 prop_id   | integer                  | not null
 timestamp | timestamp with time zone | not null
 value     | double precision         | not null
Indexes:
    "energy_energyentry_pkey" PRIMARY KEY, btree (id)
    "energy_energyentry_prop_id" btree (prop_id)
    "energy_energyentry_prop_id_timestamp_idx" btree (prop_id, "timestamp")
Foreign-key constraints:
    "energy_energyentry_prop_id_fkey" FOREIGN KEY (prop_id) REFERENCES gateway_peripheralproperty(id) DEFERRABLE INITIALLY DEFERRED

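The data ranges from 2012-01-01 until now, with new data constantly being added. There are about 2.2k distinct values in the prop_id foreign key, distributed evenly.

I notice that the row estimates are not far off, but the cost estimates seem larger by …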

postgresql performance index optimization postgresql-performance

21 votes, 2 answers, 20k views

What does [FROM x, y] mean in Postgres?

I'm just getting started with Postgres. While reading this document, I came across the following query:

SELECT title, ts_rank_cd(textsearch, query) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;

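I can understand everything in this query, except for this: FROM apod, ....

What does this , mean? I'm used to joins, but not to multiple FROM items separated by a comma.

I searched the net to no avail. After looking at it and thinking about it, it seems to me that it declares a variable called query so that it can be used multiple times. But if that is true, what does it have to do with FROM?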
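
For comparison, a comma between items in a FROM list is a cross join, so the query can also be written with an explicit CROSS JOIN. The sketch below is that rewritten form, assuming the same apod table and textsearch column as the quoted query. The function call in FROM produces a single row whose column is named query, which is why it can be referenced in both SELECT and WHERE.

```sql
-- Same query with the comma replaced by an explicit CROSS JOIN.
-- to_tsquery(...) acts as a one-row table aliased as "query".
SELECT title, ts_rank_cd(textsearch, query) AS rank
FROM apod
CROSS JOIN to_tsquery('neutrino|(dark & matter)') AS query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
```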

postgresql join

14 votes, 2 answers, 3111 views

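How to use index to speed up sorting in postgres

I'm using postgres 9.4.

The messages table has the following schema: a message belongs to a feed_id, has a posted_at, and can also have a parent message (in the case of replies).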

                    Table "public.messages"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 message_id                   | character varying(255)      | not null
 feed_id                      | integer                     |
 parent_id                    | character varying(255)      |
 posted_at                    | timestamp without time zone |
 share_count                  | integer                     |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (message_id)
    "index_messages_on_feed_id_posted_at" btree (feed_id, posted_at DESC NULLS LAST)

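I want to return all messages ordered by share_count, but for each parent_id I only want to return one message. That is, if multiple messages have the same parent_id, only the latest one (by posted_at) is returned. The parent_id can be null, and messages with a null parent_id should all be returned.

The query I used is: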

WITH filtered_messages AS (SELECT * 
                           FROM messages
                           WHERE feed_id …

postgresql performance index sorting postgresql-9.4 postgresql-performance

10 votes, 1 answer, 20k views

Index usage on a temporary table

I have two rather simple queries. The first query:

 UPDATE mp_physical SET periodic_number = '' WHERE periodic_number is NULL;

Here is the plan:

 duration: 0.125 ms  plan:
    Query Text: UPDATE mp_physical  SET periodic_number = '' WHERE periodic_number is NULL;
    Update on mp_physical  (cost=0.42..7.34 rows=1 width=801)
      ->  Index Scan using "_I_periodic_number" on mp_physical  (cost=0.42..7.34 rows=1 width=801)
            Index Cond: (periodic_number IS NULL)

The second:

 UPDATE observations_optical_temp SET designation = '' WHERE periodic_number is NULL;

Its plan is:

duration: 2817.375 ms  plan:
    Query Text: UPDATE observations_optical_temp SET periodic_number = '' WHERE periodic_number is NULL;
    Update on observations_optical_temp …

postgresql performance index execution-plan temporary-tables postgresql-performance

8 votes, 1 answer, 10k views

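Why is this implicit join planned differently than an explicit join?

In this answer I explain SQL-89's implicit join syntax.
But while playing around, I noticed different query plans: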

EXPLAIN ANALYZE
  SELECT *
  FROM (values(1)) AS t(x), (values(2)) AS g(y);

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
 Planning time: 0.052 ms
 Execution time: 0.020 ms
(3 rows)

As opposed to this:

EXPLAIN ANALYZE
  SELECT *
  FROM (values(1)) AS t(x)                      
  CROSS JOIN (values(2)) AS g(y);
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Subquery Scan on g  (cost=0.00..0.02 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
 Planning time: 0.075 ms
 Execution time: 0.027 …

postgresql performance join execution-plan postgresql-9.5 postgresql-performance

5 votes, 1 answer, 1270 views

Do queries with a primary key and a foreign key run faster than queries with just a primary key?

SELECT something FROM table WHERE primary_key = ?

vs.

SELECT something FROM table WHERE primary_key = ? AND other_key = ?

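Assume this is a scenario where including other_key does not change the result set. Is the second query faster in practice? Or, if several keys are provided, does the database just use the single best one?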

postgresql performance optimization query-performance

5 votes, 1 answer, 3053 views

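Adding a primary key to a large PostgreSQL table with high traffic

I need to add a primary key to a large PostgreSQL table (about 2TB) with high traffic. This is a critical operation, and I'm looking for guidance on how to do it efficiently.

I have already tried the following steps: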

-- Step 1: Add id identity column 
ALTER TABLE users
ADD COLUMN id BIGINT GENERATED ALWAYS as IDENTITY;

-- Step 2: Add unique index on (id, user_id) concurrently
CREATE UNIQUE INDEX CONCURRENTLY id_user_id_idx
   ON users (id, user_id);

-- verify that step 2 is completed
-- Step 3: Add primary key
ALTER TABLE users
   ADD CONSTRAINT users_pkey PRIMARY KEY USING INDEX id_user_id_idx;

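I'm facing two problems:

1. The table is locked completely at "Step 1" itself. I know this is expected, but if there is any option to avoid it, please suggest one.

2. I'm getting the following error:

   ERROR: could not extend file "base/16401/90996": No space left on device
   HINT: Check free disk space.

   But I still have 600GB of storage left on my server.

Since the table will be locked at "Step 1" anyway, and if there is no option to avoid that, I can use a downtime window to add the id column first and then run the other two scripts. …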

postgresql index primary-key amazon-rds

5 votes, 1 answer, 1271 views

Best sort order in a B-tree index to support queries for recent rows?

Let's say I have a table described like this:

create table my_table (
  id serial, 
  create_date timestamp with time zone default now(),
  data text
);

And a query like this:

select * from my_table
where create_date >= timestamp with time zone 'yesterday'

Which index would be faster in theory, and why?

create index index_a on my_table (create_date);

create index index_b on my_table (create_date DESC);
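
A hedged way to compare the two in practice is to look at the plans. The PostgreSQL documentation notes that a B-tree index can be scanned forward or backward, so for a single-column index the declared direction mostly decides which way the scan runs, not whether the index is usable. The sketch below reuses my_table, index_a and index_b from the question; keeping only one of the two indexes at a time is an assumption made here so that each plan can be read in isolation.

```sql
-- Assumes my_table from above with only ONE of index_a / index_b present;
-- drop the other index before running, then repeat with the roles swapped.

-- Range predicate only: rows from yesterday onward.
EXPLAIN
SELECT * FROM my_table
WHERE create_date >= timestamp with time zone 'yesterday';

-- Newest rows first: an ascending index can serve this via a backward scan,
-- a descending index via a forward scan.
EXPLAIN
SELECT * FROM my_table
WHERE create_date >= timestamp with time zone 'yesterday'
ORDER BY create_date DESC;
```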

postgresql index index-tuning postgresql-performance

3 votes, 1 answer, 2899 views

Are implicit joins as efficient as explicit joins in Postgres?

It is often convenient to write this:

SELECT * 
FROM t1   -- ... +many more tables
INNER JOIN t2 ON (t1.id = t2.col)
INNER JOIN t3 ON (t1.id = t3.col)
INNER JOIN t4 ON (t1.id = t4.col)
...

as a cross join with conditions:

SELECT * 
FROM t1, t2, t3, t4   -- ... +many more tables
WHERE
       t1.id = t2.col
   AND t1.id = t3.col
   AND t1.id = t4.col
   -- +include matches on columns of other tables

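However, a naive implementation of a cross join would have a higher time complexity than inner joins. Does Postgres optimize the second query into one with the same time complexity as the first?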
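
The two forms can be compared directly with EXPLAIN. The sketch below uses three hypothetical tables whose definitions are assumptions for illustration, not part of the question. According to the PostgreSQL documentation, explicit inner join syntax is semantically the same as listing the relations in FROM, and the planner is free to reorder such joins as long as the number of items stays within join_collapse_limit and from_collapse_limit, so the two plans would be expected to match here.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE t1 (id integer PRIMARY KEY);
CREATE TABLE t2 (col integer);
CREATE TABLE t3 (col integer);

-- Explicit join syntax.
EXPLAIN
SELECT *
FROM t1
INNER JOIN t2 ON (t1.id = t2.col)
INNER JOIN t3 ON (t1.id = t3.col);

-- Comma-separated FROM list with the join conditions in WHERE.
EXPLAIN
SELECT *
FROM t1, t2, t3
WHERE t1.id = t2.col
  AND t1.id = t3.col;

-- The settings that bound how many FROM items the planner will reorder.
SHOW join_collapse_limit;
SHOW from_collapse_limit;
```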

postgresql join execution-plan

2 votes, 1 answer, 2444 views

Tag statistics

postgresql ×10

performance ×6

postgresql-performance ×5

execution-plan ×3

join ×3

index-tuning ×2

optimization ×2

database-design ×1

postgresql-9.4 ×1

postgresql-9.5 ×1

primary-key ×1

query-performance ×1

temporary-tables ×1