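Tag: postgresql-performance

Are Postgres updates slower when other columns are indexed?

Certain updates on a large Postgres table are taking far too long. Given these conditions:

Only one column is being updated, and it is not indexed
Every row already contains data in that column from earlier updates
The size of the data does not change (for example, overwriting a boolean)
No other column in this or any other table depends on the value of the column being updated
No other queries are running against the database (it is a personal research database on a workstation, not an enterprise one)
There are indexes on other columns
A spinning drive (not an SSD) with BitLocker, on a fast PC running Windows 8.1 x64
The table has 10 million rows and 60 columns

...you would expect the update to finish in a reasonable time, relative to what spinning media with BitLocker can deliver. No new data is being created, so existing data should not need to move on the HDD; it only needs to be overwritten. The other indexes should not need to change. Instead, after 20 hours of nonstop disk thrashing, I got tired of waiting and stopped the query. If I drop all the indexes on the other columns and rerun the query, it takes only about 30 minutes.

Why do indexes on columns unrelated to this query inflate the update time?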
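
A possible line of investigation, sketched under assumptions rather than offered as a diagnosis: under MVCC, Postgres does not overwrite a row in place. Every UPDATE writes a new row version, and unless the update qualifies as a heap-only tuple (HOT) update, each index on the table receives a new entry even when its column did not change. A HOT update requires that no indexed column is modified and that the page holding the old version has free space. The table name big_table and the fillfactor value are hypothetical. Note that a single statement rewriting every row can only be fully HOT if each page has room for a second copy of each row, so batching the update may also be worth testing.

```sql
-- Hypothetical table name: big_table.
-- Assumption: the slowdown comes from non-HOT updates that touch every index.

-- Reserve about half of each heap page so new row versions can stay on the same page.
ALTER TABLE big_table SET (fillfactor = 50);

-- fillfactor only applies to pages written afterwards; a rewrite repacks existing pages.
-- VACUUM FULL takes an exclusive lock and rebuilds the table and its indexes.
VACUUM FULL big_table;

-- After running the update, compare total updates with HOT updates.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'big_table';
```

If n_tup_hot_upd stays far below n_tup_upd, most updates are still writing new index entries.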

postgresql performance index update postgresql-performance

Are*_*bre

2020-01-08

4
votes

1
answer

3982
views

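Index for the OR operator: a=x or b=x

I have a table t with three integer columns: id, a, and b.
I want to fetch all records where a or b matches a given parameter, sorted by id:

select id, a, b from t where a=x or b=x order by id

Note that the value of x is the same for a and b.

What is the most suitable index here?

Update: is the fact that we always look for the same value in columns a and b of any use? Could we create an expression index for this?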
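
Two candidate layouts to benchmark, sketched under assumptions: with a separate B-tree index on each column, the planner can combine both with a BitmapOr and then sort by id. Because both columns are compared against the same value, a GIN index over the expression ARRAY[a, b] is another option. The index names are hypothetical and 42 stands in for x; which variant wins depends on the data distribution.

```sql
-- Option 1: one B-tree index per column (index names are illustrative).
CREATE INDEX t_a_idx ON t (a);
CREATE INDEX t_b_idx ON t (b);

EXPLAIN
SELECT id, a, b FROM t WHERE a = 42 OR b = 42 ORDER BY id;

-- Option 2: a single GIN expression index over both columns.
CREATE INDEX t_ab_gin ON t USING gin ((ARRAY[a, b]));

-- The query must use the same expression for the index to be considered.
-- @> is array containment: true when 42 equals a or b.
EXPLAIN
SELECT id, a, b FROM t WHERE ARRAY[a, b] @> ARRAY[42] ORDER BY id;
```

Comparing the two EXPLAIN plans (ideally with ANALYZE on realistic data) shows whether either index is actually chosen.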

postgresql performance index postgresql-performance

Ale*_*pov

2020-01-08

4
votes

1
answer

928
views

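How to optimize min/max queries on a large table in PostgreSQL

How should a table be indexed in PostgreSQL so that min/max queries return as fast as possible?

I have a large table with a few hundred million rows. Each row has a source_id and the date the record was last updated. I want to collect some statistics per source_id, specifically the minimum and maximum date for each source_id.

So I created this index on my table:

 CREATE INDEX CONCURRENTLY mydata_source_last_updated_date ON mydata (source_id, last_updated_date ASC);

However, when I try to query the minimum date for each source with:

SELECT source_id, MIN(last_updated_date) FROM mydata GROUP BY source_id;

the query takes about an hour to complete.

Is this normal performance for a table this large, even with the index? How can I reduce the query time?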
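
One commonly used workaround, sketched under assumptions: a plain GROUP BY reads every row (or every index entry), whereas a recursive CTE can emulate a "loose index scan" that jumps from one source_id to the next using the existing (source_id, last_updated_date) index. This only pays off when the number of distinct source_id values is small relative to the row count, and it assumes source_id is never NULL. The CTE name src and the output column names are illustrative.

```sql
WITH RECURSIVE src AS (
    -- Smallest source_id, found with a single index probe.
    (SELECT source_id FROM mydata ORDER BY source_id LIMIT 1)
    UNION ALL
    -- Next larger source_id; yields NULL once none is left, which ends the recursion.
    SELECT (SELECT m.source_id
            FROM mydata m
            WHERE m.source_id > s.source_id
            ORDER BY m.source_id
            LIMIT 1)
    FROM src s
    WHERE s.source_id IS NOT NULL
)
SELECT s.source_id,
       -- min/max restricted to one source_id can each be answered from the index.
       (SELECT min(m.last_updated_date) FROM mydata m WHERE m.source_id = s.source_id) AS min_date,
       (SELECT max(m.last_updated_date) FROM mydata m WHERE m.source_id = s.source_id) AS max_date
FROM src s
WHERE s.source_id IS NOT NULL;
```

Running it under EXPLAIN ANALYZE is the way to confirm that each step uses the index rather than a sequential scan.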

postgresql performance optimization postgresql-performance

Cer*_*rin

2020-01-08

4
votes

1
answer

4003
views

Using the array_position() function to get a column's most common values from pg_stats

I am trying to run this simple query to check whether a value (1000) belongs to the MCV list used by the Postgres query optimizer:

SELECT array_position(most_common_vals, 1000) 
FROM pg_stats 
WHERE tablename = 'tenk1' 
AND attname = 'unique1';

But I get the following error message:

ERROR:  function array_position(anyarray, integer) does not exist

How can this be fixed?

array_position() is a standard, documented function, and the following statement returns 2 as expected:

SELECT array_position('{1,2,3}', 2);
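
A minimal sketch of one workaround, under the assumption that the column's values are integers: most_common_vals is declared with the pseudo-type anyarray, so the function call cannot be resolved against an integer argument. Casting the column through text to a concrete array type gives array_position() something it can match.

```sql
-- most_common_vals has the pseudo-type anyarray; casting through text
-- yields a concrete array type. int[] is assumed here because
-- tenk1.unique1 is an integer column.
SELECT array_position(most_common_vals::text::int[], 1000)
FROM pg_stats
WHERE tablename = 'tenk1'
  AND attname = 'unique1';
```

The result is NULL when the value is not in the list, or when no MCV list was collected for the column.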


postgresql performance datatypes array postgresql-performance

zer*_*dge

2020-01-08

4
votes

1
answer

9412
views

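Why are my queries idle?

I am new to Postgres, and I have an AWS RDS instance running PostgreSQL engine version 11.5.

All my queries show ClientRead as the wait_event. Why are all my queries idle? Does that mean they are idle in transaction?

What steps should I take to resolve this?

For example, if I change idle_in_transaction_session_timeout to 10 minutes, will that fix it?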

select count(*),state FROM pg_stat_activity group by 2;
 count | state
-------+--------
     5 |
     1 | active
   451 | idle


Select pid, datname, usename, wait_event_type, wait_event, backend_type FROM pg_stat_activity where state='idle';
  pid  | datname  |         usename          | wait_event_type | wait_event |  backend_type
-------+----------+--------------------------+-----------------+------------+----------------
 14797 | xxxxx    | user                     | Client          | ClientRead | client backend


SELECT current_setting('idle_in_transaction_session_timeout');
 current_setting
-----------------
 1d
(1 row)
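
For comparison, a small sketch of how the two situations can be told apart. In pg_stat_activity, the state idle means the session is waiting for a new client command with no transaction open, while a session holding an open transaction is reported as idle in transaction. The idle_in_transaction_session_timeout setting applies only to the latter. The column selection below is illustrative.

```sql
-- Sessions that hold an open transaction while waiting on the client.
SELECT pid, usename, state, xact_start, state_change
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
```

If this returns no rows, the idle sessions are most likely just open connections, for example from a connection pool, not stuck transactions.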


postgresql amazon-rds postgresql-11 postgresql-performance

use*_*691

lucky-day

4
votes

1
answer

5402
views

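Does the fact that a table is large affect the overall performance of a PostgreSQL server?

If I have a table that keeps growing (that is, it takes up more and more storage, currently 65 GB), does that affect the overall performance of the PostgreSQL server, for example the speed of queries against other tables?

This is a PostgreSQL 9.6 database (we plan to upgrade to 10 -> 11 -> 12 later this year), hosted on Google Cloud (Cloud SQL for PostgreSQL).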

postgresql postgresql-performance

Flo*_*nt2

lucky-day

4
votes

1
answer

368
views

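Non-deterministic performance of a select query, from 1s to 60s, on a table with 1 billion rows

I am trying to investigate why the performance of this query is so non-deterministic. It can take anywhere from 1 second to 60 seconds or more. The query selects a "time window" and fetches all rows that fall within it.

Here is the query in question, run against a table of roughly 1 billion rows: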

SELECT CAST(extract(EPOCH from ts)*1000000 as bigint) as ts
    , ticks
    , quantity
    , side
FROM order_book
WHERE ts >= TO_TIMESTAMP(1618882633073383/1000000.0)
    AND ts < TO_TIMESTAMP(1618969033073383/1000000.0)
    AND zx_prod_id = 0
ORDER BY ts ASC, del desc;

This is how the table was created:

CREATE TABLE public.order_book
(
    ts timestamp with time zone NOT NULL,
    zx_prod_id smallint NOT NULL,
    ticks integer NOT NULL,
    quantity integer NOT NULL,
    side boolean NOT NULL,
    del boolean NOT NULL
)

The values inside TO_TIMESTAMP keep sliding forward as I walk through the whole table. Here is the EXPLAIN ANALYZE output for the same query on two different time windows: …

postgresql cache explain timescaledb postgresql-performance

val*_*mit

2021-05-04

4
votes

1
answer

91
views

How to get an index scan for OR'ed time range predicates?

I have an events table with these fields:

id
user_id
time_start
time_end
...

and a B-tree index on (time_start, time_end).

SELECT user_id
FROM events
WHERE ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00') OR
       (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id;


Group  (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
  Group Key: user_id
  Buffers: shared hit=224848
  ->  Gather Merge  (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=224848
        ->  Sort  (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3) …


postgresql index range-types query-performance postgresql-performance

Dmi*_*tro

2021-08-27

4
votes

1
answer

328
views

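Finding bloated tables and indexes in PostgreSQL without extensions

My tables change heavily during the day, with large amounts of data being deleted, modified, and inserted.

I suspect that these tables and their indexes may be bloated.

I have seen that there are PostgreSQL extensions for checking this, but I want to avoid creating extensions in my database.

How can I get this information (table and index bloat) without a PostgreSQL extension such as pgstattuple, using only native PostgreSQL 12 features?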
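
A rough first check that needs no extension, offered as a sketch, not a true bloat measurement: the cumulative statistics views expose estimated live and dead tuple counts per table. A high share of dead tuples, or a long time since the last autovacuum, hints at tables worth a closer look. It does not measure free space already left behind by vacuum, and it says nothing about index bloat. The dead_pct and total_size column aliases and the LIMIT are illustrative.

```sql
-- Tables with the most dead tuples, with their share of all tuples and total size.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

The tuple counts are estimates maintained by the statistics collector, so they are best read as a trend indicator.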

postgresql postgresql-performance

Tom*_*Tom

lucky-day

4
votes

1
answer

10k
views

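Query performance on a large, static PostgreSQL table

I have tried to be as detailed as possible here. Apologies for the length!

Background

I created the following partitioned table, protein_snp_assoc, on a PostgreSQL (version 12.13) database: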

CREATE TABLE protein_snp_assoc (
  protein_id    int not null,
  snp_id        int not null,
  beta          double precision,
  se            double precision,
  logp          double precision
) PARTITION BY RANGE (snp_id);

I then created 51 partitions from the following template, each holding roughly 150 million rows (7.65 billion rows in total):

CREATE TABLE IF NOT EXISTS protein_snp_assoc_(x) PARTITION OF protein_snp_assoc
  FOR VALUES FROM (y) TO (z);

where x ranges from 1 to 51, and y, z define the intervals, each of length 150,000. For example, the first two and the last partition are:

protein_snp_assoc_1 FOR VALUES FROM (1) TO (150001),
protein_snp_assoc_2 FOR VALUES FROM (150001) TO (300001), ...
protein_snp_assoc_51 …


postgresql database-design read-only-database query-performance postgresql-performance

jom*_*mmi

2023-03-13

4
votes

1
answer

901
views

Tag statistics

postgresql ×10

postgresql-performance ×10

performance ×4

index ×3

query-performance ×2

amazon-rds ×1

array ×1

cache ×1

database-design ×1

datatypes ×1

explain ×1

optimization ×1

postgresql-11 ×1

range-types ×1

read-only-database ×1

timescaledb ×1

update ×1
