Related questions (0)

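Optimizing queries on a range of timestamps (two columns)

I am using PostgreSQL 9.1 on Ubuntu 12.04.

I need to select records within a time range: my table time_limits has two timestamp fields and one integer property. My actual table has additional columns that are not relevant to this query.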

create table time_limits (
   start_date_time timestamp,
   end_date_time timestamp,
   id_phi integer,
   primary key(start_date_time, end_date_time, id_phi)
);

The table contains roughly 2 million records.

Queries like the following took an enormous amount of time:

select * from time_limits as t 
where t.id_phi=0 
and t.start_date_time <= timestamp'2010-08-08 00:00:00'
and t.end_date_time   >= timestamp'2010-08-08 00:05:00';

So I tried adding another index - the inverse of the PK:

create index idx_inversed on time_limits(id_phi, start_date_time, end_date_time);

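I got the impression that performance improved: the time to access records in the middle of the table seems more reasonable, somewhere between 40 and 90 seconds.

But it is still several tens of seconds for values in the middle of the time range, and twice as long when targeting the end of the table (chronologically speaking).

I tried explain analyze for the first time and got this query plan: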

 Bitmap Heap Scan on time_limits  (cost=4730.38..22465.32 rows=62682 width=36) (actual time=44.446..44.446 rows=0 loops=1)
   Recheck …

postgresql index optimization explain postgresql-9.1

Ste*_*and

2016 03-25

129
votes

2
answers

130k
views

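Is a composite index also good for queries on the first field?

Let's say I have a table with fields A and B. I make regular queries on A+B, so I created a composite index on (A,B). Would queries on A alone also be fully optimized by the composite index?

Additionally, I created an index on A, but Postgres still uses only the composite index for queries on A. If the answer to the previous question is yes, I guess it doesn't really matter, but why does it select the composite index by default if the single-column A index is available?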
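To make the question concrete, here is a minimal sketch of how one could check whether a query on the leading column alone picks up a composite index. It uses Python's built-in sqlite3 module purely as a runnable stand-in, which is an assumption: SQLite's planner is not PostgreSQL's, and on PostgreSQL the equivalent check would be an EXPLAIN of the same queries. The table name t, the extra column c, and the index name idx_ab are hypothetical.

```python
import sqlite3

# Hypothetical table and index names; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i % 100, i % 7, "x") for i in range(10_000)],
)

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Query on the leading column only, then on both indexed columns.
plan_a = plan("SELECT * FROM t WHERE a = 42")
plan_ab = plan("SELECT * FROM t WHERE a = 42 AND b = 3")
print(plan_a)
print(plan_ab)
```

In this stand-in, both plans name idx_ab, which matches the general B-tree rule that a leading-column predicate can use a multicolumn index. It says nothing about which of two usable indexes PostgreSQL's cost model would prefer.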

postgresql performance index database-design index-tuning

Luc*_*ano

2014 09-04

104
votes

1
answers

40k
views

Working of indexes in PostgreSQL

I have a couple of questions about how indexes work in PostgreSQL. I have a Friends table with the following index:

   Friends ( user_id1 ,user_id2)

user_id1 and user_id2 are foreign keys to the user table.

Are these equivalent? If not, why?
```
Index(user_id1,user_id2) and Index(user_id2,user_id1)
```
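If I create the primary key (user_id1,user_id2), does it automatically create an index for it?

If the indexes in the first question are not equivalent, which index is created by the primary key command above?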

postgresql index primary-key

cod*_*ool

2017 05-30

85
votes

5
answers

30k
views

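Configuring PostgreSQL for read performance

Our system writes a lot of data (a kind of Big Data system). The write performance is good enough for our needs, but the read performance is really too slow.

The primary key (constraint) structure is similar for all our tables: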

timestamp(Timestamp) ; index(smallint) ; key(integer).

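A table can have millions of rows, even billions, and a read request is usually for a specific period (timestamp / index) and tag. It is common for a query to return around 200k rows. Currently, we can read about 15k rows per second, but we need to be 10 times faster. Is this possible, and if so, how?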
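For scale, the figures quoted above can be turned into per-query latencies. This is only arithmetic on the numbers in the question (200k rows per query, 15k rows per second, a 10x target); the variable names are illustrative.

```python
rows_per_query = 200_000   # typical result size quoted in the question
current_rate = 15_000      # rows per second read today
target_rate = current_rate * 10  # the "10 times faster" goal

# Time to stream one typical result set at each rate.
current_seconds = rows_per_query / current_rate
target_seconds = rows_per_query / target_rate

print(f"{current_seconds:.1f} s now, {target_seconds:.2f} s at the target rate")
```

So a typical 200k-row read takes roughly 13 seconds today and would need to drop to a little over one second.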

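Note: PostgreSQL is packaged with our software, so the hardware differs from one client to another.

It is a VM used for testing. The VM's host is Windows Server 2008 R2 x64 with 24.0 GB of RAM.

Server spec (Virtual Machine VMWare)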

Server 2008 R2 x64
2.00 GB of memory
Intel Xeon W3520 @ 2.67GHz (2 cores)

`postgresql.conf` optimisations

shared_buffers = 512MB (default: 32MB)
effective_cache_size = 1024MB (default: 128MB)
checkpoint_segments = 32 (default: 3)
checkpoint_completion_target = 0.9 (default: 0.5)
default_statistics_target = 1000 (default: 100)
work_mem = 100MB (default: 1MB)
maintenance_work_mem = 256MB …

postgresql performance postgresql-9.1 query-performance

JPe*_*ier

2020 01-08

47
votes

2
answers

40k
views

Optimize a LATERAL JOIN query on a big table

I am using Postgres 9.5. I have a table that records page hits from several web sites. The table contains about 32 million rows spanning Jan 1, 2016 to June 30, 2016.

CREATE TABLE event_pg (
   timestamp_        timestamp without time zone NOT NULL,
   person_id         character(24),
   location_host     varchar(256),
   location_path     varchar(256),
   location_query    varchar(256),
   location_fragment varchar(256)
);

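I am trying to tune a query that counts the number of people who performed a given sequence of page hits. The query is meant to answer questions like "how many people viewed the home page, then went to the help site, and then viewed the thank-you page?" The result looks like this: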

+------------+------------+------------+
| home-page  | help site  | thankyou   |
+------------+------------+------------+
| 10000      | 9800       | 1500       |
+------------+------------+------------+

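Notice that the numbers decrease, which makes sense: of the 10000 people who viewed the home page, 9800 went on to the help site, and of those, 1500 went on to hit the thank-you page.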
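The counting rule described above can be sketched in a few lines of Python. This is an illustration of the intended semantics only, not the asker's query: the sample events, the page labels, and the function name funnel_counts are all hypothetical, and it assumes each step is matched by a person's earliest hit that comes strictly after their previous matched step.

```python
from collections import defaultdict

# Hypothetical sample events: (timestamp, person_id, page). Not from the question.
events = [
    (1, "p1", "home"), (2, "p1", "help"), (3, "p1", "thanks"),
    (1, "p2", "home"), (4, "p2", "help"),
    (2, "p3", "help"), (5, "p3", "home"),
    (1, "p4", "thanks"),
]
steps = ["home", "help", "thanks"]

def funnel_counts(events, steps):
    """Count how many people reach each step, in order, after the previous one."""
    by_person = defaultdict(list)
    for ts, person, page in sorted(events):
        by_person[person].append((ts, page))
    counts = [0] * len(steps)
    for hits in by_person.values():
        after = float("-inf")  # timestamp of the previously matched step
        for i, step in enumerate(steps):
            # Earliest hit on this step strictly after the previous matched step.
            match = next(
                (ts for ts, page in hits if page == step and ts > after), None
            )
            if match is None:
                break  # this person dropped out of the sequence here
            counts[i] += 1
            after = match
    return counts

print(funnel_counts(events, steps))  # [3, 2, 1]
```

Because a person is only counted at a step after being counted at every earlier step, the counts can never increase from left to right, which is the pattern in the sample result.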

The SQL for a 3-step sequence uses lateral joins, as follows:

SELECT 
  sum(view_homepage) AS view_homepage,
  sum(use_help) AS use_help,
  sum(thank_you) AS thank_you
FROM ( …

postgresql performance optimization greatest-n-per-group postgresql-performance

max*_*ire

2020 01-08

9
votes

1
answers

8114
views

Very slow simple PostgreSQL query on RDS

I seem to be getting very slow queries on a medium RDS box (db.m3.medium, 3.7 GB RAM).

This is against a table of 4,152,928 rows.

select sum(some_field) c
from pages
where pages.some_id=123
and pages.first_action_at > '2014-01-01 00:00:00 +1000'

Total runtime: 45031 ms.
Locally, I have about 1.1 million rows, and the same query takes about 450 ms.

Here is the query plan, from explain:

Aggregate  (cost=475640.59..475640.60 rows=1 width=4)
   ->  Seq Scan on pages  (cost=0.00..475266.07 rows=149809 width=4)
         Filter: ((first_action_at > '2014-01-01 00:00:00'::timestamp without time zone) 
                AND (some_id = 447))

And here is the response from explain analyze:

 Aggregate  (cost=475641.74..475641.76 rows=1 width=4) (actual time=42419.717..42419.718 rows=1 loops=1)
   ->  Seq Scan on pages  (cost=0.00..475267.22 rows=149810 width=4) (actual time=0.013..42265.908 rows=141559 loops=1)
    Filter: ((first_action_at > '2014-01-01 00:00:00'::timestamp without time …

postgresql performance index index-tuning postgresql-performance

eas*_*yjo

2020 01-08

7
votes

1
answers

10k
views

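Scalable query for running counts of events within x previous days

I have already posted this question on stackoverflow, but I think I might get a better answer here.
I have a table storing millions of events that happen to users: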

                                       Table "public.events"
   Column   |           Type           |                         Modifiers                         
------------+--------------------------+-----------------------------------------------------------
 event_id   | integer                  | not null default nextval('events_event_id_seq'::regclass)
 user_id    | bigint                   | 
 event_type | integer                  | 
 ts         | timestamp with time zone |

event_type takes 5 different values, there are several million users, and the number of events per user per event_type varies, typically ranging from 1 to 50.

Sample data:

+-----------+----------+-------------+----------------------------+
| event_id  | user_id  | event_type  |         timestamp          |
+-----------+----------+-------------+----------------------------+
|        1  |       1  |          1  | January, 01 2015 00:00:00  |
|        2  |       1  |          1  | January, 10 2015 00:00:00  | …

postgresql performance scalability window-functions postgresql-performance

Jul*_*don

2020 01-08

5
votes

1
answers

3210
views

How do I make this query use my multi-column index?

Currently, I have a view defined as follows:

                       View "public.customer_list"
  Column   |          Type           | Modifiers | Storage  | Description 
-----------+-------------------------+-----------+----------+-------------
 id        | bigint                  |           | plain    | 
 name      | character varying(100)  |           | extended | 
 street    | character varying(100)  |           | extended | 
 zip       | character varying(10)   |           | extended | 
 city      | character varying(100)  |           | extended | 
 country   | character varying(3)    |           | extended | 
 phone     | character varying(100)  |           | extended | 
 mail      | character varying(100)  |           | extended | 
 rating    | integer                 | …

postgresql performance index postgresql-performance

Chr*_*itt

2020 01-08

5
votes

1
answers

1001
views

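Why does a list of 10,000 IDs perform better than using the equivalent SQL to select them?

I have a Rails application with a legacy query that I would like to renovate. The current implementation performs two SQL queries: one to get a large number of IDs, and a second that uses those IDs and applies some additional joins and filters to get the desired result.

I am trying to replace this with a single query that avoids the round trip, but doing so has caused a large performance degradation in my local test environment (which is a copy of the full production dataset). It appears that an index is not being used in the new query, leading to a full table scan. I had hoped the single query would keep the same performance as the original code, ideally improving on it because it no longer needs to send all the IDs.

This is a minimized version of my actual problem. A slightly larger version is discussed at Why does a list of 10,000 IDs perform better than using the equivalent SQL to select them in a complex query with multiple CTEs?.

Current query

There is a query that takes about 6.5 seconds to calculate a list of 10000+ IDs. You can see it as the CTE visible_projects in the "proposed query" section below. Those IDs are then fed into this query: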

EXPLAIN (ANALYZE, BUFFERS)
WITH visible_projects AS NOT MATERIALIZED (
    SELECT
        id
    FROM
        "projects"
    WHERE
        "projects"."id" IN (
            -- 10000+ IDs removed
)),
visible_tasks AS MATERIALIZED (
    SELECT
        tasks.id
    FROM
        tasks
    WHERE
        tasks.project_id IN (
            SELECT
                id
            FROM
                visible_projects))
SELECT
    COUNT(1)
FROM
    visible_tasks;

Query plan (depesz)

Aggregate  (cost=1309912.31..1309912.32 rows=1 width=8) (actual time=148.661..153.739 …

postgresql performance query-performance postgresql-performance

She*_*ter

2020 12-23

5
votes

1
answers

113
views

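Optimizing slow performance of simple SELECT query

I have an app called "Links" where 1) users congregate in groups and add others, and 2) post content for each other in those groups. Groups are defined by the links_group table in my postgresql 9.6.5 DB, and the replies posted in them are defined by the links_reply table. Overall, the DB's performance is great.

However, one SELECT query on the links_reply table consistently shows up in slow_log. It takes longer than 500 ms and is about 10 times slower than most other postgresql operations I see.

I used the Django ORM to generate the query. Here is the ORM call: replies = Reply.objects.select_related('writer__userprofile').filter(which_group=group).order_by('-submitted_on')[:25]. Essentially, this selects the latest 25 replies for a given group object. It also selects the associated user and userprofile objects.

Here is an example of the corresponding SQL from my slow log: LOG: duration: 8476.309 ms statement: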

SELECT

    "links_reply"."id",             "links_reply"."text", 
    "links_reply"."which_group_id", "links_reply"."writer_id",
    "links_reply"."submitted_on",   "links_reply"."image",
    "links_reply"."device",         "links_reply"."category", 

    "auth_user"."id",               "auth_user"."username", 

    "links_userprofile"."id",       "links_userprofile"."user_id",
    "links_userprofile"."score",    "links_userprofile"."avatar" 

FROM 

    "links_reply" 
    INNER JOIN "auth_user" 
        ON ("links_reply"."writer_id" = "auth_user"."id") 
    LEFT OUTER JOIN "links_userprofile" 
        ON ("auth_user"."id" = "links_userprofile"."user_id") 
WHERE …

postgresql performance upgrade postgresql-9.6 query-performance

Has*_*aig

2020 01-08

4
votes

1
answers

10k
views

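PostgreSQL high query cost

I have a table with more than 10.000.000 records, and I am writing a query that returns about 4436 records.

My impression is that the query cost of fetching the last record is very high.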

Index Scan using idx_name on task  (cost=0.28..142102.57 rows=3470 width=34) (actual time=14.690..22.894 rows=4436 loops=1)
"  Index Cond: ((situation = ANY ('{0,1,2,3,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::integer[])) AND (deadline < CURRENT_TIMESTAMP))"
Planning Time: 1.335 ms
JIT:
  Functions: 5
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.654 ms, Inlining 0.000 ms, Optimization 1.214 ms, Emission 13.163 ms, Total 16.030 ms
Execution Time: 24.758 ms

Is this cost level acceptable, or does this index need improving?

Index:

CREATE INDEX idx_name ON task (situation, deadline, approved)
WHERE
deadline IS NOT …

postgresql postgresql-12 postgresql-performance

Tom*_*Tom

lucky-day

1
votes

1
answers

4227
views

Tag statistics

postgresql ×11

performance ×8

postgresql-performance ×6

index ×5

query-performance ×3

index-tuning ×2

optimization ×2

postgresql-9.1 ×2

database-design ×1

explain ×1

greatest-n-per-group ×1

postgresql-12 ×1

postgresql-9.6 ×1

primary-key ×1

scalability ×1

upgrade ×1

window-functions ×1
