Related questions and solutions (0)

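What is the recommended way to join junction tables for efficient ordering/pagination?

Introduction: I have a simple database schema, but even with only tens of thousands of records, the performance of basic queries is already becoming a problem.

Database: PostgreSQL 9.6

Simplified schema: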

CREATE TABLE article (
  id bigint PRIMARY KEY,
  title text NOT NULL,
  score int NOT NULL
);
CREATE TABLE tag (
  id bigint PRIMARY KEY,
  name text NOT NULL
);
CREATE TABLE article_tag (
  article_id bigint NOT NULL REFERENCES article (id),
  tag_id bigint NOT NULL REFERENCES tag (id),
  PRIMARY KEY (article_id, tag_id)
);
CREATE INDEX ON article (score);


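Production data:

All tables are read/write. Write volume is low, roughly one new record every few minutes.

Approximate record counts:

~66K articles
~63K tags
~147K article_tags

Each article has 5 tags on average.

Question …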

postgresql performance join paging postgresql-performance

Jef*_*era

2020 01-08

6
votes

1
answer

3124
views

How to reduce query size with many repeated UNION subqueries?

I'm using Postgres 13 and have a table defined with the following DDL:

CREATE TABLE item_codes (
    code    bytea                    NOT NULL,
    item_id bytea                    NOT NULL,
    time    TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (item_id, code)
);

CREATE INDEX ON item_codes (code, time, item_id);


I use the following query:

SELECT DISTINCT time, item_id
FROM (
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3965623166306238383033393437613338373162313934383034366139653239'
       ORDER BY time, item_id
       LIMIT 100)
       UNION ALL
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3836653432356638366638636338393364373935343938303233343363373561'
       ORDER BY time, item_id
       LIMIT 100)
     ) AS items
ORDER …
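
One commonly suggested way to avoid repeating a subquery per code is to pass the codes as an array and apply the per-code LIMIT inside a LATERAL subquery. A minimal sketch, assuming only the two codes shown above; the aliases c, ic and i are illustrative, and the outer ORDER BY is left out because it is truncated in the excerpt:

```sql
-- Sketch only: one LATERAL subquery replaces the repeated UNION ALL branches.
-- The array holds the codes to look up; add more elements instead of more branches.
SELECT DISTINCT i.time, i.item_id
FROM unnest(ARRAY[
       '\x3965623166306238383033393437613338373162313934383034366139653239'::bytea,
       '\x3836653432356638366638636338393364373935343938303233343363373561'::bytea
     ]) AS c(code)
CROSS JOIN LATERAL (
   -- Same per-code query as in the original branches; it can still be served
   -- by the index on (code, time, item_id).
   SELECT ic.time, ic.item_id
   FROM item_codes AS ic
   WHERE ic.code = c.code
   ORDER BY ic.time, ic.item_id
   LIMIT 100
) AS i;
```

The query text now stays the same size regardless of how many codes are looked up; whether the plan is as good as the hand-written UNION ALL would need to be checked with EXPLAIN on the real data.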


postgresql execution-plan union query-performance postgresql-performance

Vit*_*nko

2023 02-22

6
votes

1
answer

608
views

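Optimizing a query with a small LIMIT, a predicate on one column, and ordering by another

I'm using Postgres 9.3.4 and I have 4 queries with very similar inputs but vastly different response times:

Query #1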

EXPLAIN ANALYZE SELECT posts.* FROM posts
WHERE posts.source_id IN (19082, 19075, 20705, 18328, 19110, 24965, 18329, 27600, 17804, 20717, 27598, 27599)
AND posts.deleted_at IS NULL
ORDER BY external_created_at desc
LIMIT 100 OFFSET 0;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..585.44 rows=100 width=1041) (actual time=326092.852..507360.199 rows=100 loops=1)
   ->  Index Scan using index_posts_on_external_created_at on posts  (cost=0.43..14871916.35 rows=2542166 width=1041) (actual time=326092.301..507359.524 rows=100 loops=1)
         Filter: (source_id = ANY ('{19082,19075,20705,18328,19110,24965,18329,27600,17804,20717,27598,27599}'::integer[]))
         Rows Removed by Filter: 6913925
 Total runtime: 507361.944 ms


Query #2

EXPLAIN …


postgresql performance index optimization postgresql-9.3 postgresql-performance

god*_*yan

2020 01-08

5
votes

1
answer

895
views

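Postgres sometimes uses an inferior index for WHERE a IN (...) ORDER BY b LIMIT N

We have a PostgreSQL table with ~5 billion rows that has developed a nasty habit of missing the proper index and doing a primary key scan on certain LIMIT operations.

The problem generally shows up on an ORDER BY .. LIMIT .. clause (a common pattern in Django pagination), where the LIMIT is some relatively small subset of the rows matched by the index. An extreme example is this: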

SELECT * FROM mcqueen_base_imagemeta2 
  WHERE image_id IN ( 123, ... )
  ORDER BY id DESC
  LIMIT 1;


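where the IN clause contains ~20 items, and the total number of rows matched by the index on image_id is 16.

EXPLAIN shows that it misses the image_id index and instead does a PK scan of 5B rows:

Limit  (cost=0.58..4632.03 rows=1 width=28)
   ->  Index Scan Backward using mcqueen_base_imagemeta2_pkey on mcqueen_base_imagemeta2  (cost=0.58..364597074.75 rows=78722 width=28)
         Filter: (image_id = ANY ('{123, ...}'::bigint[]))

If the LIMIT is increased to 2, it works as expected:

Limit  (cost=7585.92..7585.93 rows=2 width=28)
   ->  Sort  (cost=7585.92..7782.73 rows=78722 width=28)
         Sort Key: id DESC
         ->  … on mcqueen_base_imagemeta2 using …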
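
A workaround often suggested for this plan shape is to order by an expression rather than the bare column, so that the planner cannot satisfy the ORDER BY from the primary-key index and has to fetch the matching rows through the image_id index first. A sketch, assuming id is an integer type; the ids in the IN list are placeholders, not values from the question:

```sql
-- Sketch only: "id + 0" sorts exactly like "id", but it is an expression,
-- so the backward scan of the primary-key index no longer provides this order.
SELECT *
FROM mcqueen_base_imagemeta2
WHERE image_id IN (123, 456)   -- placeholder ids
ORDER BY id + 0 DESC
LIMIT 1;
```

This trades a cheap-looking but badly estimated plan for an explicit sort of the few matching rows, which is only a good deal when the IN list matches a small number of rows, as in the case described here.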

postgresql performance index-tuning paging postgresql-9.6 query-performance

Arn*_*sen

2020 01-08

5
votes

2
answers

531
views

Index definition order and the ORDER BY clause

So I was reading blogs this morning and stumbled upon this interesting exercise:

https://www.erikdarlingdata.com/sql-server/lets-design-an-index-together-part-3/

Here is the relevant query from the article and the index he suggests.

SELECT TOP (5000)
       p.LastActivityDate,
       p.PostTypeId,
       p.Score,
       p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND   p.LastActivityDate >= '20110101'
ORDER BY p.Score DESC;

CREATE INDEX whatever 
    ON dbo.Posts(PostTypeId, Score DESC, LastActivityDate) 
        INCLUDE(ViewCount) WITH (DROP_EXISTING = ON);


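It is a very interesting exercise to build an index and try to tune it accordingly. However, I may have had a misunderstanding: I believed that index key order matters, and that some WHERE clauses may not be able to use an index when the index key order does not match the query. In other words, I lack experience with this specific scenario, and my assumption was that the query would not use this index because Score sits in the middle of the index key definition but is not in the query's WHERE clause.

When the optimizer decides which index to use, are the ORDER BY columns evaluated as well, so that it will use the index as long as both the WHERE clause columns and the ORDER BY columns are in the index definition?

I guess my question is more about how the optimizer evaluates indexes against the WHERE and ORDER BY clauses.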

index sql-server order-by

Dou*_*ats

2021 09-18

5
votes

1
answer

492
views

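Optimizing a query on two large tables

I have a very important query in my system that takes too long to execute because of the large amount of data in the tables. I'm a junior DBA and I need the best optimization for it. Each table has about 80 million rows.

The tables are:

tb_pd: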

   Column            |  Type   | Modifiers | Storage | Stats target | Description 
---------------------+---------+-----------+---------+--------------+-------------
 pd_id               | integer | not null  | plain   |              | 
 st_id               | integer |           | plain   |              | 
 status_id           | integer |           | plain   |              | 
 next_execution_date | bigint  |           | plain   |              | 
 priority            | integer |           | plain   |              | 
 is_active           | integer |           | plain   |              | 
Indexes:
    "pk_pd" PRIMARY KEY, btree (pd_id)
    "idx_pd_order" btree (priority, next_execution_date) …


postgresql optimization index-tuning query-performance

Iva*_*Paz

2020 01-08

4
votes

1
answer

5466
views

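Real time much greater than the execution time reported by "EXPLAIN ANALYZE" (index scan)

I want to fetch up to 100 rows by ID. id is the table's primary key.

The query I wrote looks like this: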

select * from table where id = any ($1);


where $1 is interpolated as an array of ids.
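
For reference, one way to bind an array to $1 directly in psql is a prepared statement. The sketch below assumes the table is profiles_1000 with an integer id, as the plan in this question suggests; the statement name fetch_by_ids is hypothetical, and only two of the ids from the plan are used:

```sql
-- Sketch only: declare the parameter as an integer array and match with = ANY.
PREPARE fetch_by_ids(int[]) AS
    SELECT * FROM profiles_1000 WHERE id = ANY ($1);

-- The array is passed as a single parameter value.
EXECUTE fetch_by_ids('{34491540,28977916}');

-- Release the prepared statement when it is no longer needed.
DEALLOCATE fetch_by_ids;
```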

Using EXPLAIN ANALYZE I get the following plan (explain link):

                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..44.98 rows=17 width=553) (actual time=100.048..834.209 rows=17 loops=1)
   ->  Index Scan using instagram_id_index_1000 on profiles_1000  (cost=0.43..44.98 rows=17 width=553) (actual time=100.046..834.163 rows=17 loops=1)
         Index Cond: (id = ANY ('{34491540,28977916,33241270,33609141,31043380,29364420,30247037,33311491,36267571,32886281,32366574,32569254,33038689,31089076,29416100,30455309,31570597}'::integer[]))
 Planning time: 424.512 ms
 Execution time: 834.280 ms
(5 rows)


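When I actually execute it (with \timing), I get results in the 2-5 second range! I really can't accept such poor performance. The execution time reported by EXPLAIN ANALYZE is already long to begin with.

Some context:
1) The database is local, so there is no network latency
2) The table I'm querying is a materialized view
3) I also tried …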

postgresql performance query-performance

rub*_*bik

2020 01-08

3
votes

1
answer

7578
views

MySQL ORDER BY isnull() performance issue

My SQL below lists stocks added within the last 10 days. ORDER BY isnull(Price) is used so that stocks without a price are still listed.

AddDate and Price are indexed.

SELECT Id, Price FROM tblStock
where AddDate >= date_sub(curdate(),interval 10 day)
order by isnull(Price), Price asc limit 50


EXPLAIN shows that the query does not use the Price index. So I tried to improve the query and came up with the following SQL:

SELECT Id, Price FROM tblStock
where AddDate >= date_sub(curdate(),interval 10 day)
and Price is not null
order by Price asc limit 50


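The new SQL runs faster, and EXPLAIN shows that it uses the Price index, but the problem is that stocks with a null price are never selected.

Looking for any comments or suggestions on how to solve this. Thanks.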
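
One possible direction is to split the query into a priced branch and an unpriced branch and combine them, so that each branch has a simple predicate on Price. A minimal sketch, assuming the same table and columns as above; the column price_missing and the alias s are illustrative, and whether MySQL actually uses the Price index for the first branch should be confirmed with EXPLAIN:

```sql
-- Sketch only: same result order as ORDER BY isnull(Price), Price,
-- but built from two branches that each filter Price directly.
SELECT Id, Price
FROM (
    -- Priced stocks, cheapest first
    (SELECT Id, Price, 0 AS price_missing
     FROM tblStock
     WHERE AddDate >= DATE_SUB(CURDATE(), INTERVAL 10 DAY)
       AND Price IS NOT NULL
     ORDER BY Price ASC
     LIMIT 50)
    UNION ALL
    -- Stocks without a price, only needed if fewer than 50 priced rows exist
    (SELECT Id, Price, 1 AS price_missing
     FROM tblStock
     WHERE AddDate >= DATE_SUB(CURDATE(), INTERVAL 10 DAY)
       AND Price IS NULL
     LIMIT 50)
) AS s
ORDER BY price_missing, Price ASC
LIMIT 50;
```

Each branch returns at most 50 rows, so the outer sort handles at most 100 rows regardless of table size.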

mysql performance index null order-by

MyS*_*bie

lucky-day

2
votes

1
answer

4674
views

Tag statistics

postgresql ×6

performance ×5

query-performance ×4

index ×3

postgresql-performance ×3

index-tuning ×2

optimization ×2

order-by ×2

paging ×2

execution-plan ×1

join ×1

mysql ×1

null ×1

postgresql-9.3 ×1

postgresql-9.6 ×1

sql-server ×1

union ×1
