How to match multiple "tags" for best performance in Postgres?

Question

Ele*_*ct2 3 sql postgresql relational-division

Table: articles

+--------+------+------------+
| id     | title|    created |
+--------+------+------------+
|    201 | AAA  | 1482561011 |
|    202 | BBB  | 1482561099 |
|    203 | CCC  | 1482562188 |
+--------+------+------------+

Table: taggings

+-----------+------+
| articleid | tagid|
+-----------+------+
|    201    | 11   |
|    201    | 12   |
|    202    | 11   |
|    202    | 13   |
|    202    | 14   |
+-----------+------+

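Now, given 3 tag ids, what is the best index design and query to select the 10 most recent articles that match all 3 tag ids?
I know there are several ways to do this, but I am concerned about performance, since each tag may contain tens of thousands of articles.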
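
For anyone who wants to reproduce the setup, here is a minimal sketch of the two tables loaded with the sample rows above. The column types, the constraints, and the table name `taggings` are assumptions; the question only shows the data.

```sql
-- Hypothetical DDL: types and constraints are inferred from the sample rows.
CREATE TABLE articles (
    id      integer PRIMARY KEY,
    title   text    NOT NULL,
    created bigint  NOT NULL  -- Unix epoch seconds, as in the sample data
);

CREATE TABLE taggings (
    articleid integer NOT NULL REFERENCES articles (id),
    tagid     integer NOT NULL
);

-- Sample rows copied from the tables in the question.
INSERT INTO articles (id, title, created) VALUES
    (201, 'AAA', 1482561011),
    (202, 'BBB', 1482561099),
    (203, 'CCC', 1482562188);

INSERT INTO taggings (articleid, tagid) VALUES
    (201, 11), (201, 12),
    (202, 11), (202, 13), (202, 14);
```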

Answer 1

小智 11

As a_horse_with_no_name mentioned, this blog post has some very interesting performance benchmarks for finding rows that match multiple tags:

http://www.databasesoup.com/2015/01/tag-all-things.html

Storing the tags in an array column on the main table and creating a GIN index lets you select rows like this, without any joins:

select id
from articles
where tags @> array[11,13,14]
order by created desc
limit 10;

The column and index can be created like this:

-- integer[] so the column matches the integer tag ids used in array[11,13,14]
alter table articles add column tags integer[] not null default '{}';
create index tags_index on articles using gin (tags);

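According to the blog post, finding rows that match two tags with an array column is 8 to 895 times faster than joining against a tag table.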
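
If the tags already live in a separate table, the array column has to be filled before the query above returns anything. The following is a rough sketch of one way to do that. It assumes the `taggings(articleid, tagid)` table from the question and a `tags` column whose element type matches `tagid` (integer here). The `EXPLAIN` at the end is only a way to check whether the planner actually picks the GIN index on your data.

```sql
-- Assumption: articles.tags has the same element type as taggings.tagid (integer).
-- Collect each article's tag ids into one array and store it on the article row.
UPDATE articles a
SET    tags = t.tag_ids
FROM  (
    SELECT articleid, array_agg(tagid ORDER BY tagid) AS tag_ids
    FROM   taggings
    GROUP  BY articleid
) t
WHERE  a.id = t.articleid;

-- Inspect the plan to see whether the GIN index on tags is used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM   articles
WHERE  tags @> ARRAY[11, 13, 14]
ORDER  BY created DESC
LIMIT  10;
```

Articles without any row in `taggings` keep the default empty array, so they never match a `@>` test against a non-empty tag list.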

Has anything changed since 2015? (8 upvotes)
It would be good to see new tests on the latest PG versions. Are there any references? (2 upvotes)

Answer 2

Phi*_*zou 1

You need an index on articles.created for sorting, and a unique index on taggings(articleid, tagid) for the lookups:

CREATE INDEX ON articles(created);
CREATE UNIQUE INDEX ON taggings(articleid, tagid);

Then just run a SELECT query that uses three table aliases for taggings:

SELECT a.* FROM articles a, taggings t1, taggings t2, taggings t3
    WHERE a.id=t1.articleid AND a.id=t2.articleid AND a.id=t3.articleid
    AND t1.tagid=111 AND t2.tagid=222 AND t3.tagid=333
    ORDER BY created DESC LIMIT 10;
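
As a hedged alternative, the same "match all three tags" condition can be written with GROUP BY / HAVING, which avoids adding one alias per tag. This is only a sketch: the index name `taggings_tagid_articleid_idx` is made up, and whether an index that leads with `tagid` helps is an assumption to verify with `EXPLAIN ANALYZE` on real data.

```sql
-- Hypothetical extra index: leads with tagid so the tagid filter can use it.
CREATE INDEX taggings_tagid_articleid_idx ON taggings (tagid, articleid);

-- Relies on (articleid, tagid) being unique: count(*) = 3 then means
-- that all three requested tags are present on the article.
SELECT a.*
FROM   articles a
JOIN  (
    SELECT articleid
    FROM   taggings
    WHERE  tagid IN (111, 222, 333)
    GROUP  BY articleid
    HAVING count(*) = 3
) m ON m.articleid = a.id
ORDER  BY a.created DESC
LIMIT  10;
```

With a different number of tags, only the IN list and the count in the HAVING clause change.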

Archived:	9 years, 1 month ago
Viewed:	7138 times
Last activity:	5 years, 5 months ago