Speed up range test for key values nested in jsonb array of objects

Question

Cep*_*pr0 4 sql arrays postgresql json jsonb

Suppose I have the following parents table:

create table parents (
  id       integer not null constraint parents_pkey primary key,
  name     text    not null,
  children jsonb   not null
);

where children holds a JSON array with the following structure:

[
    {
        "name": "child1",
        "age": 10
    }, 
    {
        "name": "child2",
        "age": 12
    } 
]

For example, I need to find all parents who have a child aged 10 to 12.

I wrote the following query:

select distinct
  p.*
from
  parents p, jsonb_array_elements(p.children) c
where
  (c->>'age')::int between 10 and 12;

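It returns the right result, but it is very slow when the parents table is big (for example, 1M records). I tried a GIN index on the children field, but that did not help.

So is there a way to speed up such queries? Or maybe there is another solution for querying/indexing fields in a nested JSON array?

Query plan: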

Unique  (cost=1793091.18..1803091.18 rows=1000000 width=306) (actual time=4070.866..5106.998 rows=399947 loops=1)
  ->  Sort  (cost=1793091.18..1795591.18 rows=1000000 width=306) (actual time=4070.864..4836.241 rows=497313 loops=1)
        Sort Key: p.id, p.children, p.name
        Sort Method: external merge  Disk: 186040kB
        ->  Gather  (cost=1000.00..1406321.34 rows=1000000 width=306) (actual time=0.892..1354.147 rows=497313 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Nested Loop  (cost=0.00..1305321.34 rows=416667 width=306) (actual time=0.162..1794.134 rows=165771 loops=3)
                    ->  Parallel Seq Scan on parents p  (cost=0.00..51153.67 rows=416667 width=306) (actual time=0.075..239.786 rows=333333 loops=3)
                    ->  Function Scan on jsonb_array_elements c  (cost=0.00..3.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1000000)
                          Filter: ((((value ->> 'age'::text))::integer >= 10) AND (((value ->> 'age'::text))::integer <= 12))
                          Rows Removed by Filter: 3
Planning time: 0.218 ms
Execution time: 5140.277 ms

Answer 1

Erw*_*ter 5

A first immediate measure is to make the query you have a bit faster:

SELECT *
FROM   parents p
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(p.children) c
   WHERE (c->>'age')::int BETWEEN 10 AND 12
   );

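The EXISTS semi-join avoids duplicating rows in the intermediate table when multiple array objects match, and with it the need for DISTINCT (or DISTINCT ON) in the outer query. But that is only mildly faster.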
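
To see how much the rewrite gains on your own data, the two forms can be compared with EXPLAIN. A minimal sketch, assuming the parents table from the question; the BUFFERS option only adds I/O counters to the output:

```sql
-- Original form: join against the unnested array, then de-duplicate.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT p.*
FROM   parents p, jsonb_array_elements(p.children) c
WHERE  (c->>'age')::int BETWEEN 10 AND 12;

-- Semi-join form: each parent is emitted at most once,
-- so no de-duplication step is needed afterwards.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   parents p
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(p.children) c
   WHERE  (c->>'age')::int BETWEEN 10 AND 12
   );
```

The second plan should no longer contain the Sort and Unique nodes that spill to disk in the plan from the question, but it still reads and unnests every row.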

The core problem is that you want to test a range of integer values, while the existing jsonb operators provide no such functionality.
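
On PostgreSQL 12 or later (an assumption; the question does not state a version), the SQL/JSON path language can at least express the range test directly. A sketch only, not something this answer relies on:

```sql
-- Assumes PostgreSQL 12+, where jsonb supports the @? jsonpath operator.
-- The filter keeps "age" items between 10 and 12; @? is true if any item remains.
SELECT *
FROM   parents p
WHERE  p.children @? '$[*].age ? (@ >= 10 && @ <= 12)';
```

This is mainly a more compact notation. GIN indexes on jsonb support jsonpath conditions that test equality, not range comparisons such as >= or <=, so this query should not be expected to use the index. Verify with EXPLAIN.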

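There are various ways around this. Without knowing more about your setup, here is a "smart" solution that solves the given example. The trick is to split the range into distinct values and use the jsonb containment operator @>: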

SELECT *
FROM   parents p
WHERE (p.children @> '[{"age": 10}]'
OR     p.children @> '[{"age": 11}]'
OR     p.children @> '[{"age": 12}]');

Supported by a jsonb_path_ops GIN index:

CREATE INDEX parents_children_gin_idx ON parents USING gin (children jsonb_path_ops);
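
The OR'ed containment tests above can also be generated from the range bounds instead of being written out by hand. A minimal sketch, assuming small integer ranges; the bounds 10 and 12 are the ones from the question:

```sql
-- Build one '[{"age": n}]' pattern per integer in the range
-- and test containment against any of them.
SELECT *
FROM   parents p
WHERE  p.children @> ANY (
   ARRAY(
      SELECT jsonb_build_array(jsonb_build_object('age', a))
      FROM   generate_series(10, 12) a
   )
);
```

Whether the planner answers this with bitmap scans on the GIN index depends on your version and statistics, so check the plan with EXPLAIN. If the index is not used, fall back to the explicit OR list.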

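But if your ranges span more than a handful of integer values, you will need something more generic. As always, the best solution depends on the complete situation: data distribution, value frequencies, typical ranges in queries, whether NULL values are possible, row size, read/write patterns, whether every jsonb value has one or more matching age keys, ...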
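
One generic option, sketched here under stated assumptions, is to extract the ages into a side table with a plain btree index. The table parent_child_ages and its index are hypothetical names, not part of the original schema. The sketch assumes that ages are whole numbers and that the side table is kept in sync with parents.children by other means (for example a trigger, not shown):

```sql
-- Hypothetical side table: one row per child that has a numeric age.
CREATE TABLE parent_child_ages (
   parent_id integer NOT NULL REFERENCES parents(id) ON DELETE CASCADE,
   age       integer NOT NULL
);

-- Populate it from the existing jsonb arrays.
-- Elements without a numeric "age" are skipped.
INSERT INTO parent_child_ages (parent_id, age)
SELECT p.id, (c->>'age')::int
FROM   parents p, jsonb_array_elements(p.children) c
WHERE  jsonb_typeof(c->'age') = 'number';

-- A btree index supports arbitrary range predicates on age.
CREATE INDEX parent_child_ages_age_idx ON parent_child_ages (age, parent_id);

-- Same semi-join as before, now against the indexed side table.
SELECT *
FROM   parents p
WHERE  EXISTS (
   SELECT FROM parent_child_ages a
   WHERE  a.parent_id = p.id
   AND    a.age BETWEEN 10 AND 12
   );
```

An index pays off mostly for selective ranges. In the plan from the question roughly 40% of all parents qualify, and for such a predicate a sequential scan may remain the cheapest plan.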

Related answer with a specialized, very fast index:

Search for nested values in jsonb array with greater operator

Archived:	7 years, 10 months ago
Views:	2124
Last activity:	7 years, 10 months ago