How to most efficiently find out whether a record has child records?

The*_*rer 7 postgresql performance performance-tuning

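I'm writing a query that returns a single record from the parent table. I also want the same query to tell me whether that record has any children. It's a one-to-many relationship.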

parent:
 -parent_id
 -name

child:
 -child_id
 -name
 -parent_id

My first instinct was to write the following query:

select name, (select count(child_id) from child c  where c.parent_id=p.parent_id) children
     from parent p
     where name like 'some name'

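But I'm wondering whether there is a more efficient way to do this, since I don't actually care about the count, only whether the record has any children. Any pointers?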

Col*_*art 8

Don't forget that Postgres has a boolean data type. This is the most concise way to express the query:

select
  parent_id,
  name,
  exists (select from child where parent_id = p.parent_id) as has_children
from parent p;
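Applied to the question's single-row lookup (`WHERE name LIKE 'some name'`), the same query might look like the sketch below. The temporary tables, column types, and sample rows are hypothetical, and the index on `child (parent_id)` is an addition that is not part of the answer: PostgreSQL does not automatically index the referencing column of a foreign key, so without such an index each EXISTS probe may have to scan `child`. The empty select list in `SELECT FROM` requires PostgreSQL 9.4 or later; `SELECT 1` works on older versions.

```sql
-- Hypothetical scratch tables matching the question's schema (types assumed).
CREATE TEMP TABLE parent (
  parent_id int PRIMARY KEY,
  name      text NOT NULL
);
CREATE TEMP TABLE child (
  child_id  int PRIMARY KEY,
  name      text NOT NULL,
  parent_id int NOT NULL REFERENCES parent
);
-- Made-up rows: 'some name' has two children, 'other name' has none.
INSERT INTO parent VALUES (1, 'some name'), (2, 'other name');
INSERT INTO child  VALUES (10, 'c1', 1), (11, 'c2', 1);

-- Lets each EXISTS probe use an index lookup instead of scanning child.
CREATE INDEX ON child (parent_id);

-- EXISTS stops at the first matching child row; nothing is counted.
SELECT p.parent_id,
       p.name,
       EXISTS (SELECT FROM child c WHERE c.parent_id = p.parent_id) AS has_children
FROM parent p
WHERE p.name LIKE 'some name';
```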

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=86748ba18ba8c0f31f1b77a74230f67b


Eva*_*oll 6

Methods

Aggregate method

We'll call this popular approach the aggregate method. Note that bool_or(child_id IS NOT NULL) also works, but it isn't any faster.

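-- count(child_id), not count(*): a childless parent still gets one
-- NULL-padded row from the LEFT JOIN, which count(*) would count.
SELECT parent_id, count(child_id) > 0 AS has_children
FROM parent
LEFT OUTER JOIN children
  USING (parent_id)
GROUP BY parent_id;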
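The bool_or variant mentioned above can be written out next to the count-based test on a tiny, hypothetical dataset so the results are easy to check by eye. What to aggregate matters: the LEFT JOIN emits one NULL-padded row for a childless parent, so `count(*)` is 1 both for a parent with no children and for a parent with exactly one child, whereas `count(child_id)` and `bool_or(child_id IS NOT NULL)` ignore that padding row.

```sql
-- Hypothetical data: parent 1 has two children, 2 has one, 3 has none.
CREATE TEMP TABLE parent (parent_id int PRIMARY KEY);
CREATE TEMP TABLE children (
  child_id  int PRIMARY KEY,
  parent_id int REFERENCES parent
);
INSERT INTO parent   VALUES (1), (2), (3);
INSERT INTO children VALUES (10, 1), (11, 1), (20, 2);

SELECT parent_id,
       -- count(child_id) skips the NULL row LEFT JOIN adds for childless parents
       count(child_id) > 0           AS has_children_count,
       -- the padding row yields false, so bool_or is false only with no children
       bool_or(child_id IS NOT NULL) AS has_children_bool_or
FROM parent
LEFT OUTER JOIN children USING (parent_id)
GROUP BY parent_id
ORDER BY parent_id;
```

Both columns should read true, true, false for parents 1, 2, 3.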

LEFT JOIN LATERAL with a row limit

But you can also try a LEFT JOIN LATERAL() like this:

SELECT parent_id, has_children
FROM parent AS p
LEFT JOIN LATERAL (
  SELECT true
  FROM children AS c
  WHERE c.parent_id = p.parent_id
  FETCH FIRST ROW ONLY
) AS t(has_children)
  ON (true);
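One detail of the LATERAL form: because it is a LEFT JOIN, a parent with no children gets `has_children = NULL` rather than `false`, since the lateral subquery returns no row for it. If a strict boolean is wanted, wrapping the column in `coalesce` is one option, sketched here on a hypothetical two-parent dataset.

```sql
-- Hypothetical data: parent 1 has two children, parent 2 has none.
CREATE TEMP TABLE parent (parent_id int PRIMARY KEY);
CREATE TEMP TABLE children (
  child_id  int PRIMARY KEY,
  parent_id int REFERENCES parent
);
INSERT INTO parent   VALUES (1), (2);
INSERT INTO children VALUES (10, 1), (11, 1);

SELECT p.parent_id,
       -- no lateral row for a childless parent gives NULL; map it to false
       coalesce(t.has_children, false) AS has_children
FROM parent AS p
LEFT JOIN LATERAL (
  SELECT true
  FROM children AS c
  WHERE c.parent_id = p.parent_id
  FETCH FIRST ROW ONLY   -- stop after the first child found
) AS t(has_children)
  ON (true)
ORDER BY p.parent_id;
```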

EXISTS

FYI, you can also use CROSS JOIN LATERAL with EXISTS (I believe this is how it gets planned anyway). We'll call this the EXISTS method:

SELECT parent_id, has_children
FROM parent AS p
CROSS JOIN LATERAL (
  SELECT EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
  )
) AS t(has_children);

This is equivalent to:

SELECT parent_id, EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
) AS has_children
FROM parent AS p;
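To check on a given dataset that the CROSS JOIN LATERAL form and the plain correlated EXISTS return the same rows, one option is to diff them with EXCEPT in both directions (comparing their `EXPLAIN` output is how you would confirm they are also planned the same way). The three-parent dataset below is hypothetical.

```sql
-- Hypothetical data: parents 1 and 3 have a child, parent 2 has none.
CREATE TEMP TABLE parent (parent_id int PRIMARY KEY);
CREATE TEMP TABLE children (
  child_id  int PRIMARY KEY,
  parent_id int REFERENCES parent
);
INSERT INTO parent   VALUES (1), (2), (3);
INSERT INTO children VALUES (10, 1), (30, 3);

-- Symmetric difference of the two forms: an empty result means they agree.
(SELECT p.parent_id, t.has_children
 FROM parent AS p
 CROSS JOIN LATERAL (
   SELECT EXISTS (SELECT FROM children AS c WHERE c.parent_id = p.parent_id)
 ) AS t(has_children)
 EXCEPT
 SELECT p.parent_id,
        EXISTS (SELECT FROM children AS c WHERE c.parent_id = p.parent_id)
 FROM parent AS p)
UNION ALL
(SELECT p.parent_id,
        EXISTS (SELECT FROM children AS c WHERE c.parent_id = p.parent_id)
 FROM parent AS p
 EXCEPT
 SELECT p.parent_id, t.has_children
 FROM parent AS p
 CROSS JOIN LATERAL (
   SELECT EXISTS (SELECT FROM children AS c WHERE c.parent_id = p.parent_id)
 ) AS t(has_children));
```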

Benchmark

Sample dataset

1 million children and 2,500 parents, i.e. 400 children per parent.

CREATE TABLE parent (
  parent_id int PRIMARY KEY
);
INSERT INTO parent
  SELECT x
  FROM generate_series(1,1e4,4) AS gs(x);
CREATE TABLE children (
  child_id int PRIMARY KEY,
  parent_id int REFERENCES parent
);
INSERT INTO children
  SELECT x, 1 + (x::int%1e4)::int/4*4
  FROM generate_series(1,1e6) AS gs(x);

VACUUM FULL ANALYZE children;
VACUUM FULL ANALYZE parent;
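The `parent_id` expression in the INSERT can be sanity-checked without building the tables: evaluated over the same series, it should produce exactly the 2,500 IDs that `generate_series(1,1e4,4)` puts into `parent` (1, 5, 9, ..., 9997), each used 400 times. This sketch uses integer arithmetic (`10000` in place of `1e4`), which gives the same values for integer `x`.

```sql
-- Evaluate the benchmark's parent_id expression over 1..1,000,000 and
-- summarize how many children each generated parent_id receives.
WITH per_parent AS (
  SELECT 1 + (x % 10000) / 4 * 4 AS parent_id, count(*) AS n_children
  FROM generate_series(1, 1000000) AS gs(x)
  GROUP BY 1
)
SELECT count(*)        AS distinct_parents,
       min(n_children) AS min_children,
       max(n_children) AS max_children,
       -- true if every generated ID is one of 1, 5, 9, ..., 9997
       bool_and(parent_id % 4 = 1 AND parent_id BETWEEN 1 AND 9997) AS ids_match_parent
FROM per_parent;
```

This should report 2500 distinct parents, 400 children for each, and `ids_match_parent = true`.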

Results (part 1)

  • Aggregate method: 450 ms
  • LEFT JOIN LATERAL (FETCH FIRST ROW ONLY): 850 ms
  • EXISTS method: 850 ms

Results (after adding an index and running again)

Now let's add an index:

CREATE INDEX ON children (parent_id);
ANALYZE children;

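Now the timing profile is completely different. Presumably this is because the LATERAL and EXISTS methods can stop at the first matching index entry for each parent, while the aggregate method still has to read every child row:

  • Aggregate method: 450 ms
  • LEFT JOIN LATERAL (FETCH FIRST ROW ONLY): 30 ms
  • EXISTS method: 30 ms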