Jef*_*man 5 postgresql tree cte count
我在 PostgreSQL 9.4 中存储了一个典型的树结构作为邻接列表:
gear_category (
id INTEGER PRIMARY KEY,
name TEXT,
parent_id INTEGER
);
Run Code Online (Sandbox Code Playgroud)
以及附加到类别的项目列表:
gear_item (
id INTEGER PRIMARY KEY,
name TEXT,
category_id INTEGER REFERENCES gear_category
);
Run Code Online (Sandbox Code Playgroud)
任何类别都可以附加装备物品,而不仅仅是树叶。
出于速度原因,我想预先计算有关每个类别的一些数据,我将使用这些数据来生成物化视图。
期望的输出:
speedy_materialized_view (
gear_category_id INTEGER,
count_direct_child_items INTEGER,
count_recursive_child_items INTEGER
);
Run Code Online (Sandbox Code Playgroud)
count_recursive_child_items是附加到当前类别或任何子类别的 GearItems 的累积数量。每个类别应占一行,任何为 0 的计数都为 0。
为了计算这个,我们需要使用递归 CTE 来遍历树:
WITH RECURSIVE children(id, parent_id) AS (
--base case
SELECT gear_category.id AS id, gear_category.parent_id AS parent_id
FROM gear_category
WHERE gear_category.id = 37 -- setting this to id includes current object
-- setting to parent_id includes only children
--combine with recursive part
UNION ALL
SELECT gear_category.id AS gear_category_id
, gear_category.parent_id AS gear_category_parent_id
FROM gear_category, children
WHERE children.id = gear_category.parent_id
)
TABLE children;
Run Code Online (Sandbox Code Playgroud)
计算附加到此子类别列表的子装备项目很简单:
--Subselect variant
SELECT count(gear_item.id) AS count_recursive_child_items_for_single_cat
FROM gear_item
WHERE gear_item.category_id IN (
SELECT children.id AS children_id
FROM children);
-- JOIN variant
SELECT count(gear_item.id) AS count_recursive_child_items_for_single_cat
FROM gear_item, children
WHERE gear_item.category_id = children.id;
Run Code Online (Sandbox Code Playgroud)
但如果您查看 CTE,就会发现我已对起始类别 ID“37”进行了硬编码。我不知道如何组合这些查询来生成所有类别的 count_recursive_child_items ,而不仅仅是单个类别。
我如何结合这些?
另外,目前对于每个类别,我都会计算所有子类别,这会产生大量重复的工作,并且我不确定如何删除它。例如,假设我有祖父母>父母>叶子。目前,我分别计算祖父母和父母的子类别,这意味着我两次计算父母>叶子关系。
由于我已经返回了count_direct_child_items每个类别,因此在计算时使用这些类别可能count_recursive_child_items比像我现在那样从头开始计算会更快。
单独来看,这些概念中的每一个对我来说都是有意义的。我只是不知道如何将它们组合成一个优雅/优化的查询。
这完成了工作:
CREATE MATERIALIZED VIEW speedy_materialized_view AS
WITH RECURSIVE tree AS (
SELECT id, parent_id, ARRAY[id] AS path
FROM gear_category
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, path || c.id
FROM tree t
JOIN gear_category c ON c.parent_id = t.id
)
, tree_ct AS (
SELECT t.id, t.path, COALESCE(i.item_ct, 0) AS item_ct
FROM tree t
LEFT JOIN (
SELECT category_id AS id, count(*) AS item_ct
FROM gear_item
GROUP BY 1
) i USING (id)
)
SELECT t.id
, t.item_ct AS count_direct_child_items
, sum(t1.item_ct) AS count_recursive_child_items
FROM tree_ct t
LEFT JOIN tree_ct t1 ON t1.path[1:array_upper(t.path, 1)] = t.path
GROUP BY t.id, t.item_ct;
Run Code Online (Sandbox Code Playgroud)
count_recursive_child_items每个类别都是单独计算的,所以我不相信这是深度树最快的方法。
但是,递归 CTE 中不允许使用聚合函数。
严格来说,它确实是迭代的- 但“递归”CTE 也是如此。
您可以构建一个使用临时表的函数。您需要了解 plpgsql 的方法,否则有太多需要解释的内容。
CREATE OR REPLACE FUNCTION f_tree_ct()
RETURNS TABLE (id int, count_direct_child_items int, count_recursive_child_items int)
LANGUAGE plpgsql AS
$func$
DECLARE
_lvl int;
BEGIN
-- basic table with added path and count
CREATE TEMP TABLE t1 AS
WITH RECURSIVE tree AS (
SELECT c.id, c.parent_id, '{}'::int[] AS path, 0 AS lvl
FROM gear_category c
WHERE c.parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, path || c.parent_id, lvl + 1
FROM tree t
JOIN gear_category c ON c.parent_id = t.id
)
, tree_ct AS (
SELECT t.id, t.parent_id, t.path, t.lvl, COALESCE(i.item_ct, 0) AS item_ct
FROM tree t
LEFT JOIN (
SELECT i.category_id AS id, count(*)::int AS item_ct
FROM gear_item i
GROUP BY 1
) i USING (id)
)
TABLE tree_ct;
-- CREATE INDEX ON t1 (lvl); -- only for very deep trees
SELECT INTO _lvl max(lvl) FROM t1; -- identify max lvl to start bottom up
-- recursively aggregate each level in 2nd temp table
CREATE TEMP TABLE t2 AS
SELECT t1.id, t1.parent_id, t1.lvl
, t1.item_ct
, t1.item_ct AS sum_ct
FROM t1
WHERE t1.lvl = _lvl;
IF _lvl > 0 THEN
FOR i IN REVERSE _lvl .. 1 LOOP
INSERT INTO t2
SELECT t1.id, t1.parent_id, t1.lvl, t1.item_ct
, CASE WHEN t2.sum_ct IS NULL THEN t1.item_ct ELSE t1.item_ct + t2.sum_ct END
FROM t1
LEFT JOIN (
SELECT t2.parent_id AS id, sum(t2.sum_ct) AS sum_ct
FROM t2
WHERE t2.lvl = i
GROUP BY 1
) t2 USING (id)
WHERE t1.lvl = i - 1;
END LOOP;
END IF;
RETURN QUERY -- only requested columns, unsorted
SELECT t2.id, t2.item_ct, t2.sum_ct FROM t2;
DROP TABLE t1, t2; -- to allow repeated execution in one transaction
RETURN;
END
$func$;
Run Code Online (Sandbox Code Playgroud)
CREATE MATERIALIZED VIEW由于使用了临时表,因此不能将其包含在语句中。您可以用它创建另一个(临时)表,充当手动维护的“物化视图”:
CREATE TABLE speedy_materialized_view AS
SELECT * FROM f_tree_ct();
Run Code Online (Sandbox Code Playgroud)
或者,您可以TRUNCATE speedy_materialized_view在函数中直接写入。该函数会RETURNS void相反,或者您可以返回一些元信息,例如行数......
另外:
CTE 的递归项中的列别名仅用于文档,因为输出列名称仅由非递归项确定。
| 归档时间: |
|
| 查看次数: |
1995 次 |
| 最近记录: |