Postgres Materialized Path - What are the benefits of using ltree?

Question

Postgres Materialized Path - What are the benefits of using ltree?

Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self).

The django-treebeard implementation of MP (docs):

Each step of the path is a fixed length for consistent performance.
Each node contains depth and numchild fields (fast reads at minimal cost to writes).
The path field is indexed (with a standard b-tree index):

The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.

Implementation of get_ancestors (link):

Match nodes with a path that contains a subset of the current path (steplen is the fixed length of a step).

paths = [
    self.path[0:pos]
    for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
    path__in=paths).order_by('depth')

Run Code Online (Sandbox Code Playgroud)

Implementation of get_descendants (link):

Match nodes with a depth greater than self and a path which starts with current path.

return cls.objects.filter(
    path__startswith=parent.path,
    depth__gte=parent.depth
).order_by(
    'path'
)

Run Code Online (Sandbox Code Playgroud)

Potential downsides to this approach:

A deeply nested hierarchy will result in long paths, which can hurt read performance.
Moving a node requires updating the path of all descendants.

Postgres includes the ltree extension which provides a custom GiST index (docs).

I am not clear which benefits ltree provides over django-treebeard's implementation. This article argues that only ltree can answer the get_ancestors question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.

[As an aside, if found this Django ltree library - https://github.com/mariocesar/django-ltree].

Both approaches use an index (django-treebeard uses b-tree, ltree uses a custom GiST). I am interested in understanding the implementation of the ltree GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).

Additional links

What are the options for storing hierarchical data in a relational database?

https://news.ycombinator.com/item?id=709970

Answer 1

Pin*_*nyM 6

TL;DR可重用标签、复杂的搜索模式和针对多个后代节点（或尚未检索到路径的单个节点）的祖先搜索无法使用物化路径索引来完成。

对于那些对血腥细节感兴趣的人......

首先，只有当您没有在节点描述中重用任何标签时，您的问题才相关。如果是的话，l-tree 确实是两者中唯一的选择。但是物化路径实现通常不需要这个，所以让我们把它放在一边。

一个明显的区别在于 l-tree 为您提供的搜索类型的灵活性。考虑这些示例（来自ltree您问题中链接的文档）：

foo         Match the exact label path foo
*.foo.*     Match any label path containing the label foo
*.foo       Match any label path whose last label is foo

Run Code Online (Sandbox Code Playgroud)

第一个查询显然可以通过物化路径实现。最后一个也是可以实现的，您可以将查询调整为同级查找。但是，中间情况不能通过单个索引查找直接实现。您要么必须将其分解为两个查询（所有后代 + 所有祖先），要么求助于表扫描。

然后有像这样的非常复杂的查询（也来自文档）：

Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain

Run Code Online (Sandbox Code Playgroud)

物化路径索引在这里是无用的，需要进行全表扫描来处理这个问题。如果您想将此作为 SARGable 查询执行，l-tree 是唯一的选择。

但是对于标准的分层操作，找到以下任何一项：

父母
孩子们
后代
根节点
叶节点

物化路径与 l-tree 一样有效。与上面链接的文章相反，使用 b 树搜索一个共同祖先的所有后代是非常可行的。查询格式WHERE path LIKE 'A.%'是 SARGable，前提是您的索引准备妥当（我必须明确标记我的路径索引varchar_pattern_ops才能使其正常工作）。

此列表中缺少的是为后代查找所有祖先。WHERE 'A.B.C.D' LIKE path || '.%'不幸的是，查询格式不会使用索引。一个解决办法，一些图书馆实现是从路径解析出祖先节点，并直接对它们进行查询：WHERE id IN ('A', 'B', 'C')。但是，这仅在您针对已检索其路径的特定节点的祖先时才有效。l-tree 将在这一点上获胜。

归档时间：	6 年，3 月前
查看次数：	1700 次
最近记录：	6 年，3 月前