Postgres Materialized Path - What are the benefits of using ltree?

Avr*_*zky 5 postgresql hierarchy ltree

Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self).

The django-treebeard implementation of MP (docs):

  1. Each step of the path is a fixed length for consistent performance.

  2. Each node contains depth and numchild fields (fast reads at minimal cost to writes).

  3. The path field is indexed (with a standard b-tree index):

    The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.

Implementation of get_ancestors (link):

Match nodes with a path that contains a subset of the current path (steplen is the fixed length of a step).

paths = [
    self.path[0:pos]
    for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
    path__in=paths).order_by('depth')
Run Code Online (Sandbox Code Playgroud)

Implementation of get_descendants (link):

Match nodes with a depth greater than self and a path which starts with current path.

return cls.objects.filter(
    path__startswith=parent.path,
    depth__gte=parent.depth
).order_by(
    'path'
)
Run Code Online (Sandbox Code Playgroud)

Potential downsides to this approach:

  1. A deeply nested hierarchy will result in long paths, which can hurt read performance.
  2. Moving a node requires updating the path of all descendants.

Postgres includes the ltree extension which provides a custom GiST index (docs).

I am not clear which benefits ltree provides over django-treebeard's implementation. This article argues that only ltree can answer the get_ancestors question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.

[As an aside, if found this Django ltree library - https://github.com/mariocesar/django-ltree].

Both approaches use an index (django-treebeard uses b-tree, ltree uses a custom GiST). I am interested in understanding the implementation of the ltree GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).

Additional links

What are the options for storing hierarchical data in a relational database?

https://news.ycombinator.com/item?id=709970

Pin*_*nyM 6

TL;DR可重用标签、复杂的搜索模式和针对多个后代节点(或尚未检索到路径的单个节点)的祖先搜索无法使用物化路径索引来完成。


对于那些对血腥细节感兴趣的人......

首先,只有当您没有在节点描述中重用任何标签时,您的问题才相关。如果是的话,l-tree 确实是两者中唯一的选择。但是物化路径实现通常不需要这个,所以让我们把它放在一边。

一个明显的区别在于 l-tree 为您提供的搜索类型的灵活性。考虑这些示例(来自ltree您问题中链接的文档):

foo         Match the exact label path foo
*.foo.*     Match any label path containing the label foo
*.foo       Match any label path whose last label is foo
Run Code Online (Sandbox Code Playgroud)

第一个查询显然可以通过物化路径实现。最后一个也是可以实现的,您可以将查询调整为同级查找。但是,中间情况不能通过单个索引查找直接实现。您要么必须将其分解为两个查询(所有后代 + 所有祖先),要么求助于表扫描。

然后有像这样的非常复杂的查询(也来自文档):

Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
Run Code Online (Sandbox Code Playgroud)

物化路径索引在这里是无用的,需要进行全表扫描来处理这个问题。如果您想将此作为 SARGable 查询执行,l-tree 是唯一的选择。

但是对于标准的分层操作,找到以下任何一项:

  • 父母
  • 孩子们
  • 后代
  • 根节点
  • 叶节点

物化路径与 l-tree 一样有效。与上面链接文章相反,使用 b 树搜索一个共同祖先的所有后代是非常可行的。查询格式WHERE path LIKE 'A.%'是 SARGable,前提是您的索引准备妥当(我必须明确标记我的路径索引varchar_pattern_ops才能使其正常工作)。

此列表中缺少的是为后代查找所有祖先WHERE 'A.B.C.D' LIKE path || '.%'不幸的是,查询格式不会使用索引。一个解决办法,一些图书馆实现是从路径解析出祖先节点,并直接对它们进行查询:WHERE id IN ('A', 'B', 'C')。但是,这仅在您针对已检索其路径的特定节点的祖先时才有效。l-tree 将在这一点上获胜。