Avr*_*zky 5 postgresql hierarchy ltree
Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self).
The django-treebeard implementation of MP (docs):
Each step of the path is a fixed length for consistent performance.
Each node contains depth and numchild fields (fast reads at minimal cost to writes).
The path field is indexed (with a standard b-tree index):
The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.
Implementation of get_ancestors (link):
Match nodes with a path that contains a subset of the current path (steplen is the fixed length of a step).
paths = [
self.path[0:pos]
for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
path__in=paths).order_by('depth')
Run Code Online (Sandbox Code Playgroud)
Implementation of get_descendants (link):
Match nodes with a depth greater than self and a path which starts with current path.
return cls.objects.filter(
path__startswith=parent.path,
depth__gte=parent.depth
).order_by(
'path'
)
Run Code Online (Sandbox Code Playgroud)
Potential downsides to this approach:
Postgres includes the ltree extension which provides a custom GiST index (docs).
I am not clear which benefits ltree provides over django-treebeard's implementation. This article argues that only ltree can answer the get_ancestors question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.
[As an aside, if found this Django ltree library - https://github.com/mariocesar/django-ltree].
Both approaches use an index (django-treebeard uses b-tree, ltree uses a custom GiST). I am interested in understanding the implementation of the ltree GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).
Additional links
What are the options for storing hierarchical data in a relational database?
TL;DR可重用标签、复杂的搜索模式和针对多个后代节点(或尚未检索到路径的单个节点)的祖先搜索无法使用物化路径索引来完成。
对于那些对血腥细节感兴趣的人......
首先,只有当您没有在节点描述中重用任何标签时,您的问题才相关。如果是的话,l-tree 确实是两者中唯一的选择。但是物化路径实现通常不需要这个,所以让我们把它放在一边。
一个明显的区别在于 l-tree 为您提供的搜索类型的灵活性。考虑这些示例(来自ltree您问题中链接的文档):
foo Match the exact label path foo
*.foo.* Match any label path containing the label foo
*.foo Match any label path whose last label is foo
Run Code Online (Sandbox Code Playgroud)
第一个查询显然可以通过物化路径实现。最后一个也是可以实现的,您可以将查询调整为同级查找。但是,中间情况不能通过单个索引查找直接实现。您要么必须将其分解为两个查询(所有后代 + 所有祖先),要么求助于表扫描。
然后有像这样的非常复杂的查询(也来自文档):
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
Run Code Online (Sandbox Code Playgroud)
物化路径索引在这里是无用的,需要进行全表扫描来处理这个问题。如果您想将此作为 SARGable 查询执行,l-tree 是唯一的选择。
但是对于标准的分层操作,找到以下任何一项:
物化路径与 l-tree 一样有效。与上面链接的文章相反,使用 b 树搜索一个共同祖先的所有后代是非常可行的。查询格式WHERE path LIKE 'A.%'是 SARGable,前提是您的索引准备妥当(我必须明确标记我的路径索引varchar_pattern_ops才能使其正常工作)。
此列表中缺少的是为后代查找所有祖先。WHERE 'A.B.C.D' LIKE path || '.%'不幸的是,查询格式不会使用索引。一个解决办法,一些图书馆实现是从路径解析出祖先节点,并直接对它们进行查询:WHERE id IN ('A', 'B', 'C')。但是,这仅在您针对已检索其路径的特定节点的祖先时才有效。l-tree 将在这一点上获胜。
| 归档时间: |
|
| 查看次数: |
1700 次 |
| 最近记录: |