内存Trie Implementation的高效数据结构

Question

内存Trie Implementation的高效数据结构

div*_*yum 7 python tree nlp trie data-structures

我在python中实现了一个Trie.直到现在我遇到了两种不同的方法来实现它:

在数据成员中使用类Node(类似于C++中的struct Node) -

char - 存储角色

is_end - 存储单词结尾(true或false)

prefix_count - 存储具有当前前缀的单词数

child - 节点类型dict(用于存储其他节点,即26个字母)

class Node(object):
    def __init__(self):
        self.char = ''
        self.word = ''
        self.is_end = False
        self.prefix_count = 0
        self.child = {}

Run Code Online (Sandbox Code Playgroud)

使用字典存储所有数据.

words = {'foo', 'bar', 'baz', 'barz'}
       {'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
        'z': {'_end_': '_end_'}}},
        'f': {'o': {'o': {'_end_': '_end_'}}}}
Run Code Online (Sandbox Code Playgroud)

哪个是高效且标准的数据结构,对于大数据集单词的遍历和其他trie操作,它既高效又快速？

Answer 1

Jos*_*lls 1

为什么不兼得？就在昨天，我正在实现一个类似的数据结构来存储和检索对象的层次结构，并考虑了这种确切的情况。最终使用带有子字典的 Node 对象。主节点是一个对象，允许您使用自定义方法来打印它或获取内容，如果需要，您甚至可以进行延迟初始化（您提到了大数据集，对吗？）

class Node(object):
    def __init__(self):
        self.children = collections.defaultdict(lambda:self.__class__())
        self.stuff = ...
    def __str__(self,depth=0):
        if self.children:
            return str(self.stuff)+'\n'+'\n'.join([' '*depth+k+' ' + v.__str__(depth+1) for k,v in self.children.items()])
        else:
            return str(self.stuff)
    def get_path(self,path):
        node = self
        while path:
            node = node.children[path.pop(0)]
        ...

Run Code Online (Sandbox Code Playgroud)

归档时间：	10 年，10 月前
查看次数：	1255 次
最近记录：	9 年，6 月前