Building a trie quickly

Question

Bru*_*uce 21 algorithm optimization performance trie data-structures

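I'm making a mobile app that needs thousands of fast string lookups and prefix checks. To speed this up, I built a trie from a word list of about 180,000 words.

Everything works great; the only problem is that building this huge trie (it has about 400,000 nodes) currently takes around 10 seconds on my phone, which is really slow.

Here is the code that builds the trie: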

public SimpleTrie makeTrie(String file) throws Exception {
    String line;
    SimpleTrie trie = new SimpleTrie();

    BufferedReader br = new BufferedReader(new FileReader(file));
    while( (line = br.readLine()) != null) {
        trie.insert(line);
    }
    br.close();

    return trie;
}

The insert method runs in O(length of key):

public void insert(String key) {
    TrieNode crawler = root;
    for(int level=0 ; level < key.length() ; level++) {
        int index = key.charAt(level) - 'A';
        if(crawler.children[index] == null) {
            crawler.children[index] = getNode();
        }
        crawler = crawler.children[index];
    }
    crawler.valid = true;
}

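I'm looking for any intuitive way to build the trie faster. Perhaps I could build the trie just once on my laptop, store it to disk somehow, and then load it from a file on the phone? But I don't know how to implement this.

Or is there another prefix data structure that takes less time to build but has similar lookup time complexity?

Any suggestions are appreciated. Thanks in advance.

Edit

Someone suggested using Java Serialization. I tried it, but this code is very slow: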

public void serializeTrie(SimpleTrie trie, String file) {
    try {
        ObjectOutput out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
        out.writeObject(trie);
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public SimpleTrie deserializeTrie(String file) {
    try {
        ObjectInput in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
        SimpleTrie trie = (SimpleTrie)in.readObject();
        in.close();
        return trie;
    } catch (IOException | ClassNotFoundException e) {
        e.printStackTrace();
        return null;
    }
}

Can the code above be made faster?

My trie: http://pastebin.com/QkFisi09

Word list: http://www.isc.ro/lists/twl06.zip

Android IDE used to run the code: http://play.google.com/store/apps/details?id=com.jimmychen.app.sand

Answer 1

Mik*_*bov 24

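Double-array tries are very fast to save and load because all of their data is stored in linear arrays. They are also very fast to look up, but insertions can be costly. I bet there is a Java implementation somewhere.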
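To illustrate why a linear-array layout loads quickly, here is a minimal sketch. Note that it is not a real double-array trie: it is a simpler dense table (26 child slots plus one "valid" flag per node) used only to show the save/load idea. The class name FlatTrie and all of its methods are hypothetical, and it assumes keys contain only 'A'..'Z', as the insert code in the question does.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical trie stored in one int[]: node n owns table[n*STRIDE .. n*STRIDE+26]. */
final class FlatTrie {
    private static final int ALPHABET = 26;          // 'A'..'Z' only (assumption from the question)
    private static final int STRIDE = ALPHABET + 1;  // 26 child slots + 1 "valid" flag

    private int[] table = new int[STRIDE * 16];
    private int nodeCount = 1;                       // node 0 is the root

    void insert(String key) {
        int node = 0;
        for (int i = 0; i < key.length(); i++) {
            int slot = node * STRIDE + (key.charAt(i) - 'A');
            if (table[slot] == 0) {                  // 0 means "no child"; the root is never a child
                if ((nodeCount + 1) * STRIDE > table.length) {
                    table = Arrays.copyOf(table, table.length * 2);
                }
                table[slot] = nodeCount++;
            }
            node = table[slot];
        }
        table[node * STRIDE + ALPHABET] = 1;         // mark end of a word
    }

    /** Follows s from the root; returns the node reached, or -1 if the path is missing. */
    private int walk(String s) {
        int node = 0;
        for (int i = 0; i < s.length(); i++) {
            node = table[node * STRIDE + (s.charAt(i) - 'A')];
            if (node == 0) return -1;
        }
        return node;
    }

    boolean contains(String word) {
        int node = walk(word);
        return node >= 0 && table[node * STRIDE + ALPHABET] == 1;
    }

    boolean hasPrefix(String prefix) {
        return walk(prefix) >= 0;
    }

    /** Writes the node count followed by the used part of the table as raw big-endian ints. */
    void save(OutputStream out) throws IOException {
        int used = nodeCount * STRIDE;
        ByteBuffer buf = ByteBuffer.allocate(used * 4);
        buf.asIntBuffer().put(table, 0, used);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(nodeCount);
        data.write(buf.array());
        data.flush();
    }

    /** Reads the table back in one bulk read; no per-node objects are created. */
    static FlatTrie load(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        FlatTrie trie = new FlatTrie();
        trie.nodeCount = data.readInt();
        byte[] bytes = new byte[trie.nodeCount * STRIDE * 4];
        data.readFully(bytes);
        trie.table = new int[trie.nodeCount * STRIDE];
        ByteBuffer.wrap(bytes).asIntBuffer().get(trie.table);
        return trie;
    }
}
```

The idea would be to call insert for every word and then save once on the laptop, and to call only load on the phone. This dense table spends 27 ints per node, so for roughly 400,000 nodes it is on the order of 40 MB; a true double-array trie stores the same transitions far more compactly, which is the point of the suggestion above.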

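Also, if your data is static (i.e. you don't update it on the phone), consider a DAFSA for your task. It is one of the most efficient data structures for storing words (it should be better than "standard" tries and radix tries in both size and speed, better than succinct tries in speed, and often better than succinct tries in size). There is a good C++ implementation: dawgdic - you can use it to build a DAFSA from the command line and then use a Java reader for the resulting data structure (an example implementation was linked here).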
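To make the DAFSA idea concrete, here is a minimal sketch of the core trick: merging subtrees that accept the same set of suffixes. It is not how dawgdic works internally and is not its API; it simply builds an ordinary trie and then minimizes it bottom-up with a signature table. The class name Dafsa and its methods are hypothetical, and keys are assumed to be 'A'..'Z' only.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch: build a plain trie, then merge equivalent subtrees into a DAFSA. */
final class Dafsa {
    static final class Node {
        final Node[] children = new Node[26];
        boolean valid;
    }

    private Node root = new Node();
    private int nodeCount = 1;

    /** Plain trie insert; only safe before minimize(), because nodes are shared afterwards. */
    void insert(String key) {
        Node n = root;
        for (int i = 0; i < key.length(); i++) {
            int c = key.charAt(i) - 'A';
            if (n.children[c] == null) {
                n.children[c] = new Node();
                nodeCount++;
            }
            n = n.children[c];
        }
        n.valid = true;
    }

    /** Replaces every subtree with one canonical copy per distinct signature. */
    void minimize() {
        Map<String, Node> registry = new HashMap<>();
        Map<Node, Integer> ids = new IdentityHashMap<>();
        root = canonicalize(root, registry, ids);
        nodeCount = ids.size();
    }

    private static Node canonicalize(Node node, Map<String, Node> registry, Map<Node, Integer> ids) {
        // Signature = valid flag + (letter, canonical child id) pairs. Children are
        // canonicalized first, so equal signatures mean equal accepted suffix sets.
        StringBuilder sig = new StringBuilder(node.valid ? "1" : "0");
        for (int i = 0; i < 26; i++) {
            if (node.children[i] == null) continue;
            node.children[i] = canonicalize(node.children[i], registry, ids);
            sig.append((char) ('A' + i)).append(ids.get(node.children[i])).append(';');
        }
        Node existing = registry.putIfAbsent(sig.toString(), node);
        if (existing != null) return existing;   // an equivalent node is already registered
        ids.put(node, ids.size());
        return node;
    }

    boolean contains(String word) {
        Node n = root;
        for (int i = 0; i < word.length() && n != null; i++) {
            n = n.children[word.charAt(i) - 'A'];
        }
        return n != null && n.valid;
    }

    int nodeCount() {
        return nodeCount;
    }
}
```

For example, inserting TAP, TAPS, TOP and TOPS gives a trie of 8 nodes; after minimize() the shared "P", "PS" endings collapse and 5 nodes remain, while contains returns the same answers as before. Lookups cost the same as in the trie, and the smaller structure is also cheaper to store and load, which is why this only suits static word lists.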
