MongoDB text index: searching for multiple words is too slow

Question

Problem description

MongoDB version: 3.4.4

The documents in the MongoDB collection were created from XML files (not GridFS) and look like this:

{
    ...
    "????????" : {
        "@attributes" : {
            "??????????" : "???????? ? ???????????? ???????????????? \"?????????????? ???????? \"?????? ???????\"",
            ...
        },
        ...
    }
    ...
}

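The language is Russian. The collection holds about 10,000,000 documents and has a text index on the field "??????????.@attributes.????????????".

Searching for a single word is very fast: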

db.records.find({
    $text: {
        $search: "??????"
    }
})

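But searching for several words with logical AND is so slow that I could not even wait for it to finish to get the explain('executionStats') output.

For example, the following query is slow. It finds all documents that contain both the word "??????" and the word "???????":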

db.records.find({
    $text: {
        $search: "\"??????\" \"???????\""
    }
})

Searching for a phrase is also slow. For example, this finds all documents that contain the phrase "?????? ?????????":

db.records.find({
    $text: {
        $search: "\"?????? ???????\""
    }
})

getIndexes() output:

[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "egrul.records"
        },

        ...

        {
                "v" : 2,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "name" : "????????.@attributes.??????????_text",
                "ns" : "egrul.records",
                "default_language" : "russian",
                "weights" : {
                        "????????.@attributes.??????????" : 1
                },
                "language_override" : "language",
                "textIndexVersion" : 3
        }
]

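Question

Can I somehow speed up searching for several words (with logical AND) or searching for a phrase?

Edit

I just found out that searching for several words with logical OR is also slow: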

db.records.find({
    $text: {
        $search: "?????? ???????"
    }
})
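
The three query shapes used in this question differ only in how the $search string is quoted: as used here, bare terms are combined with OR, separately quoted terms must all match, and a single quoted string is matched as a phrase. Below is a minimal sketch of helpers that build these strings. The names anyOf, allOf and phrase and the sample terms are hypothetical, and the terms are assumed to contain no double quotes.

```javascript
// Hypothetical helpers (not part of MongoDB) that build $search strings.
// Assumption: the terms themselves contain no double quotes.

// Bare terms separated by spaces: logical OR, as in the last query above.
function anyOf(terms) {
    return terms.join(" ");
}

// Each term quoted separately: logical AND, as used in this question.
function allOf(terms) {
    return terms.map(function (term) {
        return '"' + term + '"';
    }).join(" ");
}

// All words inside one pair of quotes: exact phrase.
function phrase(words) {
    return '"' + words.join(" ") + '"';
}

// Placeholder terms for illustration only.
var terms = ["first", "second"];
var orQuery = { $text: { $search: anyOf(terms) } };
var andQuery = { $text: { $search: allOf(terms) } };
var phraseQuery = { $text: { $search: phrase(terms) } };

// In the mongo shell each of these would be passed to find(), for example:
// db.records.find(andQuery)
```

Building the strings in one place makes it harder to mix up the OR and AND forms, which differ only by the escaped quotes.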

Answer 1

per*_*nik 2

It looks like the problem is not that searching for multiple words is slow, but that searching is slow whenever a search term occurs in many documents.

For example, the word "МИЦУБИСИ" occurs in only 24 of the 10,000,000 documents, so the query

db.records.find({
    $text: {
        $search: "МИЦУБИСИ"
    }
}).count()

is very fast.

But the word "СЕРВИС" occurs in 160,000 documents, and the query

db.records.find({
    $text: {
        $search: "СЕРВИС"
    }
}).count()

is slow (it takes about 40 minutes).

The query

db.records.find({
    $text: {
        $search: "\"МИЦУБИСИ\" \"СЕРВИС\""
    }
}).count()

is also slow because, I guess, MongoDB looks up the term "МИЦУБИСИ" (fast) and the term "СЕРВИС" (slow) and then intersects the results or does something similar.

Now I want to find a way to limit the number of results, something like "find 10 documents and stop", because limit() does not work with text queries.
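
As a possible stopgap while looking for a real limit, the wait itself can be bounded with the cursor's maxTimeMS(), which asks the server to abort the operation after the given number of milliseconds. It fails with an error instead of returning partial results, so it caps the wait rather than speeding the query up. The sketch below is illustrative only: the helper name findFirstMatches and the values 10 and 5000 are assumptions, and whether limit() lets the server stop early depends on the query plan.

```javascript
// Hypothetical helper, not part of MongoDB: run a $text query that asks for
// at most maxDocs documents and gives up after timeoutMs milliseconds.
// `collection` is any object exposing the mongo shell's find() cursor API,
// for example db.records.
function findFirstMatches(collection, searchString, maxDocs, timeoutMs) {
    return collection
        .find({ $text: { $search: searchString } })
        .limit(maxDocs)         // return no more than maxDocs documents
        .maxTimeMS(timeoutMs);  // server aborts with an error after timeoutMs
}

// Usage in the mongo shell (assumed values: 10 documents, 5 seconds):
// findFirstMatches(db.records, "\"МИЦУБИСИ\" \"СЕРВИС\"", 10, 5000).toArray()
```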

Or upgrade my server hardware.

Or take a look at Elasticsearch.

Archived: 8 years, 8 months ago
Views: 1020
Last activity: 6 years, 1 month ago