MongoDB - multiple-field text index: "Index key pattern too large", error code 67

Ant*_*nel 5 indexing mongodb nosql mongodb-query

I have the following MongoDB document structure:

{ 
    "_id" : "519817e508a16b447c00020e", 
    "keyword" : "Just an example query", 
    "rankings" : 
    {
        results:
        {
            "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/"},
            "2" : { "domain" : "example2.com", "href" : "http://www.example2.com/"},
            "3" : { "domain" : "example3.com", "href" : "http://www.example3.com/"},
            "4" : { "domain" : "example4.com", "href" : "http://www.example4.com/"},
            "5" : { "domain" : "example5.com", "href" : "http://www.example5.com/"},
            ...
            ...
            "99" : { "domain" : "example99.com", "href" : "http://www.example99.com/"},
            "100" : {"domain" : "example100.com", "href" : "http://www.example100.com/"}
        }, 
        "plus":"many", 
        "other":"not", 
        "interesting" : "stuff", 
        "for": "this question"
    }
}

In a previous question, I asked how to index text so that I could search for both the keyword and the domains with:

db.ranking.find({ $text: { $search: "\"example9.com\" \"Just an example query\""}})  

John Petrone's excellent answer was:

db.ranking.ensureIndex(
{
    "keyword": "text",
    "rankings.results.1.domain" : "text",
    "rankings.results.2.domain" : "text",
    ...
    ...
    "rankings.results.99.domain" : "text",
    "rankings.results.100.domain" : "text"
})

However, while this works well when I have 10 results, when I try to index 100 results I get an "Index key pattern too large" error, code 67, from the Mongo shell.
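Generating the key pattern instead of typing it makes the size difference concrete. The sketch below is plain JavaScript; `buildTextIndexSpec` and `bsonSizeOfStringDoc` are hypothetical helpers, not MongoDB APIs. It builds the key pattern from the answer above and estimates its BSON-encoded size. It assumes the error comes from a cap on the encoded size of the key pattern document itself (2048 bytes in MongoDB's key-pattern validation, as far as I know; error code 67 is `CannotCreateIndex`). Under that assumption, 10 ranked domains fit comfortably while 100 do not.

```javascript
// Hypothetical helper: builds the key pattern from the answer above,
// one "text" entry per ranked domain plus the "keyword" field.
function buildTextIndexSpec(maxRank) {
  const spec = { keyword: "text" };
  for (let rank = 1; rank <= maxRank; rank++) {
    spec["rankings.results." + rank + ".domain"] = "text";
  }
  return spec;
}

// Estimated BSON size of a flat document whose values are all strings.
// Assumes ASCII names and values, so string length equals byte length.
function bsonSizeOfStringDoc(doc) {
  let size = 4 + 1; // int32 total length + trailing 0x00 terminator
  for (const [name, value] of Object.entries(doc)) {
    // type byte + field name + NUL + int32 string length + bytes + NUL
    size += 1 + name.length + 1 + 4 + value.length + 1;
  }
  return size;
}

// Assumption: server-side limit on the encoded key pattern size.
const ASSUMED_KEY_PATTERN_LIMIT = 2048;

for (const n of [10, 100]) {
  const spec = buildTextIndexSpec(n);
  const size = bsonSizeOfStringDoc(spec);
  console.log(n, Object.keys(spec).length, size, size > ASSUMED_KEY_PATTERN_LIMIT);
}
// prints "10 11 384 false" then "100 101 3715 true"
```

Each additional `rankings.results.N.domain` entry adds roughly 37 bytes, so the pattern grows linearly with the number of ranked results, whichever limit the server actually enforces.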

So the big question is:

How the (hell) do I get around the "Index key pattern too large" error?


EDIT 18/08/2014: clarification of the document structure

{ 
    "_id" : "519817e508a16b447c00020e", // from MongoDB
    "keyword" : "Just an example query", 
    "date" : "2014-03-28",
    "rankings" :
    {
            "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1"},
            ...
            "100" : {"domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100"}
    }, 
    "plus":"many", 
    "other":"not", 
    "interesting" : "stuff", 
    "for": "this question"
}

Ant*_*nel 1

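So, here is my solution: I decided to stick with embedded documents and make one very simple change: I replaced the object keyed by the actual rank with an array of rankings, each entry carrying its own rank, like this: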

{
  "_id" : "519817e508a16b447c00020e", // from MongoDB
  "keyword" : "Just an example query",
  "date" : "2014-03-28",
  "rankings" :
  [
    {
      "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1", "rank" : 1
    },
    ...
    {
      "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
    }
  ],
  "plus":"many",
  "more":"uninteresting",
  "stuff" : "for",
  "this": "question"
}
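For documents already stored in the old shape, the switch from a rank-keyed object to an array can be scripted. This is a plain JavaScript sketch, assuming every key of `rankings` is a rank number, as in the 18/08/2014 structure; `rankingsToArray` and `migrateDocument` are hypothetical helpers, not MongoDB APIs.

```javascript
// Hypothetical migration helper: converts the old shape, where "rankings"
// is an object keyed by rank ("1" ... "100"), into the array shape above,
// copying each entry and adding an explicit numeric "rank" field.
function rankingsToArray(rankingsByKey) {
  return Object.keys(rankingsByKey)
    .map((key) => Object.assign({}, rankingsByKey[key], { rank: Number(key) }))
    .sort((a, b) => a.rank - b.rank); // order by rank explicitly
}

// Returns a new document; the input document is left unchanged.
function migrateDocument(doc) {
  return Object.assign({}, doc, { rankings: rankingsToArray(doc.rankings) });
}

const oldDoc = {
  _id: "519817e508a16b447c00020e",
  keyword: "Just an example query",
  date: "2014-03-28",
  rankings: {
    "2": { domain: "example2.com", href: "http://www.example2.com/" },
    "1": { domain: "example1.com", href: "http://www.example1.com/" },
    "10": { domain: "example10.com", href: "http://www.example10.com/" },
  },
};

const newDoc = migrateDocument(oldDoc);
console.log(newDoc.rankings.map((r) => r.rank)); // [ 1, 2, 10 ]
```

In a shell that supports modern JavaScript (for example `mongosh`), the same functions could be applied inside `db.ranking.find().forEach(...)`, writing each converted document back with `db.ranking.replaceOne({ _id: doc._id }, migrated)`; trying it on a copy of the collection first is prudent.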

Then I can select the whole document with:

> db.ranking.find({"keyword":"how are you doing", "rank_date" : "2014-08-27"})

Or use a projection to get a single result, which is really awesome, and also a new feature in MongoDB 2.6 :-D

> db.collection.find({ "rank_date" : "2014-04-09", "rankings.href": "http://www.example100.com/" }, { "rankings.$": 1 })

  [
    {
      "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
    }
  ]
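The positional projection `{ "rankings.$": 1 }` returns only the first array element that satisfied the query condition on `rankings`, plus `_id`, which is included by default. The sketch below reproduces that behaviour in plain JavaScript for a single equality match on `href`, just to make the shape of the result explicit; `projectFirstMatch` is a hypothetical helper, not part of the MongoDB API, and the sample document is trimmed to three entries.

```javascript
// Hypothetical emulation of what { "rankings.$": 1 } returns for a query
// such as { "rankings.href": <url> }: _id plus the first matching element.
function projectFirstMatch(doc, href) {
  const match = doc.rankings.find((entry) => entry.href === href);
  if (match === undefined) {
    return null; // the query itself would not have matched this document
  }
  return { _id: doc._id, rankings: [match] };
}

// Sample document in the array shape (entries trimmed for brevity).
const doc = {
  _id: "519817e508a16b447c00020e",
  rankings: [
    { domain: "example1.com", href: "http://www.example1.com/", rank: 1 },
    { domain: "example5.com", href: "http://www.example5.com/", rank: 5 },
    { domain: "example100.com", href: "http://www.example100.com/", rank: 100 },
  ],
};

const projected = projectFirstMatch(doc, "http://www.example100.com/");
console.log(projected.rankings.length, projected.rankings[0].domain); // 1 example100.com
```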

Or even get the rank of a single URL directly:

> db.collection.find({"rank_date" : "2014-04-09", "rankings.href": "http://www.example5.com/"}, { "rankings.$": 1 })[0]['rankings'][0]['rank']
5
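When the ranks of several URLs from the same ranking document are needed, one alternative to issuing one positional query per URL is to fetch the document once and build a lookup table on the client. This is a plain JavaScript sketch; `buildRankIndex` is a hypothetical helper, and it keeps the first occurrence if an `href` appears more than once, mirroring the "first matching element" behaviour of the positional projection.

```javascript
// Hypothetical helper: maps each href in a ranking document to its rank,
// keeping the first occurrence when an href is repeated.
function buildRankIndex(doc) {
  const index = new Map();
  for (const entry of doc.rankings) {
    if (!index.has(entry.href)) {
      index.set(entry.href, entry.rank);
    }
  }
  return index;
}

const doc = {
  rankings: [
    { href: "http://www.example1.com/", rank: 1 },
    { href: "http://www.example5.com/", rank: 5 },
    { href: "http://www.example100.com/", rank: 100 },
  ],
};

const ranks = buildRankIndex(doc);
console.log(ranks.get("http://www.example5.com/")); // 5
console.log(ranks.get("http://www.example7.com/")); // undefined (not ranked)
```

The trade-off is transferring the whole `rankings` array once instead of running many small queries; for a single lookup, the positional query above stays simpler.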

Finally, I also created an index on the URL:

> db.collection.ensureIndex( {"rankings.href" : "text"} )

With this index I can search for a single URL, a partial URL, a subdomain, or a whole domain, which is great:

> db.collection.find({ $text: { $search: "example5.com"}})
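The flexible matching described above comes from how the text index splits URLs into terms. The sketch below is a simplified, hypothetical stand-in for that tokenization: it assumes punctuation such as `:`, `/` and `.` acts as a word delimiter (which, as far as I know, is how MongoDB's text index treats it) and it ignores stemming and stop words. It also models the fact that unquoted `$search` terms are combined with a logical OR.

```javascript
// Simplified stand-in for text-index tokenization: lowercase, then split on
// any run of characters that are not ASCII letters or digits. MongoDB's
// real tokenizer also applies language-specific stemming and stop words.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Unquoted $search terms are OR-ed: a value matches if it shares any term.
function matchesAnyTerm(indexedText, search) {
  const indexed = new Set(tokenize(indexedText));
  return tokenize(search).some((term) => indexed.has(term));
}

console.log(tokenize("http://www.example5.com/")); // [ 'http', 'www', 'example5', 'com' ]
console.log(matchesAnyTerm("http://www.example5.com/", "example5.com")); // true
console.log(matchesAnyTerm("http://www.example99.com/", "example5.com")); // true, via "com"
```

One consequence, under these assumptions: `example5.com` is searched as the two terms `example5` and `com`, so any ranking containing a `.com` URL also matches, only with a lower text score. Quoting the string, as in the phrase search from the question, is the usual way to require the exact sequence.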

That's it! Many thanks to everyone for the help, especially @JohnBarça :-D
