Ant*_*nel 5 indexing mongodb nosql mongodb-query
I have the following MongoDB document structure:
{
"_id" : "519817e508a16b447c00020e",
"keyword" : "Just an example query",
"rankings" :
{
results:
{
"1" : { "domain" : "example1.com", "href" : "http://www.example1.com/"},
"2" : { "domain" : "example2.com", "href" : "http://www.example2.com/"},
"3" : { "domain" : "example3.com", "href" : "http://www.example3.com/"},
"4" : { "domain" : "example4.com", "href" : "http://www.example4.com/"},
"5" : { "domain" : "example5.com", "href" : "http://www.example5.com/"},
...
...
"99" : { "domain" : "example99.com", "href" : "http://www.example99.com/"},
"100" : {"domain" : "example100.com", "href" : "http://www.example100.com/"}
},
"plus":"many",
"other":"not",
"interesting" : "stuff",
"for": "this question"
}
}
In a previous question, I asked how to index the text so that I could search for the keyword and the domain using, for example:
db.ranking.find({ $text: { $search: "\"example9.com\" \"Just an example query\""}})
John Petrone's great answer was:
db.ranking.ensureIndex(
{
"keyword": "text",
"rankings.results.1.domain" : "text",
"rankings.results.2.domain" : "text",
...
...
"rankings.results.99.domain" : "text",
"rankings.results.100.domain" : "text"
}
)
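This works fine when I have 10 results, but when I try to index 100 results the Mongo shell returns an "Index key pattern too large" error with code 67.
So the big question is:
How (the hell) do I solve the "index key pattern too large" error?
Edit: 18/08/2014 Document structure clarification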
{
"_id" : "519817e508a16b447c00020e", // From Mongodb
"keyword" : "Just an example query",
"date" : "2014-03-28",
"rankings" :
{
"1" : { "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1"},
...
"100" : { "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100"}
},
"plus":"many",
"other":"not",
"interesting" : "stuff",
"for": "this question"
}
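So, here is my solution:
I decided to stick with embedded documents, with one very simple change: I replaced the dictionary whose keys held the actual rank with an array whose elements contain the rank, like this: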
{
 "_id" : "519817e508a16b447c00020e", // From Mongodb
 "keyword" : "Just an example query",
 "date" : "2014-03-28",
 "rankings" :
 [
  {
   "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1", "rank" : 1
  },
  ...
  {
   "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
  }
 ],
 "plus":"many",
 "more":"uninteresting",
 "stuff" : "for",
 "this": "question"
}

Then I can select the whole document with:
> db.ranking.find({"keyword" : "how are you doing", "rank_date" : "2014-08-27"})

Or get a single result using a projection, which is really awesome and also a new feature in Mongodb 2.6 :-D
> db.collection.find({ "rank_date" : "2014-04-09", "rankings.href": "http://www.example100.com/" }, { "rankings.$": 1 })

 [
  {
   "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
  },
 ]

You can even get the rank of a single url directly:
> db.collection.find({"rank_date" : "2014-04-09", "rankings.href": "http://www.example5.com/"}, { "rankings.$": 1 })[0]['rankings'][0]['rank']
5

Finally, I also create an index on the url:
> db.collection.ensureIndex( {"rankings.href" : "text"} )

With the index, I can search for a single url, a partial url, a subdomain, or a whole domain, which is just awesome:
> db.collection.find({ $text: { $search: "example5.com"}})

And that's really it! Many thanks to everyone for the help, especially @JohnBarça :-D
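For anyone who already has documents stored in the old dictionary layout, the change described above (rank keys become array elements with a "rank" field) could be sketched roughly as follows. This is only an illustration, not the code actually used for the migration: the helper name rankingsToArray and the sample document are made up, and it assumes a JavaScript engine that provides Object.assign.

```javascript
// Hypothetical helper (not from the original post): turns the old
// dictionary-style `rankings` field, keyed by rank, into the array layout.
function rankingsToArray(rankings) {
  return Object.keys(rankings)
    .map(function (key) {
      // Copy the embedded result and store its former key as a numeric rank.
      return Object.assign({}, rankings[key], { rank: parseInt(key, 10) });
    })
    // Sort explicitly so the array order does not rely on key iteration order.
    .sort(function (a, b) { return a.rank - b.rank; });
}

// Made-up sample document in the old layout (two results only).
var oldDoc = {
  keyword: "Just an example query",
  date: "2014-03-28",
  rankings: {
    "2": { domain: "example2.com", href: "http://www.example2.com/", plus: "stuff2" },
    "1": { domain: "example1.com", href: "http://www.example1.com/", plus: "stuff1" }
  }
};

var converted = rankingsToArray(oldDoc.rankings);
console.log(JSON.stringify(converted, null, 1));
```

In the mongo shell, something like this could presumably be applied per document with db.ranking.find().forEach(...) and an update using $set on "rankings". Because every array element then exposes the same field path, a single index key on "rankings.href" covers all 100 results, instead of one index key per rank position.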