How to optimize a find-by-date query in MongoDB

Question

Dew*_*rld 1 python query-optimization mongodb nosql pymongo

I have a collection of about 0.6 million documents. Most documents have the following structure:

{
    "_id" : ObjectId("53d86ef920ba274d5e4c8683"),
    "checksum" : "2856caa9490e5c92aedde91330964488",
    "content" : "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"bn-bd\" lang=\"bn-bd\" dir=\"ltr\" " />\n  <link rel=\"stylesheet\" href=\"/templates/beez_20/css/position.css\" type=\"text/css\" media=\"screen,projection\ef=\"/index.php/bn/contact-bangla/2013-0</body>\r\n</html>",
    "date" : ISODate("2014-07-29T15:57:11.886Z"),
    "filtered_content" : "",
    "indexed" : true,
    "category": 'raw',
    "link_extracted" : 1,
    "parsed" : true,
    "title" : "Constituency 249_10th_En",
    "url" : "http://www.somesite.com.bd/index.php/bn/bangla/2014-03-23-11-45-04?layout=edit&id=2143"
}

Every document has a date attribute. When I run the query below, it takes indefinitely long to return results.

from pymongo import MongoClient  # MongoClient replaces the old Connection class
import datetime

con = MongoClient()
db = con.spider
pages = db.pages

# Midnight at the start of today (local time)
today = datetime.datetime.combine(datetime.date.today(), datetime.datetime.min.time())

# Titles of 'news' documents dated after the start of today
c = pages.find({'category': 'news', 'date': {'$gt': today}}, {'title': 1, '_id': 0})

for item in c:
    print(item)

The existing indexes are:

_id, url, parsed

How can I bring this query down to an acceptable response time? Any solid answers or suggestions are appreciated!

Answer 1

hug*_*own 5

It looks like adding an index on category and date would help. In the mongo shell:

db.pages.createIndex({'date': 1, 'category': 1});

In pymongo, index creation looks more like this:

import pymongo

keys = [
    ("date", pymongo.ASCENDING),
    ("category", pymongo.ASCENDING)
]
pages.create_index(keys)
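
Once the index exists, one way to check that the query actually uses it is to look at the cursor's explain() output. The following is only a sketch: the helper names plan_stages and uses_index are hypothetical, the two sample dictionaries are hand-written stand-ins rather than real server output, and the queryPlanner/winningPlan layout is assumed (it is the shape used by MongoDB 3.0 and later; older servers report plans differently).

```python
def plan_stages(plan):
    """Collect every "stage" name found anywhere in an explain() plan dict."""
    stages = []
    if isinstance(plan, dict):
        if "stage" in plan:
            stages.append(plan["stage"])
        for value in plan.values():
            stages.extend(plan_stages(value))
    elif isinstance(plan, list):
        for item in plan:
            stages.extend(plan_stages(item))
    return stages


def uses_index(explain_output):
    """True if the winning plan contains an index scan and no collection scan."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    stages = plan_stages(winning)
    return "IXSCAN" in stages and "COLLSCAN" not in stages


# Hand-written, heavily trimmed stand-ins for explain() output (assumed shape).
before_index = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
after_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "PROJECTION_SIMPLE",
            "inputStage": {
                "stage": "FETCH",
                "inputStage": {"stage": "IXSCAN", "indexName": "date_1_category_1"},
            },
        }
    }
}

print(uses_index(before_index))
print(uses_index(after_index))
```

Against a live collection, the same check would presumably be uses_index(pages.find(query, projection).explain()), where query and projection are the two dictionaries passed to find() in the question.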

The create_index options you are most likely to be interested in are:

name: custom name to use for this index - if none is given, a name will be generated
unique: if True creates a unique constraint on the index

I wouldn't expect date/category to be unique, though. Giving the index a name seems like good practice.
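
As a minimal sketch of passing the name and unique options, the helper below wraps create_index. The helper create_named_index, the index name "date_category_idx", and the FakeCollection stand-in are all assumptions made for illustration; the stand-in only records the call so the example runs without a MongoDB server.

```python
DATE_CATEGORY_KEYS = [("date", 1), ("category", 1)]  # 1 is the value of pymongo.ASCENDING


def create_named_index(collection, keys=DATE_CATEGORY_KEYS, name="date_category_idx"):
    """Create a non-unique compound index with an explicit name; returns the index name."""
    return collection.create_index(keys, name=name, unique=False)


class FakeCollection:
    """Stand-in for a pymongo Collection (hypothetical, for running without a server)."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys, **options):
        # Record what would have been sent and echo the name, as pymongo returns it.
        self.calls.append((list(keys), options))
        return options.get("name")


fake = FakeCollection()
print(create_named_index(fake))
print(fake.calls)
```

With a real connection, create_named_index(pages) would be the equivalent call. Because the query matches category by equality and date by range, keys=[("category", 1), ("date", 1)] is an ordering worth measuring as well, since MongoDB's indexing guidance generally places equality fields before range fields.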

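@Dewsworld MongoDB always takes as much memory as it can. So when you panic because your database server has used all of its RAM, rest assured that this would be the case no matter what you do. It is by design. (2 upvotes)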
