hle*_*one 15 geospatial mongodb
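I have a collection with coordinate data stored as GeoJSON Points, and I need to query it for the 10 latest entries within an area. It currently holds 1,000,000 entries, but there will be about 10 times as many.
My problem is that when there are many entries inside the requested area, query performance drops sharply (case 3). My current test data is random, but the real data will not be, so choosing a different index (as in case 4) based on the size of the area is not an option.
What should I do to make the query perform predictably regardless of the area?
1. Collection stats: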
> db.randomcoordinates.stats()
{
"ns" : "test.randomcoordinates",
"count" : 1000000,
"size" : 224000000,
"avgObjSize" : 224,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 120416128,
"indexSizes" : {
"_id_" : 32458720,
"position_2dsphere_timestamp_-1" : 55629504,
"timestamp_-1" : 32327904
},
"ok" : 1
}
2. Indexes:
> db.randomcoordinates.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.randomcoordinates",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"position" : "2dsphere",
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "position_2dsphere_timestamp_-1"
},
{
"v" : 1,
"key" : {
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "timestamp_-1"
}
]
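For anyone who wants to reproduce this setup, here is a rough sketch of a document shape and the index key specs implied by the output above. The field names `position` and `timestamp` come from the index list; the `randomDocument` generator, the `Date` type for `timestamp`, and the one-year spread are assumptions, since the question does not show how the test data was built.

```javascript
// Hypothetical generator: the question does not show how its test data was built.
function randomDocument(now) {
  return {
    position: {
      type: "Point",
      // GeoJSON order is [longitude, latitude].
      coordinates: [Math.random() * 360 - 180, Math.random() * 180 - 90]
    },
    // Assumption: timestamps are Date values spread over the past year.
    timestamp: new Date(now - Math.floor(Math.random() * 365 * 24 * 3600 * 1000))
  };
}

// Key specs matching the two non-_id indexes reported by getIndexes().
var indexSpecs = [
  { position: "2dsphere", timestamp: -1 },
  { timestamp: -1 }
];

// In the mongo shell of that era the indexes would be created with ensureIndex
// (createIndex in current releases), for example:
//   indexSpecs.forEach(function (spec) { db.randomcoordinates.ensureIndex(spec); });

var sample = randomDocument(Date.now());
console.log(JSON.stringify(sample));
```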
3. Find using the 2dsphere compound index:
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116775,
"nscanned" : 283424,
"nscannedObjectsAllPlans" : 116775,
"nscannedAllPlans" : 283424,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3876,
"indexBounds" : {
},
"nscanned" : 283424,
"matchTested" : NumberLong(166649),
"geoTested" : NumberLong(166649),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
4. Find using the timestamp index:
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
"cursor" : "BtreeCursor timestamp_-1",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 63,
"nscanned" : 63,
"nscannedObjectsAllPlans" : 63,
"nscannedAllPlans" : 63,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "chan:27017"
}
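One way to read the difference between cases 3 and 4: with `"scanAndOrder" : true` every match has to be collected and then sorted in memory, whereas walking the `timestamp_-1` index returns documents already in order and can stop as soon as 10 of them fall inside the area. The sketch below simulates both strategies in plain JavaScript. It is only an illustration: arrays stand in for the collection and its index, the function names are made up, and a simple longitude/latitude box test replaces the spherical `$geoWithin` check.

```javascript
// Simulation only: plain arrays stand in for the collection and its indexes.

// Small deterministic pseudo-random generator so runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Assumption: a flat lng/lat box test instead of a real spherical $geoWithin.
function inBox(doc, box) {
  const [lng, lat] = doc.position.coordinates;
  return lng >= box.minLng && lng <= box.maxLng &&
         lat >= box.minLat && lat <= box.maxLat;
}

// Cases 3 and 5: gather every match, then sort in memory.
// `examined` counts the matching documents that must be held for the sort.
function matchThenSort(docs, box, limit) {
  const matches = docs.filter((d) => inBox(d, box));
  matches.sort((a, b) => b.timestamp - a.timestamp);
  return { results: matches.slice(0, limit), examined: matches.length };
}

// Case 4: walk documents newest-first and stop once `limit` matches are found.
function walkNewestFirst(docsByTimestampDesc, box, limit) {
  const results = [];
  let examined = 0;
  for (const doc of docsByTimestampDesc) {
    examined++;
    if (inBox(doc, box)) results.push(doc);
    if (results.length === limit) break;
  }
  return { results, examined };
}

const rng = makeRng(42);
const docs = [];
for (let i = 0; i < 100000; i++) {
  docs.push({
    position: { type: "Point", coordinates: [rng() * 360 - 180, rng() * 180 - 90] },
    timestamp: i
  });
}
const byTimestampDesc = docs.slice().sort((a, b) => b.timestamp - a.timestamp);
const box = { minLng: 1, maxLng: 180, minLat: 1, maxLat: 90 };

const sorted = matchThenSort(docs, box, 10);
const walked = walkNewestFirst(byTimestampDesc, box, 10);
console.log("match-then-sort examined:", sorted.examined);
console.log("newest-first walk examined:", walked.examined);
```

The flip side is the worry raised in the question: when few recent documents fall inside the area, the newest-first walk has to examine far more entries before it finds 10, so neither strategy is cheap for every area.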
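Some people suggested using a {timestamp: -1, position: "2dsphere"} index, so I tried that as well, but it does not seem to perform any better.
5. Find using the timestamp + 2dsphere compound index: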
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1_position_2dsphere").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116953,
"nscanned" : 286513,
"nscannedObjectsAllPlans" : 116953,
"nscannedAllPlans" : 286513,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 4597,
"indexBounds" : {
},
"nscanned" : 286513,
"matchTested" : NumberLong(169560),
"geoTested" : NumberLong(169560),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
Have you tried using the aggregation framework on your dataset?
The query you want would look something like:
db.randomcoordinates.aggregate(
{ $match: {position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}},
{ $sort: { timestamp: -1 } },
{ $limit: 10 }
);
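Unfortunately, the aggregation framework does not have explain in a production build yet, so all you can check is whether it makes a big difference in run time. If you are comfortable building from source, it looks like explain support may have landed late last month: https://jira.mongodb.org/browse/SERVER-4504. It also looks like it will be in dev build 2.5.3, scheduled for release next Tuesday (October 15, 2013).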
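Since timing is the only signal available without explain, a small helper that runs the pipeline a few times and reports the spread can make the comparison less noisy. This is a minimal sketch: `timeRuns` is a made-up helper, not part of the shell, and the loop at the bottom is only a stand-in workload so the snippet runs outside MongoDB.

```javascript
// Hypothetical helper: runs fn several times and reports wall-clock times in ms.
function timeRuns(fn, runs) {
  var timings = [];
  for (var i = 0; i < runs; i++) {
    var start = Date.now();
    fn();
    timings.push(Date.now() - start);
  }
  timings.sort(function (x, y) { return x - y; });
  return {
    min: timings[0],
    median: timings[Math.floor(timings.length / 2)],
    max: timings[timings.length - 1]
  };
}

// In the mongo shell, fn would wrap the pipeline, for example:
//   timeRuns(function () {
//     db.randomcoordinates.aggregate(/* the $match, $sort, $limit stages */);
//   }, 5);
// On versions where aggregate returns a cursor, add .toArray() so the results
// are actually fetched. Use printjson(stats) instead of console.log there.

// Stand-in workload so the sketch runs outside the shell.
var stats = timeRuns(function () {
  var sum = 0;
  for (var i = 0; i < 100000; i++) sum += i;
  return sum;
}, 5);
console.log(stats);
```

Running the same helper around the hinted find() calls from the question gives numbers that can be compared directly with the aggregate timings.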