In Mongo, how do I use map reduce to get a group by, ordered by most recent?

Question

Mon*_*key 4 mapreduce mongodb greatest-n-per-group

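The map reduce examples I have seen use aggregation functions like count, but what is the best way to get, say, the top 3 items in each category using map reduce?

I assume I could also use the group function, but I was curious because the documentation states that sharded environments cannot use group(). That said, I would also be interested in seeing a group() example.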

Answer 1

To keep things simple, I'll assume you have documents of the form:

{category: <int>, score: <int>}

I created 1000 documents covering 100 categories with:

for (var i=0; i<1000; i++) {
  db.foo.save({
    // Math.floor yields an integer in the range 0..99
    category: Math.floor(Math.random() * 100),
    score: Math.floor(Math.random() * 100)
  });
}

Our mapper is simple: it emits the category as the key, and an object containing an array of scores as the value:

mapper = function () {
  emit(this.category, {top:[this.score]});
}
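
To see what the mapper produces, a minimal sketch outside MongoDB can stub `emit` and call the mapper with a sample document bound to `this`. The `emit` stub, the `emitted` array, and the sample documents are assumptions for illustration only; inside MongoDB, `emit` is supplied by the server.

```javascript
// Stand-in for the emit() that MongoDB provides inside map functions.
var emitted = [];
function emit(key, value) {
  emitted.push({key: key, value: value});
}

var mapper = function () {
  emit(this.category, {top: [this.score]});
};

// MongoDB calls the map function with `this` bound to each document.
mapper.call({category: 7, score: 42});
mapper.call({category: 7, score: 13});

console.log(JSON.stringify(emitted));
// [{"key":7,"value":{"top":[42]}},{"key":7,"value":{"top":[13]}}]
```

Each document contributes a one-element array, so every emitted value already has the `{top: [...]}` shape.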

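MongoDB's reducer cannot return an array, and the reducer's output must be of the same type as the values we emit, so we have to wrap the array in an object. We need an array of scores because that is what lets the reducer compute the top 3 scores: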

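reducer = function (key, values) {
  // Collect every score from the incoming {top: [...]} objects.
  var scores = [];
  values.forEach(function (obj) {
    obj.top.forEach(function (score) {
      scores.push(score);
    });
  });
  // Sort numerically, highest first; a bare sort() would compare as strings.
  scores.sort(function (a, b) { return b - a; });
  return {top: scores.slice(0, 3)};
}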
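
MongoDB may call the reducer more than once for the same key, feeding earlier outputs back in as values, so the reducer has to give the same answer whether it sees all values at once or in batches. The sketch below checks that property in plain JavaScript. It defines its own `topThreeReducer` with a numeric comparator (a plain `sort()` compares elements as strings); the function name and the sample values are made up for illustration.

```javascript
function topThreeReducer(key, values) {
  var scores = [];
  values.forEach(function (obj) {
    scores = scores.concat(obj.top);
  });
  // Numeric sort, highest first.
  scores.sort(function (a, b) { return b - a; });
  return {top: scores.slice(0, 3)};
}

var batch1 = [{top: [5]}, {top: [100]}, {top: [42]}];
var batch2 = [{top: [9]}, {top: [77]}];

// Reduce everything in one call...
var allAtOnce = topThreeReducer(0, batch1.concat(batch2));
// ...or reduce each batch first, then reduce the partial results.
var inBatches = topThreeReducer(0, [
  topThreeReducer(0, batch1),
  topThreeReducer(0, batch2)
]);

console.log(JSON.stringify(allAtOnce)); // {"top":[100,77,42]}
console.log(JSON.stringify(inBatches)); // {"top":[100,77,42]}
```

Both paths agree because the output has the same `{top: [...]}` shape as the emitted values, so it can safely be fed back into the reducer.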

Finally, invoke the map-reduce:

db.foo.mapReduce(mapper, reducer, "top_foos");

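We now have a collection, top_foos, containing one document per category, each holding the top 3 scores across all documents in foo for that category: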

{ "_id" : 0, "value" : { "top" : [ 93, 89, 86 ] } }
{ "_id" : 1, "value" : { "top" : [ 82, 65, 6 ] } }

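(Your exact values will vary if you used the same Math.random() data generator as above.)

You can now use this to query foo for the actual documents that have these top scores: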

function find_top_scores(categories) {
  var query = [];
  db.top_foos.find({_id:{$in:categories}}).forEach(
    function (topscores) {
      query[query.length] = {
        category:topscores._id,
        score:{$in:topscores.value.top}
      };
  });
  return db.foo.find({$or:query});
}
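
To make the shape of the generated query concrete, here is a sketch of the same query-building step as a pure function over documents in the `top_foos` format, with no database involved. `buildTopScoresQuery` and the sample input are hypothetical, used only for illustration.

```javascript
// Builds the {$or: [...]} filter from documents shaped like those in top_foos.
function buildTopScoresQuery(topDocs) {
  var clauses = topDocs.map(function (topscores) {
    return {
      category: topscores._id,
      score: {$in: topscores.value.top}
    };
  });
  return {$or: clauses};
}

var sample = [
  {_id: 0, value: {top: [93, 89, 86]}},
  {_id: 1, value: {top: [82, 65, 6]}}
];
var topScoresQuery = buildTopScoresQuery(sample);
console.log(JSON.stringify(topScoresQuery));
// {"$or":[{"category":0,"score":{"$in":[93,89,86]}},
//         {"category":1,"score":{"$in":[82,65,6]}}]}   (printed on one line)
```

As far as I know, MongoDB rejects an empty `$or` array, so a caller would probably want to handle the case where none of the requested categories exist before running the final find.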

This code does not handle ties; more precisely, if there are ties, the final cursor produced by find_top_scores may return more than 3 documents per category.
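
If exactly three documents per category are needed, one option is to trim the results on the client after fetching them. The sketch below does this for an in-memory array of documents; `trimToTopN` is a hypothetical helper, and which of several tied documents survives the cut is arbitrary here.

```javascript
// Keeps at most n documents per category, highest scores first.
function trimToTopN(docs, n) {
  var byCategory = {};
  docs.forEach(function (doc) {
    (byCategory[doc.category] = byCategory[doc.category] || []).push(doc);
  });
  var result = [];
  Object.keys(byCategory).forEach(function (category) {
    var group = byCategory[category];
    group.sort(function (a, b) { return b.score - a.score; });
    result = result.concat(group.slice(0, n));
  });
  return result;
}

// Category 1 has a three-way tie at 80, so four documents match its top scores.
var docs = [
  {category: 1, score: 90}, {category: 1, score: 80},
  {category: 1, score: 80}, {category: 1, score: 80},
  {category: 2, score: 50}
];
var trimmed = trimToTopN(docs, 3);
console.log(trimmed.length); // 4 (three from category 1, one from category 2)
```

In the shell this could be applied to the cursor from the function above, for example `trimToTopN(find_top_scores([1, 2]).toArray(), 3)`.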

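A solution using group would be somewhat similar, although the reducer only has to consider two documents at a time, rather than an array of scores for the key.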
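
Since the question asks for a group() example, here is a rough sketch of what it might look like. The reduce function receives the current document and the running result for that document's key, and updates the result in place. The `db.foo.group(...)` call in the comment is untested and applies only to older servers (group() was deprecated in MongoDB 3.4 and removed in 4.2); `topThreeGroupReduce` and `simulateGroup` are hypothetical names, and `simulateGroup` is only a plain-JavaScript stand-in for exercising the reduce function.

```javascript
// group()-style reduce: fold one document into the running result for its key.
function topThreeGroupReduce(doc, result) {
  result.top.push(doc.score);
  result.top.sort(function (a, b) { return b - a; });
  result.top = result.top.slice(0, 3);
}

// In the legacy mongo shell the call would look roughly like this:
//   db.foo.group({
//     key: {category: 1},
//     initial: {top: []},
//     reduce: topThreeGroupReduce
//   });

// Stand-in for group(): one result per category, each starting from {top: []}.
function simulateGroup(docs, reduce) {
  var results = {};
  docs.forEach(function (doc) {
    if (!results[doc.category]) {
      results[doc.category] = {category: doc.category, top: []};
    }
    reduce(doc, results[doc.category]);
  });
  return Object.keys(results).map(function (k) { return results[k]; });
}

var grouped = simulateGroup([
  {category: 1, score: 10}, {category: 1, score: 70},
  {category: 1, score: 30}, {category: 1, score: 50},
  {category: 2, score: 5}
], topThreeGroupReduce);
console.log(JSON.stringify(grouped));
// [{"category":1,"top":[70,50,30]},{"category":2,"top":[5]}]
```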

Archived:	14 years, 1 month ago
Views:	1573
Last activity:	14 years, 1 month ago