Multiple join conditions using the $lookup operator

Question

use*_*078 19 mongodb mongodb-query aggregation-framework

Here are my collections:

collection1:

{
    user1: 1,
    user2: 2,
    percent: 0.56
}

collection2:

{
    user1: 1,
    user2: 2,
    percent: 0.3
}

I want to join these two collections on 'user1' and 'user2'.

The result should look like this:

{
    user1: 1,
    user2: 2,
    percent1: 0.56,
    percent2: 0.3
}

How should I write the pipeline?

Answer 1

sty*_*ane 49

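We can perform a join on multiple conditions with the $lookup aggregation pipeline operator in version 3.6 and newer.

We need to assign the fields' values to variables using the optional let field; you can then access those variables in the stages of the pipeline field, where you specify the pipeline to run on the joined collection.

Note that in the $match stage, we use the $expr evaluation query operator to compare the fields' values.

The last stage in the pipeline is the $replaceRoot aggregation pipeline stage, where we simply merge the $lookup result with part of the $$ROOT document using the $mergeObjects operator.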

db.collection2.aggregate([
       {
          $lookup: {
             from: "collection1",
             let: {
                firstUser: "$user1",
                secondUser: "$user2"
             },
             pipeline: [
                {
                   $match: {
                      $expr: {
                         $and: [
                            {
                               $eq: [
                                  "$user1",
                                  "$$firstUser"
                               ]
                            },
                            {
                               $eq: [
                                  "$user2",
                                  "$$secondUser"
                               ]
                            }
                         ]
                      }
                   }
                }
             ],
             as: "result"
          }
       },
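       {
          // Keep the collection2 fields from $$ROOT and add the percent of
          // the first matched collection1 document as percent1.
          $replaceRoot: {
             newRoot: {
                $mergeObjects: [
                   {
                      _id: "$_id",
                      user1: "$user1",
                      user2: "$user2",
                      percent: "$percent"
                   },
                   {
                      percent1: {
                         $arrayElemAt: [
                            "$result.percent",
                            0
                         ]
                      }
                   }
                ]
             }
          }
       }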
    ]
)

This pipeline yields something that looks like this:

{
    "_id" : ObjectId("59e1ad7d36f42d8960c06022"),
    "user1" : 1,
    "user2" : 2,
    "percent" : 0.3,
    "percent1" : 0.56
}
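
To make the two-condition match concrete, here is a minimal plain-JavaScript sketch (no MongoDB involved) of what the $lookup stage above computes: for every document of the input collection, it collects the documents of the other collection whose user1 and user2 both match. The helper name lookupOnBothUsers and the in-memory arrays are assumptions for illustration only.

```javascript
// Plain-JavaScript sketch of the two-condition join; no MongoDB required.
// These arrays stand in for the collections from the question.
const collection1 = [{ user1: 1, user2: 2, percent: 0.56 }];
const collection2 = [{ user1: 1, user2: 2, percent: 0.3 }];

// Hypothetical helper: for each "local" document, keep the "foreign"
// documents whose user1 AND user2 are both equal, which is the role of
// $match + $expr + $and inside the $lookup pipeline.
function lookupOnBothUsers(local, foreign) {
  return local.map((doc) => ({
    ...doc,
    result: foreign.filter(
      (other) => other.user1 === doc.user1 && other.user2 === doc.user2
    ),
  }));
}

const joined = lookupOnBothUsers(collection2, collection1);
console.log(JSON.stringify(joined, null, 2));
```

Each element of joined carries a result array, just like the "as" field of $lookup; an input document without a partner simply gets an empty array.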

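If you are not on version 3.6+, you can first join on one of your fields, say "user1", and then unwind the array of matching documents using the $unwind aggregation pipeline operator. The next stage in the pipeline is the $redact stage, where you use the $$KEEP and $$PRUNE system variables to filter out the documents in which the "user2" value from the "joined" collection and that of the input document are not equal. You can then reshape your documents in the $project stage.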

db.collection1.aggregate([
    { "$lookup": { 
        "from": "collection2", 
        "localField": "user1", 
        "foreignField": "user1", 
        "as": "collection2_doc"
    }}, 
    { "$unwind": "$collection2_doc" },
    { "$redact": { 
        "$cond": [
            { "$eq": [ "$user2", "$collection2_doc.user2" ] }, 
            "$$KEEP", 
            "$$PRUNE"
        ]
    }}, 
    { "$project": { 
        "user1": 1, 
        "user2": 1, 
        "percent1": "$percent", 
        "percent2": "$collection2_doc.percent"
    }}
])

which produces:

{
    "_id" : ObjectId("572daa87cc52a841bb292beb"),
    "user1" : 1,
    "user2" : 2,
    "percent1" : 0.56,
    "percent2" : 0.3
}

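If the documents in your collections have the same structure and you find yourself performing this operation often, then you should consider merging the two collections into one, or inserting the documents from those collections into a new collection.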

db.collection3.insertMany(
    db.collection1.find({}, {"_id": 0})
    .toArray()
    .concat(db.collection2.find({}, {"_id": 0}).toArray())
)

Then $group your documents by "user1" and "user2":

db.collection3.aggregate([
    { "$group": {
        "_id": { "user1": "$user1", "user2": "$user2" }, 
        "percent": { "$push": "$percent" }
    }}
])

which yields:

{ "_id" : { "user1" : 1, "user2" : 2 }, "percent" : [ 0.56, 0.3 ] }

Looking at this makes me appreciate the SQL way over Mongo. (9 upvotes)

Answer 2

And*_*sin 5

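If you are trying to model your data, and came here to check whether MongoDB can perform joins on multiple fields before deciding to do so, please read on.

While MongoDB can perform joins, you also have the freedom to model your data according to your application's access patterns. If the data is as simple as presented in the question, we can simply maintain a single collection that looks like this: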

{
    user1: 1,
    user2: 2,
    percent1: 0.56,
    percent2: 0.3
}

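Now, every operation you would have performed through a join can be performed on this one collection. Why are we trying to avoid joins? Because $lookup does not support a sharded "from" collection (docs; this restriction applies to versions before MongoDB 5.1), which can stop you from scaling out when needed. Normalizing data (having separate tables/collections) works very well in SQL, but when it comes to Mongo, avoiding joins can give you advantages without consequences in most cases. Use normalization in MongoDB only when you have no other choice. From the docs:

In general, use normalized data models:

- when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
- to represent more complex many-to-many relationships.
- to model large hierarchical data sets.

See here to read more about embedding and why you would choose it over normalization.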

Answer 3

Xav*_*hot 5

Starting in Mongo 4.4, we can achieve this type of "join" with the new $unionWith aggregation stage coupled with a classic $group stage:

// > db.collection1.find()
//   { "user1" : 1, "user2" : 2, "percent" : 0.56 }
//   { "user1" : 4, "user2" : 3, "percent" : 0.14 }
// > db.collection2.find()
//   { "user1" : 1, "user2" : 2, "percent" : 0.3  }
//   { "user1" : 2, "user2" : 3, "percent" : 0.25 }
db.collection1.aggregate([
  { $set: { percent1: "$percent" } },
  { $unionWith: {
    coll: "collection2",
    pipeline: [{ $set: { percent2: "$percent" } }]
  }},
  { $group: {
    _id: { user1: "$user1", user2: "$user2" },
    percents: { $mergeObjects: { percent1: "$percent1", percent2: "$percent2" } }
  }}
])
// { _id: { user1: 1, user2: 2 }, percents: { percent1: 0.56, percent2: 0.3 } }
// { _id: { user1: 2, user2: 3 }, percents: { percent2: 0.25 } }
// { _id: { user1: 4, user2: 3 }, percents: { percent1: 0.14 } }

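This:

Starts with a union of both collections into the pipeline via the new $unionWith stage:
- We first copy percent from collection1 into percent1 (using a $set stage).
- Within the $unionWith stage, we specify a pipeline on collection2 in order to also copy percent, this time into percent2.
- This way, we can tell which collection each percentage came from.
Continues with a $group stage that:
- Groups records based on user1 and user2.
- Accumulates the percentages via a $mergeObjects operation. Using $first: "$percent1" and $first: "$percent2" wouldn't work, since this could potentially pick up null first (for elements coming from the other collection), whereas $mergeObjects discards the null (missing) values.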
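
The union-then-group idea can also be sketched in plain JavaScript, without MongoDB, to show why merging works where "take the first value" would not. This is only an illustration under assumptions: the helper name unionAndGroup and the in-memory arrays are hypothetical, and the sample data is copied from the comments in the pipeline above.

```javascript
// Plain-JavaScript sketch of $unionWith followed by $group + $mergeObjects.
const collection1 = [
  { user1: 1, user2: 2, percent: 0.56 },
  { user1: 4, user2: 3, percent: 0.14 },
];
const collection2 = [
  { user1: 1, user2: 2, percent: 0.3 },
  { user1: 2, user2: 3, percent: 0.25 },
];

// Hypothetical helper mirroring the pipeline: tag each side with its own
// field name, concatenate both sides, then group on (user1, user2).
function unionAndGroup(first, second) {
  const tagged = [
    ...first.map((d) => ({ user1: d.user1, user2: d.user2, percent1: d.percent })),
    ...second.map((d) => ({ user1: d.user1, user2: d.user2, percent2: d.percent })),
  ];
  const groups = new Map();
  for (const { user1, user2, ...percents } of tagged) {
    const key = `${user1}|${user2}`;
    const group = groups.get(key) || { _id: { user1, user2 }, percents: {} };
    // Merge only the fields that are present, so a document from one side
    // never overwrites the other side's value with an empty one.
    Object.assign(group.percents, percents);
    groups.set(key, group);
  }
  return [...groups.values()];
}

const grouped = unionAndGroup(collection1, collection2);
console.log(JSON.stringify(grouped, null, 2));
```

Pairs that exist in only one collection end up with a single percentage field, which matches the partial documents in the pipeline's sample output.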

If you need a different output format, you can add a downstream $project stage.
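
As a sketch of such a downstream stage, assuming the $group output shape shown above (an _id holding user1 and user2, plus a percents sub-document), a $project like the following should flatten each result into the layout asked for in the question. The variable name flattenStage is an assumption for illustration; the stage would be appended as the last element of the aggregate() array.

```javascript
// Hypothetical extra stage, to be appended after the $group stage:
//   db.collection1.aggregate([ /* $set, $unionWith, $group */, flattenStage ])
const flattenStage = {
  $project: {
    _id: 0,                          // drop the compound grouping key
    user1: "$_id.user1",             // lift the key parts to the top level
    user2: "$_id.user2",
    percent1: "$percents.percent1",  // lift the merged percentages
    percent2: "$percents.percent2",
  },
};

console.log(JSON.stringify(flattenStage, null, 2));
```

For a pair present in both collections, this should give a document with the four fields user1, user2, percent1 and percent2.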

Asked: 10 years ago
Viewed: 44842 times
Last active: 8 years, 1 month ago