MongoDB array query performance

Question

Ski*_*010 2 arrays performance mongodb mongodb-query

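I'm trying to work out the best schema for a dating-site-like app. Users have a listing (possibly many), and they can view other users' listings to "like" or "dislike" them.

Currently I just store the other person's listing id in a likedBy and a dislikedBy array. When a user "likes" a listing, their listing id is pushed into the liked listing's array. However, I would now like to track the timestamp at which a user liked a listing. This would be used for a user's "history list" or for data analysis.

I would need to run two separate queries: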

find all active listings that this user has not liked or disliked before

and also a history of a user's "like"/"dislike" choices:

find all the listings user X has liked in chronological order

My current schema is:

listings
  _id: 'sdf3f'
  likedBy: ['12ac', 'as3vd', 'sadf3']
  dislikedBy: ['asdf', 'sdsdf', 'asdfas']
  active: bool

Could I do something like this?

listings
  _id: 'sdf3f'
  likedBy: [{'12ac', date: Date}, {'ds3d', date: Date}]
  dislikedBy: [{'s12ac', date: Date}, {'6fs3d', date: Date}]
  active: bool

I was also thinking of creating a new collection, choices:

choices
  Id
  userId          // id of current user making the choice
  userlistId      // listing of the user making the choice
  listingChoseId  // the listing they chose yes/no
  type
  date

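If I do this, I'm not sure about the performance implications of keeping these choices in another collection when running "find all active listings that this user has not liked or disliked before".

Any insight would be greatly appreciated!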

Answer 1

Nei*_*unn 17

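Well, you obviously thought it was a good idea to embed these in the "listings" documents so that your additional usage patterns work with the cases presented here. With that in mind, there is no reason to throw that away.

To clarify, though, the structure you seem to want is something like this: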

{
    "_id": "sdf3f",
    "likedBy": [
         { "userId": "12ac",  "date": ISODate("2014-04-09T07:30:47.091Z") },
         { "userId": "as3vd", "date": ISODate("2014-04-09T07:30:47.091Z") },
         { "userId": "sadf3", "date": ISODate("2014-04-09T07:30:47.091Z") }
    ],
    "dislikedBy": [
        { "userId": "asdf",   "date": ISODate("2014-04-09T07:30:47.091Z") },
        { "userId": "sdsdf",  "date": ISODate("2014-04-09T07:30:47.091Z") },
        { "userId": "asdfas", "date": ISODate("2014-04-09T07:30:47.091Z") }
    ],
    "active": true
}

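That is all well and fine, except there is one catch. Because this content lives in two array fields, you cannot create a single index over both of them. This is a restriction of compound indexes: for any one document, at most one of the indexed fields may hold an array (a multikey field).

So to solve the obvious problem of your first query not being able to use an index, you would structure it like this instead: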

{
    "_id": "sdf3f",
    "votes": [
        { 
            "userId": "12ac",
            "type": "like", 
            "date": ISODate("2014-04-09T07:30:47.091Z")
        },
        {
            "userId": "as3vd",
            "type": "like",
            "date": ISODate("2014-04-09T07:30:47.091Z")
        },
        { 
            "userId": "sadf3", 
            "type": "like", 
            "date": ISODate("2014-04-09T07:30:47.091Z")
        },
        { 
            "userId": "asdf", 
            "type": "dislike",
            "date": ISODate("2014-04-09T07:30:47.091Z")
        },
        {
            "userId": "sdsdf",
            "type": "dislike", 
            "date": ISODate("2014-04-09T07:30:47.091Z")
        },
        { 
            "userId": "asdfas", 
            "type": "dislike",
            "date": ISODate("2014-04-09T07:30:47.091Z")
        }
    ],
    "active": true
}
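
If existing documents already use the two-array shape, they have to be reshaped into the single "votes" array. The following is a minimal sketch in plain JavaScript of that transformation. The helper name toVotes is hypothetical, and it assumes every entry in likedBy/dislikedBy already has the { userId, date } form shown earlier.

```javascript
// Hypothetical helper (not part of MongoDB): merge likedBy/dislikedBy,
// each assumed to be [{ userId, date }], into a single tagged votes array.
function toVotes(listing) {
  const tag = (entries, type) =>
    (entries || []).map(({ userId, date }) => ({ userId, type, date }));
  return {
    _id: listing._id,
    votes: [
      ...tag(listing.likedBy, "like"),
      ...tag(listing.dislikedBy, "dislike"),
    ],
    active: listing.active,
  };
}

const oldListing = {
  _id: "sdf3f",
  likedBy: [{ userId: "12ac", date: new Date("2014-04-09T07:30:47.091Z") }],
  dislikedBy: [{ userId: "asdf", date: new Date("2014-04-09T07:30:47.091Z") }],
  active: true,
};

const migrated = toVotes(oldListing);
console.log(JSON.stringify(migrated, null, 2));
```

The result has one array field, so a compound index can cover "votes.userId", "votes.type" and "votes.date" together.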

That allows an index of this form:

// createIndex is the current name; ensureIndex is deprecated and absent from newer shells
db.post.createIndex({
    "active": 1,
    "votes.userId": 1, 
    "votes.date": 1, 
    "votes.type": 1 
})

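In practice you will probably want a few indexes to suit your usage patterns, but the point is that you now have indexes that can actually be used.

Covering the first case, you have this form of query: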

db.post.find({ "active": true, "votes.userId": { "$ne": "12ac" } })

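That makes sense, considering that you are clearly not going to have both a like and a dislike entry for the same user. Given the order of that index, at least "active" can be used to filter, because the negated condition has to scan everything else. No structure gets around that.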
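
To make the selection rule concrete: $ne on an array field keeps a document only when no element of the array has that value. Here is an in-memory sketch of the same rule in plain JavaScript. It does not talk to MongoDB, and unseenActiveListings and the sample data are made up for illustration.

```javascript
// In-memory stand-in for:
//   { "active": true, "votes.userId": { "$ne": userId } }
// A listing is kept only if it is active and NO vote belongs to userId.
function unseenActiveListings(listings, userId) {
  return listings.filter(
    (l) => l.active && !(l.votes || []).some((v) => v.userId === userId)
  );
}

// Made-up sample data in the single-array "votes" shape.
const sampleListings = [
  { _id: "a1", active: true, votes: [{ userId: "12ac", type: "like" }] },
  { _id: "b2", active: true, votes: [{ userId: "asdf", type: "dislike" }] },
  { _id: "c3", active: false, votes: [] },
  { _id: "d4", active: true, votes: [] },
];

const unseen = unseenActiveListings(sampleListings, "12ac").map((l) => l._id);
console.log(unseen); // ids b2 and d4: active and not yet voted on by "12ac"
```

Every active listing has to be inspected to prove the user is absent from it, which is why only the "active" prefix of the index narrows the scan.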

For the other case, you probably want an index where userId comes before the date and is the first element. Then your query is quite simple:

db.post.find({ "votes.userId": "12ac" })
    .sort({ "votes.userId": 1, "votes.date": 1 })
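
One caveat, as far as I understand sort() on array fields: find() returns whole listing documents and orders them by a single value taken from the array (the lowest one for an ascending sort), which is not necessarily the matched user's own vote. The sketch below shows the intended result in memory, plus an aggregation pipeline that should produce it. The names likeHistory and historyPipeline are hypothetical, and the pipeline is only built here, not run against a server.

```javascript
// In-memory sketch: one user's "like" votes across all listings, oldest first.
function likeHistory(listings, userId) {
  return listings
    .flatMap((l) =>
      (l.votes || [])
        .filter((v) => v.userId === userId && v.type === "like")
        .map((v) => ({ listingId: l._id, date: v.date }))
    )
    .sort((a, b) => a.date - b.date);
}

// Assumed aggregation equivalent (not executed here), intended for
// something like db.post.aggregate(historyPipeline("12ac")).
function historyPipeline(userId) {
  return [
    { $match: { "votes.userId": userId } }, // narrow to listings the user voted on
    { $unwind: "$votes" },                  // one document per vote
    { $match: { "votes.userId": userId, "votes.type": "like" } }, // this user's likes only
    { $sort: { "votes.date": 1 } },         // chronological order
    { $project: { _id: 1, date: "$votes.date" } },
  ];
}

// Made-up sample data.
const history = likeHistory(
  [
    { _id: "x1", votes: [{ userId: "12ac", type: "like", date: new Date("2014-04-10T00:00:00Z") }] },
    { _id: "y2", votes: [
      { userId: "zz9", type: "like", date: new Date("2014-04-01T00:00:00Z") },
      { userId: "12ac", type: "like", date: new Date("2014-04-09T00:00:00Z") },
    ] },
    { _id: "z3", votes: [{ userId: "12ac", type: "dislike", date: new Date("2014-04-08T00:00:00Z") }] },
  ],
  "12ac"
);
console.log(history.map((h) => h.listingId)); // y2 first, then x1
```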

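But you may be thinking that you have suddenly lost something: getting the count of "likes" and "dislikes" used to be as easy as testing the size of an array, and now it is a little different. That is not a problem that aggregate cannot solve: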

db.post.aggregate([
    // One output document per vote
    { "$unwind": "$votes" },
    // Regroup per listing, counting each vote type
    { "$group": {
        "_id": {
            "_id": "$_id",
            "active": "$active"
        },
        "likes": { "$sum": { "$cond": [
            { "$eq": [ "$votes.type", "like" ] },
            1,
            0
        ]}},
        "dislikes": { "$sum": { "$cond": [
            { "$eq": [ "$votes.type", "dislike" ] },
            1,
            0
        ]}}
    }}
])

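So whatever your actual usage looks like, you can keep any important parts of the document in the grouping _id and then evaluate the counts of "likes" and "dislikes" in a simple manner.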
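
For intuition about what that grouping computes, here is a plain JavaScript equivalent for a single listing. The helper name countVotes is hypothetical, and its output is flattened rather than nested under _id as the pipeline would return it.

```javascript
// In-memory equivalent of $unwind + $group with a conditional $sum:
// walk the votes once and tally each type.
function countVotes(listing) {
  const counts = {
    _id: listing._id,
    active: listing.active,
    likes: 0,
    dislikes: 0,
  };
  for (const vote of listing.votes || []) {
    if (vote.type === "like") counts.likes += 1;
    else if (vote.type === "dislike") counts.dislikes += 1;
  }
  return counts;
}

// Made-up sample listing.
const tally = countVotes({
  _id: "sdf3f",
  active: true,
  votes: [
    { userId: "12ac", type: "like" },
    { userId: "as3vd", type: "like" },
    { userId: "asdf", type: "dislike" },
  ],
});
console.log(tally);
```

One difference worth knowing: by default $unwind emits nothing for a listing whose votes array is empty or missing, so the pipeline skips such listings, whereas this sketch reports zero counts for them.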

You may also note that changing an entry from like to dislike can be done in a single atomic update.
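
A sketch of what such an update could look like, using the positional $ operator to address the matched array element. The helper buildVoteChange is hypothetical. It only builds the filter and update documents, and refreshing the date along with the type is an assumption on my part.

```javascript
// Hypothetical helper: build the filter and update documents that flip
// one user's vote on one listing in place.
function buildVoteChange(listingId, userId, newType, when = new Date()) {
  return {
    // Matching on votes.userId lets the positional "$" in the update
    // refer to that user's entry in the array.
    filter: { _id: listingId, "votes.userId": userId },
    // Assumption: the vote's date is refreshed together with its type.
    update: { $set: { "votes.$.type": newType, "votes.$.date": when } },
  };
}

const change = buildVoteChange(
  "sdf3f",
  "12ac",
  "dislike",
  new Date("2014-04-10T08:00:00Z")
);
console.log(JSON.stringify(change));
// In the shell this would be passed along the lines of:
//   db.post.updateOne(change.filter, change.update)
```

Because the change touches a single document, it is applied atomically. With the two-array layout, the same change needs a $pull from one array and a $push to the other.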

There is much more you can do, but I would prefer this structure for the reasons given.

Archived:	11 years, 6 months ago
Views:	2045
Last activity:	8 years, 4 months ago