MongoDB: search performance of nested values vs. separate collections - database schema design

Dmi*_*kin 5 database-design mongodb database-schema

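Suppose I have a MongoDB database that stores texts and statements separately.

I need to be able to search for texts whose statements contain certain keywords (including the case where the search term occurs in multiple texts).

I also need to be able to find all statements, across all texts added by a specific user, that contain a particular search phrase.

My question: do I need to create a separate collection for statements, or can I simply nest them inside the texts collection?

So, option 1 (separate collections):

texts collection: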


text: {
    name: 'nabokov',
    id: '1'
}

statements collection:

statement: {
    text_id: '1',
    id: '24',
    text: 'He opened the window and saw the sky'
}

Option 2 (nested):


text: {
    name: 'nabokov',
    id: '1',
    statements: [
        {
            id: '24',
            text: 'He opened the window and saw the sky'
        }
    ]
}


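Which MongoDB storage pattern is better if I want to retrieve individual statements by keyword search while keeping their contextual data (e.g. which text they belong to)?

How would this affect write/read speed for larger databases (e.g. > 100 GB)?

My texts are limited to 16 MB in size.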

ray*_*ray 3

For MongoDB document schema design w.r.t. performance, there are several factors that could be helpful to take into consideration:

  1. What are the cardinalities of the relationships between collections?
  2. What is the expected number/size of documents in a collection?
  3. What are the most frequently used queries?
  4. How often are documents updated?

For your scenario, we would need more context and details from you to work out a more sensible "answer". But here are some common scenarios that I have personally come across before, which might be useful to you as a reference.

  1. text is a root document that is not frequently updated; most of the queries run against the statement collection as a child collection.

In this case, it could be a good idea to denormalize the text document and replicate its name field into the corresponding statement documents, e.g.

statement: {
    text_id: '1',
    text_name: 'nabokov',
    id: '24',
    text: 'He opened the window and saw the sky'
}

In this way, you gain a performance boost by avoiding a $lookup to the text collection, while incurring only a small cost for maintaining the new text_name field. The cost is small because the text document is not going to be updated frequently anyway.
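As a rough sketch of that trade-off, the JavaScript below builds the query documents for both layouts. The helper names (normalizedPipeline, denormalizedQuery, renameTextUpdate) are made up for illustration, and it assumes a text index exists on the text field of the statements collection, e.g. db.statements.createIndex({ text: "text" }). Collection and field names follow the examples above.

```javascript
// Normalized layout: a statement only stores text_id, so getting the
// parent's name requires a $lookup into the texts collection.
function normalizedPipeline(phrase) {
  return [
    // $text must be in the first $match stage and needs a text index (assumed).
    { $match: { $text: { $search: phrase } } },
    { $lookup: { from: "texts", localField: "text_id", foreignField: "id", as: "parent" } },
    { $unwind: "$parent" },
    { $project: { _id: 0, id: 1, text: 1, text_id: 1, text_name: "$parent.name" } },
  ];
}

// Denormalized layout: text_name is already stored on each statement,
// so a plain find() returns the same fields without a join.
function denormalizedQuery(phrase) {
  return {
    filter: { $text: { $search: phrase } },
    projection: { _id: 0, id: 1, text: 1, text_id: 1, text_name: 1 },
  };
}

// The maintenance cost: when a text is renamed, every copy of its name
// on the statements has to be updated as well.
function renameTextUpdate(textId, newName) {
  return {
    filter: { text_id: textId },
    update: { $set: { text_name: newName } },
  };
}

// Possible use in mongosh (not executed here):
//   db.statements.aggregate(normalizedPipeline("window"));
//   const q = denormalizedQuery("window");
//   db.statements.find(q.filter, q.projection);
//   const u = renameTextUpdate("1", "nabokov (revised)");
//   db.statements.updateMany(u.filter, u.update);
```

The denormalized query is a single indexed read, while the rename touches as many statement documents as the text has, which is only acceptable when renames are rare.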

  2. A text document will be associated with only a small number of statement objects/documents.

In this case, it could be a good idea to go with your option 2 (i.e. keep the statements in an array inside the text document). The advantage is that you can compose fairly simple queries and avoid the cost of maintaining a separate statement collection.
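For the nested layout, one possible way to get back individual statements together with their parent text is to unwind the array after an initial filter. This is only a sketch: nestedPipeline and escapeRegex are hypothetical helper names, and it matches with a case-insensitive regular expression, which cannot use an ordinary index efficiently, so on a large collection the first stage would probably be replaced by an indexed filter.

```javascript
// Escape regex metacharacters so the user's phrase is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an aggregation pipeline for the texts collection (nested layout).
function nestedPipeline(phrase) {
  const match = { "statements.text": { $regex: escapeRegex(phrase), $options: "i" } };
  return [
    // Keep only texts that have at least one matching statement.
    { $match: match },
    // Emit one document per statement in the array.
    { $unwind: "$statements" },
    // Drop the statements of those texts that do not match themselves.
    { $match: match },
    // Return each statement together with its parent text's id and name.
    {
      $project: {
        _id: 0,
        text_id: "$id",
        text_name: "$name",
        id: "$statements.id",
        text: "$statements.text",
      },
    },
  ];
}

// Possible use in mongosh (not executed here):
//   db.texts.aggregate(nestedPipeline("saw the sky"));
```

Each result carries the parent text's context without a second collection, at the price of unwinding every statement of every text that passes the first stage.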

Here is a very good document to read more about MongoDB schema design.