How do I avoid both many database round trips and a lot of redundant data?

Question

dpp*_*dpp 7 sql language-agnostic optimization data-retrieval

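I have worked on a variety of applications and have run into this situation many times. So far I have not figured out what the best approach is.

Here is the scenario:

I have a desktop or web application.
I need to retrieve simple documents from the database. Each document has general details and item details, so the database looks like this:

GeneralDetails table: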

| DocumentID | DateCreated | Owner     |
| 1          | 07/07/07    | Naruto    |
| 2          | 08/08/08    | Goku      |
| 3          | 09/09/09    | Taguro    |

ItemDetails table:

| DocumentID | Item        | Quantity  |
| 1          | Marbles     | 20        |
| 1          | Cards       | 56        |
| 2          | Yo-yo       | 1         |
| 2          | Chess board | 3         |
| 2          | GI Joe      | 12        |
| 3          | Rubber Duck | 1         |

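As you can see, the tables have a one-to-many relationship. To retrieve all documents and their respective items, I always do one of two things:

Method 1 - Many round trips (pseudocode):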

Documents = GetFromDB("select DocumentID, Owner " +
                      "from GeneralDetails")
For Each Document in Documents
{
    // The first query selects Owner, so that is the column to display.
    Display(Document["Owner"])
    // One additional query (one more round trip) per document.
    DocumentItems = GetFromDB("select Item, Quantity " +
                              "from ItemDetails " +
                              "where DocumentID = " + Document["DocumentID"])
    For Each DocumentItem in DocumentItems
    {
        Display(DocumentItem["Item"] + " " + DocumentItem["Quantity"])
    }
}

Method 2 - A lot of redundant data (pseudocode):

DocumentsAndItems = GetFromDB("select g.DocumentID, g.Owner, i.Item, i.Quantity " + 
                              "from GeneralDetails as g " +
                              "inner join ItemDetails as i " +
                              "on g.DocumentID = i.DocumentID")
//Display...

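When I used the first method for desktop applications in college, performance was not bad, so I assumed it was fine.

Then one day I read an article called "Make the Web Faster", which said that making many round trips to the database is bad, so I have used the second method ever since.

In the second method, I avoid the round trips by using an inner join to retrieve both tables at once, but that produces unnecessary or redundant data. Look at the result set: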

| DocumentID | Owner     | Item        | Quantity  |
| 1          | Naruto    | Marbles     | 20        |
| 1          | Naruto    | Cards       | 56        |
| 2          | Goku      | Yo-yo       | 1         |
| 2          | Goku      | Chess board | 3         |
| 2          | Goku      | GI Joe      | 12        |
| 3          | Taguro    | Rubber Duck | 1         |

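The result set repeats DocumentID and Owner for every item. It looks like a denormalized database.

Now the question is: how do I avoid the round trips and avoid the redundant data at the same time?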

Answer 1

Ste*_*Mai 5

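The approach used by ActiveRecord and other ORMs is to select from the first table, batch the IDs together, and then use those IDs in an IN clause for a second select.

SELECT * FROM ItemDetails WHERE DocumentID IN ( [comma-separated list of IDs here] )

Pros:

No redundant data

Cons:

Two queries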
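To make the two-query idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module and the sample tables from the question. The function names (make_sample_db, load_documents_with_items) are made up for illustration, and this is not how any particular ORM implements it. The IDs are passed as bound parameters rather than concatenated into the SQL string.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory copy of the question's two tables (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE GeneralDetails (DocumentID INTEGER PRIMARY KEY, DateCreated TEXT, Owner TEXT);
        CREATE TABLE ItemDetails (DocumentID INTEGER, Item TEXT, Quantity INTEGER);
        INSERT INTO GeneralDetails VALUES
            (1, '07/07/07', 'Naruto'), (2, '08/08/08', 'Goku'), (3, '09/09/09', 'Taguro');
        INSERT INTO ItemDetails VALUES
            (1, 'Marbles', 20), (1, 'Cards', 56), (2, 'Yo-yo', 1),
            (2, 'Chess board', 3), (2, 'GI Joe', 12), (3, 'Rubber Duck', 1);
    """)
    return conn


def load_documents_with_items(conn):
    """Return [(DocumentID, Owner, [(Item, Quantity), ...]), ...] using two queries."""
    # Query 1: the parent rows.
    docs = conn.execute(
        "SELECT DocumentID, Owner FROM GeneralDetails ORDER BY DocumentID"
    ).fetchall()
    if not docs:
        return []

    # Query 2: all child rows for those parents, fetched in one round trip.
    ids = [doc_id for doc_id, _owner in docs]
    placeholders = ",".join("?" for _ in ids)  # one "?" per ID
    items = conn.execute(
        "SELECT DocumentID, Item, Quantity FROM ItemDetails "
        f"WHERE DocumentID IN ({placeholders})",
        ids,
    ).fetchall()

    # Stitch the children onto their parents in application code.
    by_doc = {doc_id: [] for doc_id in ids}
    for doc_id, item, quantity in items:
        by_doc[doc_id].append((item, quantity))
    return [(doc_id, owner, by_doc[doc_id]) for doc_id, owner in docs]


if __name__ == "__main__":
    for doc_id, owner, doc_items in load_documents_with_items(make_sample_db()):
        print(doc_id, owner, doc_items)
```

Databases limit how many bound parameters one statement can take, so a very long ID list would probably need to be split into chunks, which adds a few more queries.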

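Generally, the first method is known as the "N+1 query problem", and the solution is known as "eager loading". I tend to think your "Method 2" is preferable, because database latency usually outweighs the cost of transferring the redundant data, but YMMV. As with almost everything in software, it is a trade-off.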
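If you go with the single join from Method 2, the repeated DocumentID and Owner values only need to exist on the wire. Here is a rough sketch, again in Python with sqlite3 and with hypothetical helper names, that folds the joined rows back into one record per document. It assumes the rows arrive ordered by DocumentID, which is why the query has an ORDER BY.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def make_sample_db():
    """Build an in-memory copy of the question's two tables (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE GeneralDetails (DocumentID INTEGER PRIMARY KEY, DateCreated TEXT, Owner TEXT);
        CREATE TABLE ItemDetails (DocumentID INTEGER, Item TEXT, Quantity INTEGER);
        INSERT INTO GeneralDetails VALUES
            (1, '07/07/07', 'Naruto'), (2, '08/08/08', 'Goku'), (3, '09/09/09', 'Taguro');
        INSERT INTO ItemDetails VALUES
            (1, 'Marbles', 20), (1, 'Cards', 56), (2, 'Yo-yo', 1),
            (2, 'Chess board', 3), (2, 'GI Joe', 12), (3, 'Rubber Duck', 1);
    """)
    return conn


def documents_from_join(conn):
    """Run the single join and collapse the repeated parent columns client-side."""
    rows = conn.execute(
        "SELECT g.DocumentID, g.Owner, i.Item, i.Quantity "
        "FROM GeneralDetails AS g "
        "INNER JOIN ItemDetails AS i ON g.DocumentID = i.DocumentID "
        "ORDER BY g.DocumentID"  # groupby only merges adjacent rows
    )
    documents = []
    # Rows sharing (DocumentID, Owner) are adjacent, so each group is one document.
    for (doc_id, owner), group in groupby(rows, key=itemgetter(0, 1)):
        documents.append({
            "id": doc_id,
            "owner": owner,
            "items": [(item, quantity) for _id, _owner, item, quantity in group],
        })
    return documents


if __name__ == "__main__":
    for doc in documents_from_join(make_sample_db()):
        print(doc["id"], doc["owner"], doc["items"])
```

Note that an inner join drops documents that have no items. A LEFT JOIN would keep them, at the cost of handling a NULL item row in the grouping step.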
