Can I get SQLAlchemy to do subquery eager loading without repeating the full original query?

Mih*_*hin 9 python orm sqlalchemy eager-loading

Suppose we have this original generated query:

SELECT company.x AS company_x, ...
FROM company
LEFT OUTER JOIN acc ON acc.id = company.acc
LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
WHERE usergro_links.eid = %s OR usercomp_links.eid = %s

If we add .options(subqueryload(Company.childs)) to it, we get:

SELECT company.x AS company_x, ..., anon_1.company_id AS anon_1_company_id
FROM (
    SELECT company.id AS company_id
    FROM company
    LEFT OUTER JOIN acc ON acc.id = company.acc
    LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
    LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
    WHERE usergro_links.eid = %s OR usercomp_links.eid = %s) AS anon_1
INNER JOIN acel_links AS acel_links_1 ON anon_1.company_id = acel_links_1.eid
INNER JOIN company ON company.id = acel_links_1.pid ORDER BY anon_1.company_id

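This is slow. If I take the company ids from the first query and load all the child companies by hand, it is blazingly fast compared to what we get in this case.

I have read the docs and looked at the code, but I can't see whether I can tell SQLAlchemy to just take the ids from the results of the first query and load the children in a separate, comparatively simple query. I am not relying on this sample alone: I have had harder cases where SQLAlchemy could not load the constructed query at all. Why redo all the work of the first query a second time?

So does anyone know how to eager load without the automatically constructed "join from join in join" style?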

Bor*_*rov 6

Update: the "select in" strategy is now implemented in SQLAlchemy (since v1.2): see "Select IN loading" in the documentation.

TLDR:

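I think the joinedload strategy should be used where possible, as it is more efficient than the other strategies, including the strategy suggested in the question of loading related data with an "IN" statement.

The "IN" strategy can be implemented fairly easily "outside" of SQLAlchemy (see the code below), and implementing it as a new loading strategy would probably not be complex (logically it is similar to the existing subqueryload strategy).

Full version:

I started with a simple experiment to see the queries produced by the different strategies.

The full source code of the experiment is on GitHub.

My models look like this: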

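from sqlalchemy import Column, ForeignKey, Integer, String
# SQLAlchemy 1.4+; older versions import declarative_base from sqlalchemy.ext.declarative
from sqlalchemy.orm import backref, declarative_base, relationship

# Assumption: the answer does not show how ModelBase is defined;
# a plain declarative base is used here.
ModelBase = declarative_base()


class Author(ModelBase):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(255))


class Book(ModelBase):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey('authors.id'))
    # Many-to-one to Author; the backref adds Author.books
    author = relationship(
        'Author', backref=backref('books'))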

Now the tests. First, lazy loading:

books = session.query(Book).all()
print(books[0].author.name)
session.commit()

Output (cleaned up):

-------------Lazy--------------
sqlalchemy.engine.base.Engine:
SELECT
  books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT
  authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)
author1

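As expected, lazy loading runs one query to fetch the books and then one more query each time we access an author that has not been loaded yet.

Subquery loading: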

books = session.query(Book).options(subqueryload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Subquery----------
SELECT
  books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT
  authors.id AS authors_id, authors.name AS authors_name,
  anon_1.books_author_id AS anon_1_books_author_id
FROM (
  SELECT DISTINCT books.author_id AS books_author_id
  FROM books) AS anon_1
JOIN authors
  ON authors.id = anon_1.books_author_id
ORDER BY anon_1.books_author_id
author1

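With subqueryload we get two queries: the first fetches the books, and the second fetches the authors by joining against a subquery derived from the first.

Joined loading: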

books = session.query(Book).options(joinedload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Joined------------
SELECT
  books.id AS books_id, books.name AS books_name,
  books.author_id AS books_author_id,
  authors_1.id AS authors_1_id, authors_1.name AS authors_1_name
FROM books
LEFT OUTER JOIN authors AS authors_1 ON authors_1.id = books.author_id
author1

The joined strategy runs just one query to fetch both the books and the authors.

Immediate loading:

books = session.query(Book).options(immediateload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Immediate---------
SELECT
   books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT
  authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)

SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(2,)

author1

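And the immediate strategy loads the books with the first query and then fetches the related data with a separate query for each related record. As the log shows, both author queries run before the first author name is printed, so they are issued as part of the load rather than on attribute access.

It looks like joinedload() should be the most efficient in most cases (more efficient than the "IN" strategy): we get all the data with a single query.

Now, let's try to implement the IN strategy outside of SQLAlchemy: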
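One way to check the query counts behind this comparison is to record every statement the engine sends. The following is a minimal sketch, not the original experiment: it assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database, and the names `Base`, `record_statement` and `count_statements` are made up for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import (
    Session, declarative_base, joinedload, lazyload, relationship, subqueryload,
)

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # Called once for every SQL statement sent to the database.
    statements.append(statement)


# Four books by two authors, as in the experiment above.
with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1), Book(name="book2", author=a1),
        Book(name="book3", author=a2), Book(name="book4", author=a2),
    ])
    session.commit()


def count_statements(option):
    """Load all books with the given loader option, touch every author,
    and return how many SQL statements were emitted."""
    with Session(engine) as session:
        statements.clear()
        books = session.query(Book).options(option).all()
        names = [book.author.name for book in books]
        assert len(names) == 4
        return len(statements)


counts = {
    "lazy": count_statements(lazyload(Book.author)),
    "subquery": count_statements(subqueryload(Book.author)),
    "joined": count_statements(joinedload(Book.author)),
}
print(counts)
```

With this data I would expect 3 statements for lazy loading (one for the books plus one per distinct author), 2 for subqueryload and 1 for joinedload. Statement count is only a proxy for efficiency; a single joined query can still be slower when the join multiplies rows.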

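print('-------------IN----------------')
books = session.query(Book).all()
# Collect the distinct author ids referenced by the loaded books.
ids = {b.author_id for b in books}
# One query for all authors. Keep the result referenced: the identity map
# holds objects weakly. A list works with in_() on every SQLAlchemy version.
authors = session.query(Author).filter(Author.id.in_(list(ids))).all()
print(books[0].author.name)
print(books[1].author.name)
print(books[2].author.name)
print(books[3].author.name)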

Output:

-------------IN----------------
SELECT
  books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id IN (?, ?)
INFO:sqlalchemy.engine.base.Engine:(1, 2)

author1
author1
author2
author2

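As we can see, it runs two queries, and after that we can access all the authors.

Note that we did not join the authors to the books explicitly, yet it still works when we access an author through a book, because SQLAlchemy finds the author records in its internal identity map and does not run additional DB queries.

"IN" strategy code like the above can be generalized into a function that works with any model/relationship. The "IN" strategy should probably also be relatively easy to implement as a new SQLAlchemy strategy; it is similar to the existing subqueryload strategy, which likewise runs a second query to get the related data.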
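As a rough illustration of that generalization, here is a sketch of such a function. It is an assumption-laden example rather than anything SQLAlchemy ships: `load_many_to_one_in` is a hypothetical helper, it only handles many-to-one relationships joined on a single column, and it assumes SQLAlchemy 1.4 or newer with an in-memory SQLite database.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.interfaces import MANYTOONE

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


def load_many_to_one_in(session, parents, relationship_attr):
    """Hypothetical helper: load the targets of a simple many-to-one
    relationship for already-loaded parents with a single IN query.

    The caller must keep the returned list referenced, because the
    session's identity map only holds weak references."""
    prop = relationship_attr.property
    pairs = prop.local_remote_pairs
    if prop.direction is not MANYTOONE or len(pairs) != 1:
        raise ValueError("only single-column many-to-one relationships are handled")
    local_col, remote_col = pairs[0]
    # Translate the table columns back to mapped attribute names.
    fk_key = prop.parent.get_property_by_column(local_col).key
    target_cls = prop.mapper.class_
    target_attr = getattr(target_cls, prop.mapper.get_property_by_column(remote_col).key)
    ids = list({getattr(parent, fk_key) for parent in parents} - {None})
    if not ids:
        return []
    return session.query(target_cls).filter(target_attr.in_(ids)).all()


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1), Book(name="book2", author=a1),
        Book(name="book3", author=a2), Book(name="book4", author=a2),
    ])
    session.commit()

with Session(engine) as session:
    books = session.query(Book).order_by(Book.name).all()
    authors = load_many_to_one_in(session, books, Book.author)
    statements.clear()
    # These attribute accesses are served from the identity map.
    names = [book.author.name for book in books]
    extra_statements = len(statements)

print(names, extra_statements)
```

The helper reads the foreign-key and target columns from the relationship itself, so the caller only passes the mapped attribute (here `Book.author`). Collections (one-to-many) would need extra work to populate the list attribute and are left out of this sketch.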


Mih*_*hin 3

http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#sqlalchemy.orm.selectinload

It has been added to SQLAlchemy, so now you can just use the selectinload strategy.
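A minimal sketch of what that looks like, reusing the Author/Book models from the answer above. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the statement recorder (`record_statement`) is only there to show what gets emitted and is not part of the loading API.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1), Book(name="book2", author=a1),
        Book(name="book3", author=a2), Book(name="book4", author=a2),
    ])
    session.commit()

with Session(engine) as session:
    statements.clear()
    # selectinload runs the main query, then a second query that uses IN
    # instead of re-embedding the main query as a subquery.
    books = (
        session.query(Book)
        .options(selectinload(Book.author))
        .order_by(Book.name)
        .all()
    )
    author_names = [book.author.name for book in books]
    selectin_statements = list(statements)

for statement in selectin_statements:
    print(statement)
```

The second statement printed should be a plain SELECT from authors with an IN clause, which is the shape the question was asking for.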