SQLAlchemy efficient subquery for latest value

Question

Pet*_*ron 1 python sqlalchemy subquery select-n-plus-1

The current value of an entity's status attribute can be queried as the latest entry in that entity's EntityHistory table, i.e.

Entities (id) <- EntityHistory (timestamp, entity_id, value)

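How can I write an efficient SQLAlchemy expression that eagerly loads the current value from the history table for all entities, without resulting in N+1 queries?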

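I tried writing a property for my model, but it generates one query per entity (N+1) when I iterate over the results. As far as I know, there is no way to solve this without a subquery, which still seems inefficient on the database side to me.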

Example `EntityHistory` data:

timestamp |entity_id| value
==========|=========|======
     15:00|        1|     x
     15:01|        1|     y
     15:02|        2|     x
     15:03|        2|     y
     15:04|        1|     z

So the current value for entity 1 would be z, and for entity 2 it would be y. The backing database is Postgres.

Answer 1

Ilj*_*ilä 5

I think you could use a column_property to load the latest value as an attribute of an Entities instance, alongside the other column-mapped attributes:

from sqlalchemy import select
from sqlalchemy.orm import column_property

class Entities(Base):

    ...

    value = column_property(
        select([EntityHistory.value]).
        where(EntityHistory.entity_id == id).  # the id column from before
        order_by(EntityHistory.timestamp.desc()).
        limit(1).
        correlate_except(EntityHistory)
    )

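For readers who want to run the column_property approach end to end, here is a minimal sketch. It makes several assumptions that are not in the answer: an in-memory SQLite database stands in for Postgres, the table names and the surrogate `id` key on EntityHistory are hypothetical, and timestamps are stored as text only to mirror the sample data. The answer's snippet uses the older `select([...])` list form; the sketch uses the SQLAlchemy 1.4+/2.0 spelling and adds `.scalar_subquery()`, which those versions expect when a SELECT is used as a column expression.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, column_property, declarative_base

Base = declarative_base()


class EntityHistory(Base):
    __tablename__ = "entity_history"  # hypothetical table name

    # Hypothetical surrogate key; the ORM needs some primary key to map the table.
    id = Column(Integer, primary_key=True)
    # Stored as text only to mirror the sample data; a real schema would use DateTime.
    timestamp = Column(String, nullable=False)
    entity_id = Column(Integer, ForeignKey("entities.id"), nullable=False)
    value = Column(String, nullable=False)


class Entities(Base):
    __tablename__ = "entities"  # hypothetical table name

    id = Column(Integer, primary_key=True)

    # Correlated scalar subquery: the newest history value for this entity's id.
    value = column_property(
        select(EntityHistory.value)
        .where(EntityHistory.entity_id == id)
        .order_by(EntityHistory.timestamp.desc())
        .limit(1)
        .correlate_except(EntityHistory)
        .scalar_subquery()
    )


engine = create_engine("sqlite://")  # in-memory SQLite stands in for Postgres
Base.metadata.create_all(engine)

# The sample rows from the question.
sample = [("15:00", 1, "x"), ("15:01", 1, "y"), ("15:02", 2, "x"),
          ("15:03", 2, "y"), ("15:04", 1, "z")]

with Session(engine) as session:
    session.add_all([Entities(id=1), Entities(id=2)])
    session.add_all(
        [EntityHistory(timestamp=t, entity_id=e, value=v) for t, e, v in sample]
    )
    session.commit()

    # One SELECT loads every entity together with its latest value.
    latest = {e.id: e.value for e in session.execute(select(Entities)).scalars()}

print(latest)
```

Because `value` is part of the mapped columns, it is fetched with every load of Entities; if that cost is unwanted for most queries, the query-level subquery may be the better fit.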

The subquery can of course also be used directly in a query instead of in a column_property.

query = session.query(
    Entities,
    session.query(EntityHistory.value).
        filter(EntityHistory.entity_id == Entities.id).
        order_by(EntityHistory.timestamp.desc()).
        limit(1).
        label('value')
)

Performance naturally depends on a proper index being in place:

Index('entityhistory_entity_id_timestamp_idx',
      EntityHistory.entity_id,
      EntityHistory.timestamp.desc())

In a way this is still your dreaded N+1, since the query runs one subquery per row, but it is hidden in a single round trip to the database.
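
The round-trip difference can be observed by counting the SELECT statements the engine emits. The sketch below is an illustration under assumptions, not part of the answer: it uses SQLAlchemy Core tables with hypothetical names, an in-memory SQLite database instead of Postgres, and the 1.4+/2.0 `select()` style. It compares a naive per-entity lookup with the correlated scalar subquery used as an output column.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, event, select)

metadata = MetaData()
# Hypothetical Core tables mirroring the schema in the question.
entities = Table("entities", metadata, Column("id", Integer, primary_key=True))
history = Table(
    "entity_history", metadata,
    Column("timestamp", String, nullable=False),  # text only to mirror the sample
    Column("entity_id", Integer, ForeignKey("entities.id"), nullable=False),
    Column("value", String, nullable=False),
)

engine = create_engine("sqlite://")  # in-memory SQLite stands in for Postgres
metadata.create_all(engine)

selects = []


@event.listens_for(engine, "before_cursor_execute")
def record_selects(conn, cursor, statement, parameters, context, executemany):
    # Remember every SELECT sent to the database.
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)


sample = [("15:00", 1, "x"), ("15:01", 1, "y"), ("15:02", 2, "x"),
          ("15:03", 2, "y"), ("15:04", 1, "z")]

with engine.begin() as conn:
    conn.execute(entities.insert(), [{"id": 1}, {"id": 2}])
    conn.execute(
        history.insert(),
        [{"timestamp": t, "entity_id": e, "value": v} for t, e, v in sample],
    )


def latest_for(entity_id):
    """SELECT of the newest value for an entity id (a literal or a column)."""
    return (
        select(history.c.value)
        .where(history.c.entity_id == entity_id)
        .order_by(history.c.timestamp.desc())
        .limit(1)
    )


with engine.connect() as conn:
    # Naive approach: one query for the ids, then one query per entity (N+1).
    selects.clear()
    naive = {}
    for row in conn.execute(select(entities.c.id)).all():
        naive[row[0]] = conn.execute(latest_for(row[0])).scalar()
    naive_count = len(selects)

    # Correlated scalar subquery as an output column: one statement in total.
    selects.clear()
    stmt = select(
        entities.c.id,
        latest_for(entities.c.id).scalar_subquery().label("value"),
    )
    single = {row[0]: row[1] for row in conn.execute(stmt)}
    single_count = len(selects)

print(naive_count, single_count)
```

Both approaches return the same mapping; only the number of statements sent to the database differs. How the database executes the per-row subquery internally still depends on the index.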

If, on the other hand, you do not need value as an attribute of Entities, in Postgresql you can join against a DISTINCT ON ... ORDER BY query to fetch the latest values:

# The same index from before speeds this up.
# Remember nullslast(), if timestamp can be NULL.
# Parentheses replace the backslash continuations, which cannot be
# followed by comment lines.
values = (
    session.query(EntityHistory.entity_id, EntityHistory.value)
    .distinct(EntityHistory.entity_id)
    .order_by(EntityHistory.entity_id, EntityHistory.timestamp.desc())
    .subquery()
)

query = session.query(Entities, values.c.value).\
    join(values, values.c.entity_id == Entities.id)

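To inspect the SQL this produces without a running Postgres server, the statement can be compiled against the PostgreSQL dialect. The sketch below is an assumption-laden illustration rather than the answer's code: it uses Core tables with hypothetical names and the 1.4+/2.0 `select()` style. An outer-join variant is included because an inner join drops entities that have no history rows.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, select
from sqlalchemy.dialects import postgresql

metadata = MetaData()
# Hypothetical Core tables mirroring the schema in the question.
entities = Table("entities", metadata, Column("id", Integer, primary_key=True))
history = Table(
    "entity_history", metadata,
    Column("timestamp", String, nullable=False),
    Column("entity_id", Integer, ForeignKey("entities.id"), nullable=False),
    Column("value", String, nullable=False),
)

# DISTINCT ON keeps the first row per entity_id in ORDER BY order,
# which is the newest row because timestamp is sorted descending.
latest = (
    select(history.c.entity_id, history.c.value)
    .distinct(history.c.entity_id)
    .order_by(history.c.entity_id, history.c.timestamp.desc())
    .subquery("latest")
)

on_clause = latest.c.entity_id == entities.c.id

# Inner join: only entities that have at least one history row.
inner = select(entities.c.id, latest.c.value).select_from(
    entities.join(latest, on_clause)
)

# Outer join: every entity, with NULL where no history exists.
outer = select(entities.c.id, latest.c.value).select_from(
    entities.outerjoin(latest, on_clause)
)

inner_sql = str(inner.compile(dialect=postgresql.dialect()))
outer_sql = str(outer.compile(dialect=postgresql.dialect()))

print(inner_sql)
print(outer_sql)
```

Compiling only renders the SQL text; it says nothing about speed, so timing against real data on Postgres is still needed to choose between the variants.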

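That said, in limited testing with dummy data, the subquery as an output column always beat the join by a notable margin when every entity had values. On the other hand, with millions of entities and a lot of missing history values, a LEFT JOIN was faster. I recommend testing on your own data to see which query suits it better. For random access to a single entity, a correlated subquery is faster, provided the index is in place. For bulk fetches: test.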
