Limiting a relationship's primary join to a single row

Jac*_*son 2 python sqlalchemy

I have a SQLAlchemy model representing a delivery; a delivery has a destination, a parcel ID, and a date:

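from sqlalchemy import Column, DateTime, ForeignKey, Integer

# Base is the application's declarative base
class Delivery(Base):
    __tablename__ = 'delivery'
    delivery_id = Column(Integer, primary_key=True, autoincrement=True)
    parcel_id = Column(ForeignKey('parcels.parcel_id'))
    scheduled_date = Column(DateTime)
    destination_id = Column(ForeignKey('location.location_id'))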

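Now, the origin of a delivery is the destination of the previous delivery of the same parcel. Rather than denormalizing this information by maintaining a pointer-based linked list, I order the deliveries by their scheduled date, which currently looks like this: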

def origin(delivery):
    # The origin is the destination of the latest earlier delivery of the same parcel
    prior = (
        session.query(Delivery)
        .filter(
            Delivery.parcel_id == delivery.parcel_id,
            Delivery.scheduled_date < delivery.scheduled_date,
        )
        .order_by(Delivery.scheduled_date.desc())
        .first()
    )
    return prior.destination_id if prior else None
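To make the ordering rule concrete, here is a minimal pure-Python sketch of what `origin()` computes. It assumes the deliveries are already in memory as a hypothetical `DeliveryRow` named tuple rather than being queried through a session; taking the maximum `scheduled_date` among the earlier deliveries is the in-memory counterpart of `ORDER BY scheduled_date DESC` followed by `.first()`.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class DeliveryRow(NamedTuple):
    # Hypothetical in-memory stand-in for a row of the delivery table
    delivery_id: int
    parcel_id: int
    scheduled_date: date
    destination_id: int


def origin(delivery: DeliveryRow, deliveries: List[DeliveryRow]) -> Optional[int]:
    """Destination of the latest earlier delivery of the same parcel, or None."""
    earlier = [
        d for d in deliveries
        if d.parcel_id == delivery.parcel_id
        and d.scheduled_date < delivery.scheduled_date
    ]
    if not earlier:
        return None  # first leg of this parcel's journey
    # In-memory equivalent of ORDER BY scheduled_date DESC ... first()
    return max(earlier, key=lambda d: d.scheduled_date).destination_id


deliveries = [
    DeliveryRow(1, 10, date(2024, 1, 1), 1),
    DeliveryRow(2, 10, date(2024, 1, 2), 2),
    DeliveryRow(3, 10, date(2024, 1, 3), 3),
    DeliveryRow(4, 20, date(2024, 1, 1), 2),
]
origins = {d.delivery_id: origin(d, deliveries) for d in deliveries}
print(origins)  # {1: None, 2: 1, 3: 2, 4: None}
```

Calling this once per delivery is exactly the one-query-per-row pattern the question wants to fold into a join; the sketch only pins down the rule itself.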

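In plain SQL I could turn this separate query into a simple subquery + join that I include when loading deliveries. I've gotten as far as loading all related deliveries that happened before the current one: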

_prior_delivery = \
    select([Delivery.parcel_id, Delivery.scheduled_date, Location]) \
        .where(and_(Location.location_id == remote(Delivery.destination_id))) \
        .order_by(Delivery.scheduled_date.desc()) \
        .alias("prior_delivery")

Delivery.origin = relationship(
    Location,
    primaryjoin=and_(_prior_delivery.c.parcel_id == foreign(Delivery.parcel_id),
                     _prior_delivery.c.scheduled_date < foreign(Delivery.scheduled_date)),
    secondary=_prior_delivery,
    secondaryjoin=_prior_delivery.c.location_id == foreign(Location.location_id),
    uselist=False,
    viewonly=True)

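Because of uselist=False, this actually works; but under the hood it returns every delivery that happened before the current one, so SQLAlchemy prints a warning and the result set is much larger than necessary.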
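The row explosion is easy to reproduce outside the ORM. The sketch below uses Python's built-in `sqlite3` module (not SQLAlchemy) with a small, hypothetical dataset; it joins a delivery to every earlier delivery of the same parcel and then to that delivery's destination, which is the shape of the relationship above before any limit is applied. Delivery 3 matches two earlier deliveries, so a scalar (`uselist=False`) relationship has to discard one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE delivery (
    delivery_id INTEGER PRIMARY KEY,
    parcel_id INTEGER,
    scheduled_date TEXT,  -- ISO-8601 strings sort chronologically
    destination_id INTEGER REFERENCES location (location_id)
);
INSERT INTO location VALUES (1, 'Warehouse'), (2, 'Hub'), (3, 'Customer');
INSERT INTO delivery VALUES
    (1, 10, '2024-01-01', 1),
    (2, 10, '2024-01-02', 2),
    (3, 10, '2024-01-03', 3);
""")

# Every earlier delivery of the parcel joins, because nothing limits it to one
rows = conn.execute("""
SELECT d.delivery_id, l.name
FROM delivery AS d
JOIN delivery AS prior
  ON prior.parcel_id = d.parcel_id
 AND prior.scheduled_date < d.scheduled_date
JOIN location AS l ON l.location_id = prior.destination_id
WHERE d.delivery_id = 3
ORDER BY prior.scheduled_date DESC
""").fetchall()
print(rows)  # [(3, 'Hub'), (3, 'Warehouse')]
```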

My question: how can I apply a limit(1) to this read-only relationship?

uni*_*rio 5

First attempt

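The reason this is hard is that the relationship has to be joinable into the main query: SQLAlchemy needs to be able to load the relationship in the same query in order to support eager loading. So the question becomes: how do you write a query that loads a list of Delivery rows together with each one's origin?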

SELECT delivery.*, location.* FROM delivery
LEFT JOIN location ON location.location_id = (
  SELECT destination_id FROM delivery prior
  WHERE delivery.parcel_id = prior.parcel_id
    AND prior.scheduled_date < delivery.scheduled_date
  ORDER BY prior.scheduled_date DESC
  LIMIT 1
);
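A quick way to check the query's semantics is to run it against SQLite with made-up data. This sketch uses the standard-library `sqlite3` module rather than SQLAlchemy; the table and column names follow the question, while the rows are hypothetical. The `prior.scheduled_date < delivery.scheduled_date` condition is what stops a delivery from picking up its own destination, or that of a later delivery, as its origin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE delivery (
    delivery_id INTEGER PRIMARY KEY,
    parcel_id INTEGER,
    scheduled_date TEXT,  -- ISO-8601 strings sort chronologically
    destination_id INTEGER REFERENCES location (location_id)
);
INSERT INTO location VALUES (1, 'Warehouse'), (2, 'Hub'), (3, 'Customer');
INSERT INTO delivery VALUES
    (1, 10, '2024-01-01', 1),
    (2, 10, '2024-01-02', 2),
    (3, 10, '2024-01-03', 3),
    (4, 20, '2024-01-01', 2);
""")

# The correlated scalar subquery in the ON clause yields at most one
# destination_id per outer delivery, thanks to LIMIT 1
rows = conn.execute("""
SELECT delivery.delivery_id, location.name FROM delivery
LEFT JOIN location ON location.location_id = (
  SELECT destination_id FROM delivery prior
  WHERE delivery.parcel_id = prior.parcel_id
    AND prior.scheduled_date < delivery.scheduled_date
  ORDER BY prior.scheduled_date DESC
  LIMIT 1
)
ORDER BY delivery.delivery_id
""").fetchall()
print(rows)  # [(1, None), (2, 'Warehouse'), (3, 'Hub'), (4, None)]
```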

In effect, the correlated subquery

SELECT destination_id FROM delivery prior
WHERE delivery.parcel_id = prior.parcel_id
  AND prior.scheduled_date < delivery.scheduled_date
ORDER BY prior.scheduled_date DESC
LIMIT 1

becomes a computed foreign key, origin_id, through which you can join to the location table. Translated to SQLAlchemy, it would look something like this:

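from sqlalchemy import alias, select
from sqlalchemy.orm import relationship

delivery = Delivery.__table__
location = Location.__table__
prior = alias(delivery, "prior")
# Correlated subquery: destination of the latest earlier delivery of the same parcel
_origin_id = select([prior.c.destination_id])\
    .where(delivery.c.parcel_id == prior.c.parcel_id)\
    .where(prior.c.scheduled_date < delivery.c.scheduled_date)\
    .order_by(prior.c.scheduled_date.desc())\
    .limit(1)
Delivery.origin = relationship(
    Location,
    primaryjoin=_origin_id == location.c.location_id,
    viewonly=True)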

Unfortunately, this does not seem to work with any combination of remote and foreign annotations I tried.

Using a SELECT with a correlated subquery as the secondary

The next best solution is to use a fake secondary table:

SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
  SELECT delivery.delivery_id, (
    SELECT destination_id FROM delivery prior
    WHERE delivery.parcel_id = prior.parcel_id
      AND prior.scheduled_date < delivery.scheduled_date
    ORDER BY prior.scheduled_date DESC
    LIMIT 1
  ) AS origin_id FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;
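The fake secondary can also be inspected on its own: it is simply one row per delivery carrying that delivery's computed `origin_id`, which is what lets SQLAlchemy treat it like an association table. A sketch with the standard-library `sqlite3` module, hypothetical rows, and SQLite in place of the ORM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE delivery (
    delivery_id INTEGER PRIMARY KEY,
    parcel_id INTEGER,
    scheduled_date TEXT,  -- ISO-8601 strings sort chronologically
    destination_id INTEGER REFERENCES location (location_id)
);
INSERT INTO location VALUES (1, 'Warehouse'), (2, 'Hub'), (3, 'Customer');
INSERT INTO delivery VALUES
    (1, 10, '2024-01-01', 1),
    (2, 10, '2024-01-02', 2),
    (3, 10, '2024-01-03', 3),
    (4, 20, '2024-01-01', 2);
""")

# The fake secondary: one (delivery_id, origin_id) row per delivery
DELIVERY_ORIGIN = """
SELECT delivery.delivery_id AS delivery_id, (
  SELECT destination_id FROM delivery prior
  WHERE delivery.parcel_id = prior.parcel_id
    AND prior.scheduled_date < delivery.scheduled_date
  ORDER BY prior.scheduled_date DESC
  LIMIT 1
) AS origin_id FROM delivery
"""
secondary_rows = conn.execute(
    DELIVERY_ORIGIN + "ORDER BY delivery.delivery_id"
).fetchall()
print(secondary_rows)  # [(1, None), (2, 1), (3, 2), (4, None)]

# Joining through it yields at most one location per delivery
rows = conn.execute(f"""
SELECT delivery.delivery_id, location.name FROM delivery
LEFT JOIN ({DELIVERY_ORIGIN}) delivery_origin
  ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id
ORDER BY delivery.delivery_id
""").fetchall()
print(rows)  # [(1, None), (2, 'Warehouse'), (3, 'Hub'), (4, None)]
```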

In SQLAlchemy, this is:

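from sqlalchemy import alias, select
from sqlalchemy.orm import foreign, relationship

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")
# Scalar subquery correlated to `current`: destination of its latest earlier delivery
_origin_id = select([prior.c.destination_id])\
    .where(current.c.parcel_id == prior.c.parcel_id)\
    .where(prior.c.scheduled_date < current.c.scheduled_date)\
    .order_by(prior.c.scheduled_date.desc())\
    .limit(1)\
    .label("origin_id")
# One (delivery_id, origin_id) row per delivery, used as a fake association table
delivery_origin = select([
    current.c.delivery_id,
    _origin_id,
]).select_from(current)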
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False)

Unfortunately, there seems to be a bug (possibly related to this issue) that causes SQLAlchemy to emit an incorrect join, so we need to apply a small hack:

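from sqlalchemy import alias, select
from sqlalchemy.orm import foreign, relationship
from sqlalchemy.sql.expression import UnaryExpression
from sqlalchemy.sql.operators import custom_op

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")

# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(current.c.delivery_id, operator=custom_op(""))\
    .label("delivery_id")
# /HACK

# Scalar subquery correlated to `current`: destination of its latest earlier delivery
_origin_id = select([prior.c.destination_id])\
    .where(current.c.parcel_id == prior.c.parcel_id)\
    .where(prior.c.scheduled_date < current.c.scheduled_date)\
    .order_by(prior.c.scheduled_date.desc())\
    .limit(1)\
    .label("origin_id")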
delivery_origin = select([
    _delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False)

Using a SELECT with a window function as the secondary

Another implementation, which may have better performance characteristics, uses a window function:

SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
  SELECT
    delivery.delivery_id,
    lag(delivery.destination_id) OVER (PARTITION BY delivery.parcel_id ORDER BY delivery.scheduled_date) AS origin_id
  FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;
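Here is a sketch of the window-function variant against SQLite, again using the standard-library `sqlite3` module with hypothetical rows. It assumes SQLite 3.25 or newer, the first release with window functions. It lags `destination_id` rather than `delivery_id`, because `origin_id` is compared with `location.location_id`.

```python
import sqlite3

# Window functions such as lag() need SQLite 3.25+
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite too old for window functions: " + sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE delivery (
    delivery_id INTEGER PRIMARY KEY,
    parcel_id INTEGER,
    scheduled_date TEXT,  -- ISO-8601 strings sort chronologically
    destination_id INTEGER REFERENCES location (location_id)
);
INSERT INTO location VALUES (1, 'Warehouse'), (2, 'Hub'), (3, 'Customer');
INSERT INTO delivery VALUES
    (1, 10, '2024-01-01', 1),
    (2, 10, '2024-01-02', 2),
    (3, 10, '2024-01-03', 3),
    (4, 20, '2024-01-01', 2);
""")

# lag() reads the previous row of the same parcel in scheduled_date order;
# the first delivery of each parcel has no previous row and gets NULL
DELIVERY_ORIGIN = """
SELECT delivery_id,
       lag(destination_id) OVER (
         PARTITION BY parcel_id ORDER BY scheduled_date
       ) AS origin_id
FROM delivery
"""
origin_rows = conn.execute(DELIVERY_ORIGIN + "ORDER BY delivery_id").fetchall()
print(origin_rows)  # [(1, None), (2, 1), (3, 2), (4, None)]

rows = conn.execute(f"""
SELECT delivery.delivery_id, location.name FROM delivery
LEFT JOIN ({DELIVERY_ORIGIN}) delivery_origin
  ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id
ORDER BY delivery.delivery_id
""").fetchall()
print(rows)  # [(1, None), (2, 'Warehouse'), (3, 'Hub'), (4, None)]
```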

As before, we need a similar hack to get SQLAlchemy to generate the correct SQL:

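from sqlalchemy import alias, func, select
from sqlalchemy.orm import foreign, relationship
from sqlalchemy.sql.expression import UnaryExpression
from sqlalchemy.sql.operators import custom_op

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")

# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(current.c.delivery_id, operator=custom_op(""))\
    .label("delivery_id")
# /HACK

# Destination of the previous delivery of the same parcel (NULL for the first one)
_origin_id = func.lag(current.c.destination_id)\
    .over(partition_by=current.c.parcel_id,
          order_by=current.c.scheduled_date)\
    .label("origin_id")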
delivery_origin = select([
    _delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False)
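Whether the window function is actually faster depends on the database and its indexes, but it is straightforward to check that the two secondaries agree. The following sketch (standard-library `sqlite3`, SQLite 3.25+, randomly generated hypothetical data) compares the correlated-subquery and `lag()` versions of the `delivery_origin` mapping over several seeds.

```python
import random
import sqlite3

CORRELATED = """
SELECT d.delivery_id, (
  SELECT prior.destination_id FROM delivery prior
  WHERE prior.parcel_id = d.parcel_id
    AND prior.scheduled_date < d.scheduled_date
  ORDER BY prior.scheduled_date DESC
  LIMIT 1
) AS origin_id
FROM delivery d ORDER BY d.delivery_id
"""

WINDOWED = """
SELECT delivery_id,
       lag(destination_id) OVER (
         PARTITION BY parcel_id ORDER BY scheduled_date
       ) AS origin_id
FROM delivery ORDER BY delivery_id
"""


def build(seed: int) -> sqlite3.Connection:
    """Random deliveries whose dates are unique within each parcel."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE delivery (delivery_id INTEGER PRIMARY KEY, parcel_id INTEGER,"
        " scheduled_date TEXT, destination_id INTEGER)"
    )
    delivery_id = 0
    for parcel_id in range(5):
        # sample() draws distinct days, in random order, so ids do not follow dates
        for day in rng.sample(range(1, 29), k=rng.randint(1, 6)):
            delivery_id += 1
            conn.execute(
                "INSERT INTO delivery VALUES (?, ?, ?, ?)",
                (delivery_id, parcel_id, f"2024-02-{day:02d}", rng.randint(1, 9)),
            )
    return conn


mismatches = 0
for seed in range(20):
    conn = build(seed)
    if conn.execute(CORRELATED).fetchall() != conn.execute(WINDOWED).fetchall():
        mismatches += 1
    conn.close()
print(mismatches)  # 0
```

The generator deliberately keeps scheduled dates unique within a parcel. If two deliveries of the same parcel share a `scheduled_date`, the two versions can disagree: the strict `<` in the correlated subquery excludes the other delivery, while `lag()` still treats whichever tied row sorts first as the predecessor.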