Select the count and value of a SQLAlchemy column using HAVING

boo*_*dev 3 python postgresql sqlalchemy

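I want to select the count of all contacts that share the same email address, for addresses that appear more than once. I can't get this query to work in SQLAlchemy with PostgreSQL.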

SELECT count(*), email FROM contact group by email having count(*) > 1

I tried this:

all_records = db.session.query(Contact).options(
    load_only('email')).group_by(Contact.email).having(
    func.count('*') > 1).all()
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "contact.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT contact.id AS contact_id, contact.email AS contact_em...
           ^
[SQL: 'SELECT contact.id AS contact_id, contact.email AS contact_email \nFROM contact GROUP BY contact.email \nHAVING count(%(count_1)s) > %(count_2)s'] [parameters: {'count_1': '*', 'count_2': 1}]

I also tried this:

all_records = db.session.query(func.count(Contact.id)).options(
    load_only('email')).group_by(Contact.email).having(
    func.count('*') > 1).all()
sqlalchemy.exc.ArgumentError
sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity.  Use Load(ent) to specify specific entities.

If I execute the raw SQL, it works fine:

all_records = db.session.execute(
    "SELECT count(*), email FROM contact group by email"
    " having count(*) > 1").fetchall()

I'm using Flask-SQLAlchemy, but here is a minimal plain SQLAlchemy setup that demonstrates the problem:

import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Contact(Base):
    __tablename__ = 'contact'
    id = sa.Column(sa.Integer, primary_key=True)
    email = sa.Column(sa.String)

engine = sa.create_engine('postgresql:///example', echo=True)
Base.metadata.create_all(engine)
session = orm.Session(engine)
session.add_all((
    Contact(email='a@example.com'),
    Contact(email='b@example.com'),
    Contact(email='a@example.com'),
    Contact(email='c@example.com'),
    Contact(email='a@example.com'),
))
session.commit()

# first failed query
all_records = session.query(Contact).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()

# second failed query (uses the plain `session` defined above, not Flask's `db.session`)
all_records = session.query(sa.func.count(Contact.id)).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()

With the sample data, I expect a single result row: 3, a@example.com.

dav*_*ism 5

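The query you are building in SQLAlchemy is not the same as the one you wrote by hand.

You want to select the count of each email that appears more than once.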

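# `sa` is the `import sqlalchemy as sa` alias from the question's minimal setup;
# with Flask-SQLAlchemy, `db.func` and `db.session` are the equivalents.
q = session.query(
    sa.func.count(Contact.email),
    Contact.email
).group_by(
    Contact.email
).having(
    sa.func.count(Contact.email) > 1
)
print(q)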

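The first query fails because you query the entire model, so SQLAlchemy selects all of its columns. When using group_by, you can only select the grouped columns (or aggregates). SQLAlchemy must always select the primary key when querying an entire model, and load_only does not change that.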
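
To see the primary key point for yourself, the sketch below records the SQL that the first query emits. It makes several assumptions that are not in the question: it runs against in-memory SQLite (which tolerates ungrouped columns, so the query executes instead of raising), it adds a hypothetical `name` column so the effect of `load_only` is visible, and `record_sql` and `statements` are illustrative names.

```python
import sqlalchemy as sa
from sqlalchemy import event, orm

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Contact(Base):
    __tablename__ = 'contact'
    id = sa.Column(sa.Integer, primary_key=True)
    email = sa.Column(sa.String)
    name = sa.Column(sa.String)  # hypothetical extra column, not in the question


# Assumption: in-memory SQLite instead of PostgreSQL, so the sketch runs anywhere.
engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)

statements = []  # every SQL string sent to the database


@event.listens_for(engine, 'before_cursor_execute')
def record_sql(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


session = orm.Session(engine)
session.add_all([Contact(email='a@example.com'), Contact(email='a@example.com')])
session.commit()

# The question's first query, with load_only given the mapped attribute.
session.query(Contact).options(
    orm.load_only(Contact.email)
).group_by(Contact.email).having(sa.func.count('*') > 1).all()

emitted = statements[-1]  # the SELECT that was just executed
print(emitted)
```

The printed SELECT should omit `contact.name` but still list `contact.id`, which is the column PostgreSQL rejects because it is neither grouped nor aggregated.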

The second query fails because load_only only works when selecting an entire model, but you are selecting an aggregate and a column.
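
Putting the pieces together, here is a minimal end-to-end sketch of the working query against the question's sample data. It assumes in-memory SQLite rather than PostgreSQL so that it runs without a server; the model and the query otherwise follow the question and the answer above.

```python
import sqlalchemy as sa
from sqlalchemy import orm

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Contact(Base):
    __tablename__ = 'contact'
    id = sa.Column(sa.Integer, primary_key=True)
    email = sa.Column(sa.String)


# Assumption: in-memory SQLite stands in for the PostgreSQL database.
engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)

session = orm.Session(engine)
session.add_all([
    Contact(email='a@example.com'),
    Contact(email='b@example.com'),
    Contact(email='a@example.com'),
    Contact(email='c@example.com'),
    Contact(email='a@example.com'),
])
session.commit()

# Select only the aggregate and the grouped column, never the whole model.
query = session.query(
    sa.func.count(Contact.email),
    Contact.email,
).group_by(
    Contact.email
).having(
    sa.func.count(Contact.email) > 1
)

rows = [tuple(row) for row in query.all()]
print(rows)  # [(3, 'a@example.com')]
```

Because no entity is selected, no `load_only` option is needed: the SELECT list contains exactly the count and the grouped `email` column.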