SQLAlchemy session.rollback from IntegrityError causing QueuePool to run out of handlers?

dis*_*dng 3 python database sqlalchemy

I have the following table:

class Feedback(Base):
  __tablename__ = 'feedbacks'
  __table_args__ = (UniqueConstraint('user_id', 'look_id'),)
  id = Column(Integer, primary_key=True)
  user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
  look_id = Column(Integer, ForeignKey('looks.id'), nullable=False)

I am currently inserting a large number of entries into this table, many of which violate the UniqueConstraint.

I am using the following code:

  for comment in session.query(Comment).filter(Comment.type == Comment.TYPE_LOOK).yield_per(100):
    feedback = Feedback()
    feedback.user_id = User.get_or_create(comment.model_id).id
    feedback.look_id = comment.commentable_id
    session.add(feedback)
    try:
      session.flush()
    except IntegrityError,e:
      print "IntegrityError", e
      session.rollback()
  session.commit()

I get the following error:

IntegrityError (IntegrityError) duplicate key value violates unique constraint "feedbacks_user_id_look_id_key"
DETAIL:  Key (user_id, look_id)=(140, 263008) already exists.
 'INSERT INTO feedbacks (user_id, look_id, score) VALUES (%(user_id)s, %(look_id)s, %(score)s) RETURNING feedbacks.id' {'user_id': 140, 'score': 1, 'look_id': 263008}
IntegrityError (IntegrityError) duplicate key value violates unique constraint "feedbacks_user_id_look_id_key"
...
(there's about 24 of these integrity errors here)
...
DETAIL:  Key (user_id, look_id)=(173, 263008) already exists.
 'INSERT INTO feedbacks (user_id, look_id, score) VALUES (%(user_id)s, %(look_id)s, %(score)s) RETURNING feedbacks.id' {'user_id': 173, 'score': 1, 'look_id': 263008}
No handlers could be found for logger "sqlalchemy.pool.QueuePool"
Traceback (most recent call last):
  File "load.py", line 40, in <module>
    load_crawl_data_into_feedback()
  File "load.py", line 21, in load_crawl_data_into_feedback
    for comment in session.query(Comment).filter(Comment.type == Comment.TYPE_LOOK).yield_per(100):
  File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2337, in instances
    fetch = cursor.fetchmany(self._yield_per)
  File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3230, in fetchmany
    self.cursor, self.context)
  File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3223, in fetchmany
    l = self.process_rows(self._fetchmany_impl(size))
  File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3343, in _fetchmany_impl
    row = self._fetchone_impl()
  File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3333, in _fetchone_impl
    self.__buffer_rows()
  File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3326, in __buffer_rows
    self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
sqlalchemy.exc.ProgrammingError: (ProgrammingError) named cursor isn't valid anymore None None

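Before you conclude that this error is caused by yield_per, I can assure you that yield_per is not the culprit here.

I tried the same code without the unique constraint and did not get any errors at all.

I believe the integrity errors are what lead to 'No handlers could be found for logger "sqlalchemy.pool.QueuePool"'.

I assume each integrity error kills one "thread" in the queue pool.

Can someone enlighten me as to what is going on?

If I can't do much about the data at this point, what would you recommend I do?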

Eev*_*vee 5

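That error comes only from Python's logging module; your pool class is trying to log some debugging message, but you haven't configured SQLA logging. Configuring logging is easy, and then you can see what it is actually trying to say.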
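A minimal sketch of such a configuration, using only the standard logging module. The logger name "sqlalchemy.pool" is the parent of the "sqlalchemy.pool.QueuePool" logger named in the warning; the DEBUG level is an assumption chosen to see as much as possible, not something the question specifies.

```python
import logging

# Install a default stderr handler on the root logger so that records
# emitted by library loggers have somewhere to go.
logging.basicConfig()

# Raise verbosity for the connection pool loggers. "sqlalchemy.pool" is the
# parent of "sqlalchemy.pool.QueuePool"; DEBUG is an assumed level, and
# logging.INFO would be quieter.
pool_logger = logging.getLogger("sqlalchemy.pool")
pool_logger.setLevel(logging.DEBUG)
```

Run this once at the top of the script, before the engine is created, so the pool's messages appear alongside the IntegrityError output.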

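I'm not entirely sure what's going on here, but rolling back the top-level transaction dozens of times certainly doesn't help. A rollback ends the transaction and invalidates every live row object. That certainly isn't going to play well with yield_per, and it would explain the final ProgrammingError: the rows are being streamed through a named (server-side) cursor, which does not survive the end of its transaction.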

If your database supports savepoints or nested transactions (i.e., Postgres or Oracle... or maybe recent MySQL?), try starting a nested transaction for each attempt:

for comment in session.query(Comment).filter(Comment.type == Comment.TYPE_LOOK).yield_per(100):
    try:
        # begin_nested() opens a SAVEPOINT; only this one insert is undone on failure
        with session.begin_nested():
            feedback = Feedback()
            feedback.user_id = User.get_or_create(comment.model_id).id
            feedback.look_id = comment.commentable_id
            session.add(feedback)
            session.flush()
    except IntegrityError as e:
        # duplicate (user_id, look_id): report it and move on to the next comment
        print("IntegrityError: %s" % e)

session.commit()

The with block rolls back on error and commits on success, so a failed flush won't wreak havoc on the rest of the main transaction.
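To make the savepoint mechanics concrete, here is a rough sketch of the same pattern written as plain SQL through the standard sqlite3 module rather than through SQLAlchemy. The in-memory database, the savepoint name one_row, and the sample rows are all made up for illustration; the point is only that a failed insert is undone back to its savepoint while the outer transaction stays open.

```python
import sqlite3

# isolation_level=None puts the driver in manual transaction mode, so the
# BEGIN / SAVEPOINT / COMMIT statements below are exactly what gets executed.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE feedbacks ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL, "
    "look_id INTEGER NOT NULL, "
    "UNIQUE (user_id, look_id))"
)

# Made-up sample data; the last row duplicates the first.
rows = [(140, 263008), (173, 263008), (140, 263008)]
skipped = []

conn.execute("BEGIN")  # the outer ("main") transaction
for user_id, look_id in rows:
    conn.execute("SAVEPOINT one_row")  # start of the nested unit of work
    try:
        conn.execute(
            "INSERT INTO feedbacks (user_id, look_id) VALUES (?, ?)",
            (user_id, look_id),
        )
    except sqlite3.IntegrityError:
        # Undo only this insert; the outer transaction is still alive.
        conn.execute("ROLLBACK TO SAVEPOINT one_row")
        skipped.append((user_id, look_id))
    conn.execute("RELEASE SAVEPOINT one_row")  # close the nested unit
conn.execute("COMMIT")

inserted = conn.execute(
    "SELECT user_id, look_id FROM feedbacks ORDER BY id"
).fetchall()
```

After the loop, inserted holds the two distinct pairs and skipped holds the one duplicate, and the outer transaction was never rolled back.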

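If your backend doesn't support that, the other sensible options that come to mind are:

  • Complicate your query: LEFT JOIN against your feedback table, so that you know within the application whether a feedback row already exists.

  • If you're willing to make (user_id, look_id) the primary key, I think you can use session.merge(feedback). This acts like an insert-or-update based on the primary key: if SQLA can find an existing row with the same pk, it updates that row; otherwise it creates a new row in the database. It does risk firing an extra SELECT for every new row, though.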
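The LEFT JOIN idea from the first bullet could look roughly like the sketch below, again in plain SQL via the standard sqlite3 module. It assumes, purely for illustration, a comments table that already carries user_id and look_id columns; in the question the user id comes from User.get_or_create, so the real join condition would differ. DISTINCT is there because the source rows may contain duplicates among themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, for illustration only.
conn.executescript("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        look_id INTEGER
    );
    CREATE TABLE feedbacks (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        look_id INTEGER NOT NULL,
        UNIQUE (user_id, look_id)
    );
    INSERT INTO comments (user_id, look_id)
        VALUES (140, 263008), (173, 263008), (173, 263008);
    INSERT INTO feedbacks (user_id, look_id) VALUES (140, 263008);
""")

# Anti-join: keep only the (user_id, look_id) pairs with no matching
# feedback row, i.e. where the LEFT JOIN found nothing on the right side.
missing = conn.execute("""
    SELECT DISTINCT c.user_id, c.look_id
    FROM comments AS c
    LEFT JOIN feedbacks AS f
      ON f.user_id = c.user_id AND f.look_id = c.look_id
    WHERE f.id IS NULL
    ORDER BY c.user_id, c.look_id
""").fetchall()

# These inserts cannot violate the unique constraint, so no rollback is needed.
conn.executemany(
    "INSERT INTO feedbacks (user_id, look_id) VALUES (?, ?)", missing
)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM feedbacks").fetchone()[0]
```

Because the duplicates are filtered out before any INSERT is attempted, the transaction is never rolled back and the streaming cursor is never disturbed. The trade-off is a heavier query, and it does not protect against another process inserting the same pair concurrently.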