`ON CONFLICT DO UPDATE` 导致死锁?

Fak*_*ame 1 deadlock locking postgresql-9.5

我有一个项目,我试图在其中使用 PostgreSQLON CONFLICT DO UPDATE子句,但我遇到了大量死锁问题。

我的架构如下:

webarchive=# \d web_pages
                                               Table "public.web_pages"
      Column       |            Type             |                              Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 id                | integer                     | not null default nextval('web_pages_id_seq'::regclass)
 state             | dlstate_enum                | not null
 errno             | integer                     |
 url               | text                        | not null
 starturl          | text                        | not null
 netloc            | text                        | not null
 file              | integer                     |
 priority          | integer                     | not null
 distance          | integer                     | not null
 is_text           | boolean                     |
 limit_netloc      | boolean                     |
 title             | citext                      |
 mimetype          | text                        |
 type              | itemtype_enum               |
 content           | text                        |
 fetchtime         | timestamp without time zone |
 addtime           | timestamp without time zone |
 tsv_content       | tsvector                    |
 normal_fetch_mode | boolean                     | default true
 ignoreuntiltime   | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone
Indexes:
    "web_pages_pkey" PRIMARY KEY, btree (id)
    "ix_web_pages_url" UNIQUE, btree (url)
    "idx_web_pages_title" gin (to_tsvector('english'::regconfig, title::text))
    "ix_web_pages_distance" btree (distance)
    "ix_web_pages_distance_filtered" btree (priority) WHERE state = 'new'::dlstate_enum AND distance < 1000000 AND normal_fetch_mode = true
    "ix_web_pages_id" btree (id)
    "ix_web_pages_netloc" btree (netloc)
    "ix_web_pages_priority" btree (priority)
    "ix_web_pages_state" btree (state)
    "ix_web_pages_url_ops" btree (url text_pattern_ops)
    "web_pages_state_netloc_idx" btree (state, netloc)
Foreign-key constraints:
    "web_pages_file_fkey" FOREIGN KEY (file) REFERENCES web_files(id)
Triggers:
    update_row_count_trigger BEFORE INSERT OR UPDATE ON web_pages FOR EACH ROW EXECUTE PROCEDURE web_pages_content_update_func()
Run Code Online (Sandbox Code Playgroud)

我的更新命令如下:

INSERT INTO
    web_pages
    (url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)
VALUES
    (:url, :starturl, :netloc, :distance, :is_text, :priority, :type, :fetchtime, :state)
ON CONFLICT (url) DO
    UPDATE
        SET
            state     = EXCLUDED.state,
            starturl  = EXCLUDED.starturl,
            netloc    = EXCLUDED.netloc,
            is_text   = EXCLUDED.is_text,
            distance  = EXCLUDED.distance,
            priority  = EXCLUDED.priority,
            fetchtime = EXCLUDED.fetchtime
        WHERE
            web_pages.fetchtime < :threshtime
        AND
            web_pages.url = EXCLUDED.url
    ;
Run Code Online (Sandbox Code Playgroud)

(注意:参数通过SQLAlchemy参数化查询样式进行转义)

我看到了几十个死锁错误,即使在相对较轻的并发(6 个工人)下:

Main.SiteArchiver.Process-5.MainThread - WARNING - SQLAlchemy OperationalError - Retrying.
Traceback (most recent call last):
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL:  Process 11391 waits for ShareLock on transaction 40632808; blocked by process 11389.
Process 11389 waits for ShareLock on transaction 40632662; blocked by process 11391.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (743427,2) in relation "web_pages"


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Engine.py", line 558, in upsertResponseLinks
    self.db_sess.execute(cmd, params=new)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 1034, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 183, in reraise
    raise value.with_traceback(tb)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL:  Process 11391 waits for ShareLock on transaction 40632808; blocked by process 11389.
Process 11389 waits for ShareLock on transaction 40632662; blocked by process 11391.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (743427,2) in relation "web_pages"
 [SQL: '         INSERT INTO          web_pages          (url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)         VALUES          (%(url)s, %(starturl)s, %(netloc)s, %(distance)s, %(is_text)s, %(priority)s, %(type)s, %(fetchtime)s, %(state)s)         ON CONFLICT (url) DO          UPDATE           SET            state     = EXCLUDED.state,            starturl  = EXCLUDED.starturl,            netloc    = EXCLUDED.netloc,            is_text   = EXCLUDED.is_text,            distance  = EXCLUDED.distance,            priority  = EXCLUDED.priority,            fetchtime = EXCLUDED.fetchtime           WHERE            web_pages.fetchtime < %(threshtime)s          ;         '] [parameters: {'url': 'xxxxxx', 'is_text': True, 'netloc': 'xxxxxx', 'distance': 1000000, 'priority': 10000, 'threshtime': datetime.datetime(2016, 4, 24, 0, 38, 10, 778866), 'state': 'new', 'starturl': 'xxxxxxx', 'type': 'unknown', 'fetchtime': datetime.datetime(2016, 4, 24, 0, 38, 10, 778934)}]
Run Code Online (Sandbox Code Playgroud)

我的事务隔离级别是REPEATABLE READ,所以我对数据库应该如何工作的理解是,我会看到很多序列化错误,但不应该发生死锁,因为如果两个事务更改同一行,那么后面的事务应该会失败。

我的猜测是 UPDATE 以某种方式锁定了 INSERT 查询(或类似的东西),我需要在某处放置一个同步点(?),但我不太了解各种查询组件的范围做任何故障排除,然后只是随机更改内容并查看有什么影响。我已经阅读了一些资料,但是 PostgreSQL 文档非常抽象,而且ON CONFLICT xxx术语似乎还没有被广泛使用,因此用于实际故障排除的资源并不多,特别是对于非 SQL 专家。

我该如何尝试解决这个问题?我还尝试了其他隔离级别 ( READ COMMITTED, SERIALIZABLE) ,但无济于事。

小智 5

死锁不是由特定语句引起的。它是由并发问题引起的。所以基本上,您应该开始观察应用程序的一个会话如何处理同时工作的其他会话。

以下是避免死锁的一般准则:

  1. 始终维护表上的主键。这个主键应该是识别表中特定记录的方法。这将避免太多行进入锁定范围。

  2. 维护所有交易的顺序。例如,您的应用程序逻辑之一在表 A 和表 B 中插入/更新数据。不应该有另一个逻辑在表 B 和表 A 中插入/更新数据。

  3. 监视并抓住罪魁祸首。PostgreSQL 提供 pg_stat_activty 和 pg_stat_statements 之类的视图来监控会话和查询。以下是一些示例查询,您可以使用它们来监控阻塞/死锁。https://wiki.postgresql.org/wiki/Lock_Monitoring 您可能需要调整 log_lock_waits 和 deadlock_timeout 参数。

  4. 在事务中首先获得最严格的锁。所以较小的不会一路走来。

  5. 最后但最不重要的是,减少交易的规模。更频繁地提交。长时间运行的事务更有可能陷入死锁。此外,由于 MVCC 在 Postgres 中的实现方式,postgres 中的长事务拥有更多数量的活动元组。