PostgreSQL autovacuum: "skipped frozen" pages lead to massive bloat

Question

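I have a problem with a table I use on a forum where people can add their products.

Every time a user loads a page, the user's "online timestamp" is updated (a line of PHP prevents it from being updated more than once per hour), which triggers an update of the "online timestamp" of each of that user's products as well (via a trigger). Users can add and edit products at any time. This is the table definition: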

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  userid INT NOT NULL REFERENCES users(id),
  categoryid INT NOT NULL REFERENCES categories (id),
  regionid INT NOT NULL REFERENCES regions(id),
  title VARCHAR(255) NOT NULL,
  description TEXT NOT NULL,
  price DECIMAL(7,2) NOT NULL,
  create_timestamp BIGINT NOT NULL,
  modify_timestamp BIGINT NOT NULL,
  online_timestamp BIGINT NOT NULL
);

CREATE INDEX ON products (categoryid);
CREATE INDEX ON products (userid);
CREATE INDEX ON products (regionid);

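There are approximately 100,000 users and approximately 400,000 products. Because the "online timestamp" has to be kept current, the table is updated very often (approximately 2 times per second).

The problem is: every week, the database becomes so bloated that I have to take the website down, dump the database to a .sql file, drop the database, and re-import the dump. The fresh database is approximately 2 GB in size, and the bloated database can reach 80 - 100 GB before I have to dump + re-import. The WAL directory (pg_xlog) never exceeds 1.1 GB, so I don't think there is anything wrong with the WAL.

Here is a typical query: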

> EXPLAIN ANALYZE SELECT COUNT(*) FROM products WHERE categoryid = 4;
                                                                    QUERY PLAN                                                                
    ------------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on products  (cost=6405.66..228100.41 rows=63642 width=1193) (actual time=6.100..26.484 rows=8913 loops=1)
       Recheck Cond: (categoryid = 132)
       Heap Blocks: exact=6802
       ->  Bitmap Index Scan on products_categoryid_idx  (cost=0.00..6389.74 rows=63642 width=0) (actual time=3.742..3.742 rows=8917 loops=1)
             Index Cond: (categoryid = 132)
             Heap Fetches: 900
     Planning time: 2.832 ms
     Execution time: 27.267 ms
    (7 rows)

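In the example above, the number of "heap fetches" increases over time - by the time I re-import, "heap fetches" is around 250,000 - 300,000. Needless to say, the queries get slower and slower, and CPU usage increases.

Here is the strange thing that shows up in the log when a vacuum happens: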

< 2017-02-26 17:54:15.781 UTC > LOG:  automatic vacuum of table "my_db.public.products": index scans: 1
    pages: 0 removed, 2419432 remain, 1 skipped due to pins, 2173455 skipped frozen
    tuples: 170090 removed, 2553063 remain, 334 are dead but not yet removable
    buffer usage: 2871815 hits, 842794 misses, 131788 dirtied
    avg read rate: 42.389 MB/s, avg write rate: 6.628 MB/s
    system usage: CPU 6.05s/9.79u sec elapsed 155.33 sec

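We can see "skipped frozen" in the log above, which suggests that the table is not really being vacuumed. Despite my research, I could not find any information about "frozen pages".

I have not been able to come up with a solution for this. Here is my configuration (I am using the latest version, obviously):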

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units:  kB = kilobytes        Time units:  ms  = milliseconds
#                MB = megabytes                     s   = seconds
#                GB = gigabytes                     min = minutes
#                TB = terabytes                     h   = hours
#                                                   d   = days


#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'       # use data in another directory
                    # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
                    # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
                    # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = ''         # write an extra PID file
                    # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost'     # what IP address(es) to listen on;
                    # comma-separated list of addresses;
                    # defaults to 'localhost'; use '*' for all
                    # (change requires restart)
#port = 5432                # (change requires restart)
max_connections = 500           # (change requires restart)
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directories = '/var/run/postgresql, /tmp'  # comma-separated list of directories
                    # (change requires restart)
#unix_socket_group = ''         # (change requires restart)
#unix_socket_permissions = 0777     # begin with 0 to use octal notation
                    # (change requires restart)
#bonjour = off              # advertise server via Bonjour
                    # (change requires restart)
#bonjour_name = ''          # defaults to the computer name
                    # (change requires restart)

# - Security and Authentication -

#authentication_timeout = 1min      # 1s-600s
#ssl = off              # (change requires restart)
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
                    # (change requires restart)
#ssl_prefer_server_ciphers = on     # (change requires restart)
#ssl_ecdh_curve = 'prime256v1'      # (change requires restart)
#ssl_cert_file = 'server.crt'       # (change requires restart)
#ssl_key_file = 'server.key'        # (change requires restart)
#ssl_ca_file = ''           # (change requires restart)
#ssl_crl_file = ''          # (change requires restart)
#password_encryption = on
#db_user_namespace = off
#row_security = on

# GSSAPI using Kerberos
#krb_server_keyfile = ''
#krb_caseins_users = off

# - TCP Keepalives -
# see "man 7 tcp" for details

#tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                    # 0 selects the system default
#tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
                    # 0 selects the system default
#tcp_keepalives_count = 0       # TCP_KEEPCNT;
                    # 0 selects the system default


#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 64GB           # min 128kB
                    # (change requires restart)
huge_pages = try            # on, off, or try
                    # (change requires restart)
#temp_buffers = 8MB         # min 800kB
max_prepared_transactions = 0       # zero disables the feature
                    # (change requires restart)
# Caution: it is not advisable to set max_prepared_transactions nonzero unless
# you actively intend to use prepared transactions.
work_mem = 8MB              # min 64kB
maintenance_work_mem = 512MB        # min 1MB
#replacement_sort_tuples = 150000   # limits use of replacement selection sort
autovacuum_work_mem = -1        # min 1MB, or -1 to use maintenance_work_mem
max_stack_depth = 8MB           # min 100kB
dynamic_shared_memory_type = posix  # the default is the first option
                    # supported by the operating system:
                    #   posix
                    #   sysv
                    #   windows
                    #   mmap
                    # use none to disable dynamic shared memory

# - Disk -

temp_file_limit = -1            # limits per-process temp file space
                    # in kB, or -1 for no limit

# - Kernel Resource Usage -

#max_files_per_process = 1000       # min 25
                    # (change requires restart)
#shared_preload_libraries = ''      # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0          # 0-100 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20        # 0-10000 credits
#vacuum_cost_limit = 200        # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms         # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100        # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0      # 0-10.0 multiplier on buffers scanned/round
#bgwriter_flush_after = 512kB       # measured in pages, 0 disables

# - Asynchronous Behavior -

#effective_io_concurrency = 1       # 1-1000; 0 disables prefetching
#max_worker_processes = 8       # (change requires restart)
#max_parallel_workers_per_gather = 0    # taken from max_worker_processes
#old_snapshot_threshold = -1        # 1min-60d; -1 disables; 0 is immediate
                    # (change requires restart)
#backend_flush_after = 0        # measured in pages, 0 disables


#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#wal_level = minimal            # minimal, replica, or logical
                    # (change requires restart)
#fsync = on             # flush data to disk for crash safety
                        # (turning this off can cause
                        # unrecoverable data corruption)
#synchronous_commit = on        # synchronization level;
                    # off, local, remote_write, remote_apply, or on
#wal_sync_method = fsync        # the default is the first option
                    # supported by the operating system:
                    #   open_datasync
                    #   fdatasync (default on Linux)
                    #   fsync
                    #   fsync_writethrough
                    #   open_sync
#full_page_writes = on          # recover from partial page writes
#wal_compression = off          # enable compression of full-page writes
#wal_log_hints = off            # also do full page writes of non-critical updates
                    # (change requires restart)
#wal_buffers = -1           # min 32kB, -1 sets based on shared_buffers
                    # (change requires restart)
#wal_writer_delay = 200ms       # 1-10000 milliseconds
#wal_writer_flush_after = 1MB       # measured in pages, 0 disables

#commit_delay = 0           # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

#checkpoint_timeout = 5min      # range 30s-1d
max_wal_size = 1GB
min_wal_size = 80MB
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 256kB     # measured in pages, 0 disables
#checkpoint_warning = 30s       # 0 disables

# - Archiving -

#archive_mode = off     # enables archiving; off, on, or always
                # (change requires restart)
#archive_command = ''       # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables


#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------

# - Sending Server(s) -

# Set these on the master and on any standby that will send replication data.

#max_wal_senders = 0        # max number of walsender processes
                # (change requires restart)
#wal_keep_segments = 0      # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s   # in milliseconds; 0 disables

#max_replication_slots = 0  # max number of replication slots
                # (change requires restart)
#track_commit_timestamp = off   # collect timestamp of transaction commit
                # (change requires restart)

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = '' # standby servers that provide sync rep
                # number of sync standbys and comma-separated list of application_name
                # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed

# - Standby Servers -

# These settings are ignored on a master server.

#hot_standby = off          # "on" allows queries during recovery
                    # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                    # when reading WAL from archive;
                    # -1 allows indefinite delay
#max_standby_streaming_delay = 30s  # max delay before canceling queries
                    # when reading streaming WAL;
                    # -1 allows indefinite delay
#wal_receiver_status_interval = 10s # send replies at least this often
                    # 0 disables
#hot_standby_feedback = off     # send info from standby to prevent
                    # query conflicts
#wal_receiver_timeout = 60s     # time that receiver waits for
                    # communication from master
                    # in milliseconds; 0 disables
#wal_retrieve_retry_interval = 5s   # time to wait before retrying to
                    # retrieve WAL after a failed attempt


#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0            # measured on an arbitrary scale
#random_page_cost = 4.0         # same scale as above
#cpu_tuple_cost = 0.01          # same scale as above
#cpu_index_tuple_cost = 0.005       # same scale as above
#cpu_operator_cost = 0.0025     # same scale as above
#parallel_tuple_cost = 0.1      # same scale as above
#parallel_setup_cost = 1000.0   # same scale as above
#min_parallel_relation_size = 8MB
effective_cache_size = 32GB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5            # range 1-10
#geqo_pool_size = 0         # selects default based on effort
#geqo_generations = 0           # selects default based on effort
#geqo_selection_bias = 2.0      # range 1.5-2.0
#geqo_seed = 0.0            # range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100    # range 1-10000
#constraint_exclusion = partition   # on, off, or partition
#cursor_tuple_fraction = 0.1        # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit
                    # JOIN clauses
#force_parallel_mode = off


#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - Where to Log -

log_destination = 'stderr'      # Valid values are combinations of
                    # stderr, csvlog, syslog, and eventlog,
                    # depending on platform.  csvlog
                    # requires logging_collector to be on.

# This is used when logging to stderr:
logging_collector = on          # Enable capturing of stderr and csvlog
                    # into log files. Required to be on for
                    # csvlogs.
                    # (change requires restart)

# These are only used if logging_collector is on:
log_directory = 'pg_log'        # directory where log files are written,
                    # can be absolute or relative to PGDATA
log_filename = 'postgresql-%a.log'  # log file name pattern,
                    # can include strftime() escapes
#log_file_mode = 0600           # creation mode for log files,
                    # begin with 0 to use octal notation
log_truncate_on_rotation = on       # If on, an existing log file with the
                    # same name as the new log file will be
                    # truncated rather than appended to.
                    # But such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation.  Default is
                    # off, meaning append to existing files
                    # in all cases.
log_rotation_age = 1d           # Automatic rotation of logfiles will
                    # happen after that time.  0 disables.
log_rotation_size = 0           # Automatic rotation of logfiles will
                    # happen after that much log output.
                    # 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
#event_source = 'PostgreSQL'

# - When to Log -

#client_min_messages = notice       # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   log
                    #   notice
                    #   warning
                    #   error

#log_min_messages = warning     # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic

#log_min_error_statement = error    # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic (effectively off)

#log_min_duration_statement = -1    # -1 is disabled, 0 logs all statements
                    # and their durations, > 0 logs only
                    # statements running at least this number
                    # of milliseconds


# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default      # terse, default, or verbose messages
#log_hostname = off
log_line_prefix = '< %m > '         # special values:
                    #   %a = application name
                    #   %u = user name

Answer 1

Cra*_*ger 5

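Those "frozen" messages refer to pages that are known, from the all-frozen bit in the visibility map, to contain only frozen tuples. A frozen tuple is one that can be "seen" by all current and future transactions, i.e. it was committed before any still-running xact started. These should not be removed, and this is not a problem; they are data that is not changing.

Vacuum is doing useful work, see: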
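To see how many pages carry that bit, here is a minimal sketch using the `pg_visibility` contrib module, which ships with PostgreSQL 9.6 and later. It assumes the contrib packages are installed and that you can create extensions in this database; the `products` table is the one from the question.

```sql
-- Assumption: the pg_visibility contrib module is installed (PostgreSQL 9.6+).
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Number of pages the visibility map marks as all-visible and all-frozen.
SELECT all_visible, all_frozen
FROM pg_visibility_map_summary('products');

-- Total page count from the catalog (as of the last VACUUM/ANALYZE), for comparison.
SELECT relpages
FROM pg_class
WHERE oid = 'products'::regclass;
```

If `all_frozen` is close to `relpages`, vacuum can legitimately skip most of the table, which would be consistent with the "skipped frozen" figure in the log.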

tuples: 170090 removed ...

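It is not truncating the table (removing pages), but it does not need to... and should not, because your load pattern means you will always need at least twice the space of the actual data to hold all the dead rows produced by UPDATE churn.

However, the degree of bloat you are experiencing is surprising. Clearly something is not right. Perhaps you have found a problem in the new freeze map code, but I would look for other explanations first.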

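Do you have lots of long-running transactions? If your app fails to close its transactions, you will see lots of messages reporting rows that are "dead but not yet removable".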
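One way to check for this, as a rough sketch: list the oldest open transactions from `pg_stat_activity`. The columns used here exist in PostgreSQL 9.6; the 60-character cut-off and the limit of 10 rows are arbitrary choices for readability.

```sql
-- Oldest open transactions first. Sessions in state 'idle in transaction'
-- with an old xact_start are the usual suspects for holding back vacuum.
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       backend_xmin,
       left(query, 60)    AS query_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```

Transactions that stay open for hours keep every row version newer than their snapshot from being removed, however often autovacuum runs.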

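Can you provide an EXPLAIN (BUFFERS, ANALYZE, VERBOSE) from a problem query once the system has become bloated? And the output of VACUUM (FULL, VERBOSE) on one or more of the problem tables? (This will lock the table while it is being rewritten.)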
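Spelled out against the table from the question, those two commands would look roughly like this. The query is the "typical query" from the question; substitute whichever statement is actually slow.

```sql
-- Plan with actual timings, buffer hit/read counts, and output column lists.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT COUNT(*) FROM products WHERE categoryid = 4;

-- Rewrites the whole table and its indexes. This takes an ACCESS EXCLUSIVE
-- lock, so reads and writes on products block until it finishes.
VACUUM (FULL, VERBOSE) products;
```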

If you enable autovacuum logging, does it show anything interesting?
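A minimal sketch of turning that on for every autovacuum run and then checking what autovacuum has been doing on the table. It assumes superuser access; a threshold of 0 logs every run, which can be noisy on a busy server.

```sql
-- Log every autovacuum/autoanalyze run, regardless of how long it takes.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();

-- When autovacuum last ran on the table, how often it has run,
-- and how many dead rows the statistics collector currently estimates.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'products';
```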

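Is it the tables or the indexes that are becoming bloated? Just a few tables, or all tables equally? Just the busy ones? Are some indexes worse than others? And so on. See this.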
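One way to start answering these questions, as a rough sketch, is to compare heap and index sizes with the built-in size functions. The `products` table is the one from the question; the limit of 10 is arbitrary.

```sql
-- Heap vs. index vs. total size for the suspect table.
SELECT pg_size_pretty(pg_relation_size('products'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('products'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('products')) AS total_size;

-- Size of each index on the table, largest first.
SELECT indexrelid::regclass                         AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE indrelid = 'products'::regclass
ORDER BY pg_relation_size(indexrelid) DESC;

-- The ten largest ordinary tables in the database, including their indexes.
SELECT c.oid::regclass                               AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```

Running this right after a re-import and again a few days later shows which relations are growing and how fast.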

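Also, your design is pretty poor TBH. You should really move that activity data out into a pair of side tables, one joined to the users table and one joined to the products table. Update the activity rows there, so you are not constantly rewriting the main tables, which are full of data that rarely changes. It will help with disk I/O, cache hit rates, etc.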
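A minimal sketch of what such side tables could look like. The table names `user_activity` and `product_activity`, the `fillfactor` setting, and the user id 42 are all illustrative assumptions, not something from the question; the lower fillfactor is an extra idea to leave free space in each page so that updates are more likely to be HOT updates.

```sql
-- Hypothetical side tables: narrow rows that absorb the frequent updates.
CREATE TABLE user_activity (
  userid INT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  online_timestamp BIGINT NOT NULL
) WITH (fillfactor = 70);

CREATE TABLE product_activity (
  productid INT PRIMARY KEY REFERENCES products(id) ON DELETE CASCADE,
  online_timestamp BIGINT NOT NULL
) WITH (fillfactor = 70);

-- Refresh the timestamp for all products of one user (42 is a placeholder).
UPDATE product_activity pa
SET online_timestamp = extract(epoch FROM now())::bigint
FROM products p
WHERE p.id = pa.productid
  AND p.userid = 42;

-- Read path: join the activity row back onto the mostly static product row.
SELECT p.id, p.title, pa.online_timestamp
FROM products p
JOIN product_activity pa ON pa.productid = p.id
WHERE p.categoryid = 4;
```

With this layout each timestamp update writes a new version of a row a few dozen bytes wide, instead of a row that carries the title and the full description text.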

Only the products table becomes bloated - I have adjusted some autovacuum parameters to make it more aggressive and will update soon. (2 upvotes)
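For reference, a sketch of what "more aggressive" can look like when scoped to a single table through storage parameters. The numbers are illustrative assumptions, not values from the post, and should be tuned against the actual workload.

```sql
-- Per-table autovacuum settings; these override the global defaults
-- for products only. All values are illustrative.
ALTER TABLE products SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead rows instead of 20%
  autovacuum_vacuum_threshold    = 1000,  -- plus this fixed number of dead rows
  autovacuum_vacuum_cost_delay   = 5,     -- milliseconds to sleep when the cost limit is hit
  autovacuum_vacuum_cost_limit   = 1000   -- work allowed between sleeps
);

-- Confirm what is now set on the table.
SELECT reloptions
FROM pg_class
WHERE oid = 'products'::regclass;
```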
