为什么 Django 会话表在 PostgreSQL 上增长

raa*_*cer 7 postgresql size

我在我的 Django 1.4.8 项目中通过 PgBouncer 1.4.2 使用 PostgreSQL 9.1.6。django_session表一直在增长。VACUUM 没有帮助。VACUUM FULL 运行数小时。TRUNCATE通常只帮助。

该表包含大约 10KB 的行。每秒更新 40 次。

# SELECT table_name, pg_size_pretty(table_size) AS table_size, pg_size_pretty(indexes_size) AS indexes_size, pg_size_pretty(total_size) AS total_size
FROM (
    SELECT table_name, pg_table_size(table_name) AS table_size, pg_indexes_size(table_name) AS indexes_size, pg_total_relation_size(table_name) AS total_size
    FROM (
        SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
        FROM information_schema.tables
    ) AS all_tables ORDER BY total_size DESC LIMIT 1
) AS pretty_sizes;
        table_name         | table_size | indexes_size | total_size 
---------------------------+------------+--------------+------------
 "public"."django_session" | 35 GB      | 209 MB       | 35 GB
(1 row)

# SELECT COUNT(*) FROM django_session;
 count 
-------
 40196
Run Code Online (Sandbox Code Playgroud)

cron 每晚都会执行一次清理命令。

./manage.py cleanup && echo "vacuum (analyze, verbose);" | ./manage.py dbshell
Run Code Online (Sandbox Code Playgroud)

输出中的 django_session 有一些有趣的东西。

INFO:  vacuuming "public.django_session"
INFO:  scanned index "django_session_pkey" to remove 5338 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  scanned index "django_session_expire_date" to remove 5338 row versions
DETAIL:  CPU 0.10s/0.06u sec elapsed 3.47 sec.
INFO:  "django_session": removed 5338 row versions in 187 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "django_session_pkey" now contains 71568 row versions in 1647 pages
DETAIL:  0 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "django_session_expire_date" now contains 71568 row versions in 25049 pages
DETAIL:  2785 index row versions were removed.
1804 index pages have been deleted, 1798 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "django_session": found 0 removable, 71568 nonremovable row versions in 3386 out of 3386 pages
DETAIL:  46699 dead row versions cannot be removed yet.
There were 811150 unused item pointers.
0 pages are entirely empty.
CPU 0.11s/0.10u sec elapsed 16.00 sec.
...
INFO:  analyzing "public.django_session"
INFO:  "django_session": scanned 3460 of 3460 pages, containing 25514 live rows and 288939 dead rows; 25514 rows in sample, 25514 estimated total rows
Run Code Online (Sandbox Code Playgroud)

我想知道46699 个死行版本811150 个未使用的项目指针288939 个死行。许多消息来源称问题在于“交易中的空闲”流程。我认为这是我的情况,因为有很多这样的过程:

# SELECT COUNT(*) FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';
 count 
-------
    30
Run Code Online (Sandbox Code Playgroud)

但是没有真正旧的查询。

# SELECT age(now(), query_start) AS "age" FROM pg_stat_activity ORDER BY query_start LIMIT 1;
       age       
-----------------
 00:00:00.241521
Run Code Online (Sandbox Code Playgroud)

大量空闲进程可能是由 TransactionMiddleware 引起的。

所以现在我没有任何想法。我手动执行了清理命令并且它运行良好,所以我决定问题以某种方式消失并截断了表格。但它又长大了。

表增长图

# select * from pgstattuple('django_session');
-[ RECORD 1 ]------+---------
table_len          | 23003136
tuple_count        | 32139
tuple_len          | 11201544
tuple_percent      | 48.7
dead_tuple_count   | 1729
dead_tuple_len     | 171632
dead_tuple_percent | 0.75
free_space         | 8930044
free_percent       | 38.82
Run Code Online (Sandbox Code Playgroud)

什么可能导致这种奇怪的问题?

更新:

我已经删除了 TransactionMiddleware 并在 PgBouncer 中切换到事务模式。现在几乎没有“交易空闲”流程。但这无济于事。

$ echo $(for i in `seq 100`; do echo "SELECT COUNT(*) FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';" | sudo -u postgres psql mydb | grep  ' 0'; sleep 0.1; done | wc -l)"% of time there is NO idle in transaction processes"

75% of time there is NO idle in transaction processes
Run Code Online (Sandbox Code Playgroud)

raa*_*cer 0

好的,这是我经过大量研究后的答案。

不要在 KVM 等虚拟机上运行高负载的 PostgreSQL 服务器。即使使用适当的驱动程序直接访问硬盘,它也缺乏 I/O 速度。