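I recently noticed a problem while using Azure Database for PostgreSQL: memory usage keeps growing, and when it reaches 100% the server stops responding.
This database server is used purely for development, so it has a lot of short-lived connections that are often closed forcibly (because people restart their applications to fix a bug here or there).
In the logs, just before this happens, I can see two patterns. The first is autovacuum errors: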
2018-05-22 11:16:13 UTC-5ae5085b.20-LOG: CreateProcess call failed: No error (error code 1455)
2018-05-22 11:16:13 UTC-5ae5085b.20-LOG: could not fork autovacuum worker process: No error
2018-05-22 11:16:14 UTC-5ae5085b.20-LOG: CreateProcess call failed: A blocking operation was interrupted by a call to WSACancelBlockingCall.
(error code 1455)
The second is what appears to be a memory usage dump, followed by a message that a request failed:
TopMemoryContext: 143584 total in 6 blocks; 68072 free (43 chunks); 75512 used
TopTransactionContext: 8192 total in 1 blocks; 7960 free (0 chunks); 232 used
CFuncHash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
Type information cache: 24472 total in 2 blocks; 2840 free (0 chunks); 21632 used
Record information cache: 24576 total in 2 blocks; 15072 free (5 chunks); 9504 used
Operator lookup cache: 24576 total in 2 blocks; 10976 free (5 chunks); 13600 used
TableSpace cache: 8192 total in 1 blocks; 2312 free (0 chunks); 5880 used
MessageContext: 8192 total in 1 blocks; 6968 free (0 chunks); 1224 used
Operator class cache: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
smgr relation table: 24576 total in 2 blocks; 8896 free (4 chunks); 15680 used
TransactionAbortContext: 32768 total in 1 blocks; 32728 free (0 chunks); 40 used
Portal hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
PortalHeapMemory: 1024 total in 1 blocks; 672 free (0 chunks); 352 used
ExecutorState: 4218936 total in 10 blocks; 1970640 free (11 chunks); 2248296 used
printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Table function arguments: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 1040384 total in 7 blocks; 168 free (3 chunks); 1040216 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 12960 free (4 chunks); 11616 used
CacheMemoryContext: 1040384 total in 7 blocks; 422592 free (14 chunks); 617792 used
CachedPlan: 31744 total in 5 blocks; 13368 free (0 chunks); 18376 used
CachedPlanSource: 3072 total in 2 blocks; 496 free (0 chunks); 2576 used
unnamed prepared statement: 57344 total in 3 blocks; 31952 free (1 chunks); 25392 used
pg_shdepend_reference_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_class_tblspc_relfilenode_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_depend_depender_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_depend_reference_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_stat_statements: 15360 total in 4 blocks; 664 free (0 chunks); 14696 used
pg_settings: 23552 total in 5 blocks; 2040 free (0 chunks); 21512 used
pg_toast_2619_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_stat_database: 15360 total in 4 blocks; 728 free (1 chunks); 14632 used
pg_stat_bgwriter: 7168 total in 3 blocks; 568 free (1 chunks); 6600 used
pg_toast_2618_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_stat_activity: 23552 total in 5 blocks; 6464 free (0 chunks); 17088 used
pg_extension_name_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
EventTriggerCache: 8192 total in 1 blocks; 8152 free (2 chunks); 40 used
Event Trigger Cache: 8192 total in 1 blocks; 2840 free (0 chunks); 5352 used
pg_index_indrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_enum_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_cast_source_target_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_language_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_transform_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_collation_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_ts_config_map_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_opclass_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_conversion_default_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_type_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_constraint_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_operator_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_default_acl_role_nsp_obj_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_collation_name_enc_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_class_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_transform_type_lang_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_proc_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_language_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_namespace_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
pg_conversion_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_authid_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_shseclabel_object_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_replication_origin_roname_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_database_datname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_replication_origin_roiident_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
pg_database_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
pg_authid_rolname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
WAL record construction: 49768 total in 2 blocks; 6584 free (0 chunks); 43184 used
PrivateRefCount: 8192 total in 1 blocks; 2840 free (0 chunks); 5352 used
MdSmgr: 8192 total in 1 blocks; 7512 free (1 chunks); 680 used
LOCALLOCK hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
Timezones: 104120 total in 2 blocks; 2840 free (0 chunks); 101280 used
ErrorContext: 8192 total in 1 blocks; 8152 free (4 chunks); 40 used
Grand total: 7133648 bytes in 175 blocks; 2710008 free (99 chunks); 4423640 used
2018-05-22 11:33:03 UTC-5b0107ed.badd4-ERROR: out of memory
2018-05-22 11:33:03 UTC-5b0107ed.badd4-DETAIL: Failed on request of size 132.
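This graph shows how memory usage has grown over the past month and never dropped.
The staging installation of our application runs on AWS RDS, where memory usage is quite stable; as you can see in this picture, it has stayed essentially flat over the past two weeks.
The options available to me on Azure are very limited, and I cannot control this. There is no way to restart the server other than scaling it up to a higher pricing tier and then back down again.
I would rather avoid doing that, though. Are there any tools I can use to figure out why this is happening? Are there better options for controlling memory in Azure?
Thank you.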
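AWS RDS has several options in its default parameter groups that are computed from the DBInstanceClassMemory variable. For example, the max_connections parameter may be {DBInstanceClassMemory/31457280}, which for an 8 GB server evaluates to 8*1024*1024*1024/31457280=273 (rounded down).
This lets the parameters scale up and down proportionally when the RAM size of the DB instance changes.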
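To make the arithmetic concrete, here is a minimal Python sketch of that formula. The function name `rds_max_connections` is hypothetical, and the sketch assumes, as the example above does, that DBInstanceClassMemory equals the full nominal RAM of the instance; the value RDS actually substitutes may differ.

```python
def rds_max_connections(ram_bytes: int, divisor: int = 31457280) -> int:
    """Evaluate {DBInstanceClassMemory/divisor} using integer division.

    Assumption: ram_bytes stands in for DBInstanceClassMemory.
    """
    return ram_bytes // divisor


GIB = 1024 ** 3

# The result scales proportionally with the RAM size.
for ram_gib in (4, 8, 16):
    print(ram_gib, "GiB ->", rds_max_connections(ram_gib * GIB))
```

On Azure, the same calculation could be done by hand for the chosen tier and the result entered as the parameter value.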
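I don't know how this is configured on Azure, but you also need to make sure these parameters are set sensibly for your RAM. First, I would check which parameters depend on DBInstanceClassMemory on AWS RDS, and then set them to the same values on Azure (computing them manually if there is no similar mechanism to scale them automatically).
Try to set them so that the value returned by the query below is less than the memory available to Postgres: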
select
    pg_size_pretty(
        -- shared_buffers is reported in blocks
        shared_buffers::bigint*block_size
        -- work_mem is reported in kB; counts one allocation per connection
        + max_connections*work_mem*1024
        -- autovacuum workers fall back to maintenance_work_mem when autovacuum_work_mem is -1
        + autovacuum_max_workers*(
            case when autovacuum_work_mem=-1
                then maintenance_work_mem
                else autovacuum_work_mem
            end
        )*1024
    ) as estimated_max_ram_usage
from (
    select
        (select setting::bigint from pg_settings where name='block_size') as block_size,
        (select setting::bigint from pg_settings where name='shared_buffers') as shared_buffers,
        (select setting::bigint from pg_settings where name='max_connections') as max_connections,
        (select setting::bigint from pg_settings where name='work_mem') as work_mem,
        (select setting::bigint from pg_settings where name='autovacuum_max_workers') as autovacuum_max_workers,
        (select setting::bigint from pg_settings where name='autovacuum_work_mem') as autovacuum_work_mem,
        (select setting::bigint from pg_settings where name='maintenance_work_mem') as maintenance_work_mem
) as _
;
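You can still kill the server, though, for example by running pg_restore or reindex with too many parallel workers (each worker can use up to maintenance_work_mem) or by running queries that are too complex (each query execution node can use up to work_mem, and a single query can have several such nodes).
There is no single server setting that tells it to use no more than X amount of RAM.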
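As a rough illustration of why the estimate is not a hard ceiling, the sketch below repeats the query's arithmetic in Python and then adds the two extra consumers mentioned above. All names (`PgSettings`, `baseline_bytes`, `extra_bytes`) and all setting values are hypothetical, chosen only for the example; this is a back-of-the-envelope model, not how PostgreSQL accounts for memory internally.

```python
from dataclasses import dataclass

KB = 1024


@dataclass
class PgSettings:
    """Values as they appear in pg_settings (hypothetical example numbers)."""
    block_size: int               # bytes
    shared_buffers: int           # number of blocks
    max_connections: int
    work_mem: int                 # kB
    maintenance_work_mem: int     # kB
    autovacuum_max_workers: int
    autovacuum_work_mem: int = -1  # kB; -1 means use maintenance_work_mem


def baseline_bytes(s: PgSettings) -> int:
    """Same formula as the SQL estimate: one work_mem per connection."""
    av_mem = s.maintenance_work_mem if s.autovacuum_work_mem == -1 else s.autovacuum_work_mem
    return (s.shared_buffers * s.block_size
            + s.max_connections * s.work_mem * KB
            + s.autovacuum_max_workers * av_mem * KB)


def extra_bytes(s: PgSettings, maintenance_workers: int, extra_nodes: int) -> int:
    """Memory beyond the baseline: parallel maintenance workers (up to
    maintenance_work_mem each) plus additional executor nodes (up to
    work_mem each)."""
    return (maintenance_workers * s.maintenance_work_mem * KB
            + extra_nodes * s.work_mem * KB)


settings = PgSettings(block_size=8192, shared_buffers=131072, max_connections=100,
                      work_mem=4096, maintenance_work_mem=65536,
                      autovacuum_max_workers=3)

base = baseline_bytes(settings)
extra = extra_bytes(settings, maintenance_workers=8, extra_nodes=20)
print(f"baseline: {base / 1024 ** 2:.0f} MiB, possible extra: {extra / 1024 ** 2:.0f} MiB")
```

With these made-up numbers, eight parallel maintenance workers and twenty additional work_mem-sized nodes add roughly a third on top of the baseline, which is why it helps to leave headroom below the available RAM.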