Azure Postgresql 达到 100% 内存使用率,无需简单的重启服务器的方法

Jos*_*aia 5 postgresql azure

我最近在使用 Azure Database for PostgreSQL 时注意到一个问题,我的内存使用量不断增长,达到 100% 时,服务器将停止响应。

这个数据库服务器专门用于开发,所以它有很多短期连接,经常被强行关闭(因为人们重新启动他们的应用程序来修复这里或那里的错误。)

在日志中,在发生这种情况之前,我可以看到两种模式,即自动清理错误:

2018-05-22 11:16:13 UTC-5ae5085b.20-LOG:  CreateProcess call failed: No error (error code 1455)
2018-05-22 11:16:13 UTC-5ae5085b.20-LOG:  could not fork autovacuum worker process: No error
2018-05-22 11:16:14 UTC-5ae5085b.20-LOG:  CreateProcess call failed: A blocking operation was interrupted by a call to WSACancelBlockingCall.

 (error code 1455)
Run Code Online (Sandbox Code Playgroud)

似乎是内存使用转储,然后是请求失败的警告:

TopMemoryContext: 143584 total in 6 blocks; 68072 free (43 chunks); 75512 used
  TopTransactionContext: 8192 total in 1 blocks; 7960 free (0 chunks); 232 used
  CFuncHash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  Type information cache: 24472 total in 2 blocks; 2840 free (0 chunks); 21632 used
  Record information cache: 24576 total in 2 blocks; 15072 free (5 chunks); 9504 used
  Operator lookup cache: 24576 total in 2 blocks; 10976 free (5 chunks); 13600 used
  TableSpace cache: 8192 total in 1 blocks; 2312 free (0 chunks); 5880 used
  MessageContext: 8192 total in 1 blocks; 6968 free (0 chunks); 1224 used
  Operator class cache: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  smgr relation table: 24576 total in 2 blocks; 8896 free (4 chunks); 15680 used
  TransactionAbortContext: 32768 total in 1 blocks; 32728 free (0 chunks); 40 used
  Portal hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
    PortalHeapMemory: 1024 total in 1 blocks; 672 free (0 chunks); 352 used
      ExecutorState: 4218936 total in 10 blocks; 1970640 free (11 chunks); 2248296 used
        printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        Table function arguments: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 1040384 total in 7 blocks; 168 free (3 chunks); 1040216 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 12960 free (4 chunks); 11616 used
  CacheMemoryContext: 1040384 total in 7 blocks; 422592 free (14 chunks); 617792 used
    CachedPlan: 31744 total in 5 blocks; 13368 free (0 chunks); 18376 used
    CachedPlanSource: 3072 total in 2 blocks; 496 free (0 chunks); 2576 used
      unnamed prepared statement: 57344 total in 3 blocks; 31952 free (1 chunks); 25392 used
    pg_shdepend_reference_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_class_tblspc_relfilenode_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_depend_depender_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_depend_reference_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_stat_statements: 15360 total in 4 blocks; 664 free (0 chunks); 14696 used
    pg_settings: 23552 total in 5 blocks; 2040 free (0 chunks); 21512 used
    pg_toast_2619_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_stat_database: 15360 total in 4 blocks; 728 free (1 chunks); 14632 used
    pg_stat_bgwriter: 7168 total in 3 blocks; 568 free (1 chunks); 6600 used
    pg_toast_2618_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_stat_activity: 23552 total in 5 blocks; 6464 free (0 chunks); 17088 used
    pg_extension_name_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    EventTriggerCache: 8192 total in 1 blocks; 8152 free (2 chunks); 40 used
      Event Trigger Cache: 8192 total in 1 blocks; 2840 free (0 chunks); 5352 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_enum_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_language_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_transform_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_collation_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_event_trigger_evtname_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_event_trigger_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_conversion_default_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_foreign_table_relid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_type_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_operator_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_default_acl_role_nsp_obj_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_collation_name_enc_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_range_rngtypid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_class_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_transform_type_lang_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_proc_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_language_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_authid_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_shseclabel_object_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_replication_origin_roname_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_database_datname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_replication_origin_roiident_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_database_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
  WAL record construction: 49768 total in 2 blocks; 6584 free (0 chunks); 43184 used
  PrivateRefCount: 8192 total in 1 blocks; 2840 free (0 chunks); 5352 used
  MdSmgr: 8192 total in 1 blocks; 7512 free (1 chunks); 680 used
  LOCALLOCK hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  Timezones: 104120 total in 2 blocks; 2840 free (0 chunks); 101280 used
  ErrorContext: 8192 total in 1 blocks; 8152 free (4 chunks); 40 used
Grand total: 7133648 bytes in 175 blocks; 2710008 free (99 chunks); 4423640 used
2018-05-22 11:33:03 UTC-5b0107ed.badd4-ERROR:  out of memory
2018-05-22 11:33:03 UTC-5b0107ed.badd4-DETAIL:  Failed on request of size 132.
Run Code Online (Sandbox Code Playgroud)

这张图显示了内存增长在过去一个月中是如何扩大并且从未下降的。

我们的应用程序的暂存安装在 AWS RDS 上,那里的内存使用量相当稳定 -正如您在这张图片中看到的那样,它在过去两周内基本保持不变。

我在 Azure 上可用的选项数量非常有限,无法控制这一点。除了将其扩展到更高的定价层并再次降低它之外,没有办法重新启动服务器。

不过,我宁愿避免这样做。我可以使用任何工具来弄清楚为什么会发生这种情况吗?是否有更好的选择来控制 Azure 中的内存?

谢谢你。

Tom*_*zky 3

AWS RDS 在默认参数组中有多个选项,这些选项是根据 DBInstanceClassMemory 变量计算的,例如max_connections参数可能是{DBInstanceClassMemory/31457280},对于 8GB 服务器将计算为8*1024*1024*1024/31457280=273

这允许参数在更改数据库实例 RAM 大小的同时按比例放大和缩小。

我不知道它在 Azure 上是如何配置的,但您还需要确保这些参数针对您的 RAM 进行了合理配置。首先,我会检查哪些参数依赖于 AWS RDS 上的 DBInstanceClassMemory,然后在 Azure 上将这些参数设置为相同(如果没有类似的机制来自动扩展它们,则手动计算)。

尝试设置它们,以便下面的查询返回的值小于 Postgres 的可用内存:

select
    pg_size_pretty(
        shared_buffers::bigint*block_size
        + max_connections*work_mem*1024
        + autovacuum_max_workers*(
            case when autovacuum_work_mem=-1
            then maintenance_work_mem
            else autovacuum_work_mem
            end
        )*1024
    ) as estimated_max_ram_usage
from (
    select
    (select setting::bigint from pg_settings where name='block_size') as block_size,
    (select setting::bigint from pg_settings where name='shared_buffers') as shared_buffers,
    (select setting::bigint from pg_settings where name='max_connections') as max_connections,
    (select setting::bigint from pg_settings where name='work_mem') as work_mem,
    (select setting::bigint from pg_settings where name='autovacuum_max_workers') as autovacuum_max_workers,
    (select setting::bigint from pg_settings where name='autovacuum_work_mem') as autovacuum_work_mem,
    (select setting::bigint from pg_settings where name='maintenance_work_mem') as maintenance_work_mem
) as _
;
Run Code Online (Sandbox Code Playgroud)

但是你仍然可以杀死服务器,例如使用 pg_restore 或 reindex 来处理太多并行工作程序(它们最多可以使用maintenance_work_mem每个工作程序)或太复杂的查询(它们最多可以使用work_mem每个查询执行节点,并且可以有多个单个查询)。

服务器没有任何单一设置可以告诉它使用不超过 X RAM