Impact of refreshing a materialized view on the database

Question

Ima*_* Y. 8 postgresql materialized-view postgresql-9.6 amazon-rds

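Hi, we are running a PostgreSQL 9.6 database on Amazon RDS, on an m4.large instance (2 CPUs, 8 GB) with 1000 provisioned IOPS. The use case is as follows: we have a table with several million rows (about 4M), and we created a materialized view containing a subset of that table (approx. 2M rows), changing some column types to make queries more efficient. Our PostgreSQL configuration is unchanged; it is the RDS Postgres default.

This is our view definition: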

CREATE MATERIALIZED VIEW public.customers_mv as
SELECT 
    id,
    gender,
    contact_info,
    location,
    social,
    categories,
    (social ->> 'follower_count')::integer AS social_follower_count,
    (social ->> 'following_count')::integer AS social_following_count,
    (social ->> 'peemv')::float AS social_emv,
    (social ->> 'engagement')::float AS social_engagement,
    (social ->> 'v')::boolean AS social_validated,
    search_vector,
    flags,
    to_tsvector('english',concat_ws(' ','aal0_'||(customers.location ->> 'aal0'),
      'aal1_'||(customers.location ->> 'aal1'),
      'aal2_'||(customers.location ->> 'aal2'),
      'frequent_location_aal0_'||(customers.location -> 'frequent_location' ->> 'aal0'),
      'frequent_location_aal1_'||(customers.location -> 'frequent_location' ->> 'aal1'),
      'frequent_location_aal2_'||(customers.location -> 'frequent_location' ->> 'aal2'),
      'last_post_location_aal0_'||(customers.location -> 'last_post_location' ->> 'aal0'),
      'last_post_location_aal1_'||(customers.location -> 'last_post_location' ->> 'aal1'),
      'last_post_location_aal2_'||(customers.location -> 'last_post_location' ->> 'aal2'),
      'admin_location_aal0_'||(customers.location -> 'admin_location' ->> 'aal0'),
      'bio_location_aal0_'||(customers.location -> 'bio_location' ->> 'aal0'))) as loc_vector
FROM public.customers
WHERE (customers.social -> 'follower_count') > '5000'
AND customers.social ? 'last_posts' 
AND (customers.flags IS NULL OR NOT customers.flags @> '{"destroy": true}'::jsonb);


CREATE INDEX customers_mv_followerc_idx ON customers_mv USING BTREE (social_follower_count);
CREATE INDEX customers_mv_folling_idx ON customers_mv USING BTREE (social_following_count);
CREATE INDEX customers_mv_emv_idx ON customers_mv USING BTREE (social_emv);
CREATE INDEX customers_mv_gin_social_idx ON customers_mv USING GIN (social jsonb_path_ops);
CREATE INDEX customers_mv_partial_social_validated_idx ON customers_mv (social_validated) WHERE social_validated = FALSE;
CREATE INDEX customers_mv_categories_idx ON customers_mv USING gin (categories);
CREATE INDEX customers_mv_gin_location_idx ON customers_mv USING GIN (location jsonb_path_ops);
CREATE INDEX customers_mv_gin_loc_vector ON customers_mv USING gin(loc_vector);
CREATE UNIQUE INDEX customers_mv_uniq_id_idx ON customers_mv (id);

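Our view has some columns that are only used for accessing data, such as location or social (both of type jsonb), and others, such as loc_vector, that are used for faster querying.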

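Now the problem: whenever we refresh, refresh concurrently, or create a new materialized view, CPU usage or write IOPS bring the database down as soon as we start the refresh command.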
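For reference, the two refresh variants mentioned above presumably take the following form. This is a sketch based on the view name in the definition, not a copy of the exact statements that were run.

```sql
-- Plain refresh: rebuilds the whole view from its defining query.
-- Reads of customers_mv from other connections can be blocked
-- while this statement runs.
REFRESH MATERIALIZED VIEW public.customers_mv;

-- Concurrent refresh: does not lock out concurrent SELECTs on the view.
-- It requires at least one unique index on the view that uses only
-- column names and covers all rows (here customers_mv_uniq_id_idx on id).
-- When many rows change, it tends to use more resources and take longer
-- than the plain form.
REFRESH MATERIALIZED VIEW CONCURRENTLY public.customers_mv;
```

In both cases the defining query is re-run over the base table, so the to_tsvector expression and the jsonb casts are evaluated again for every qualifying row.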

Here we can see how hard it hits the DB.