Postgres 9.5 foreign table inheritance does not use indexes

Dzm*_*sin 2 postgresql performance inheritance foreign-data postgresql-9.5 postgresql-performance

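In PostgreSQL 9.5.0 I have a partitioned table that collects data by month. To try out the new foreign table inheritance feature in PostgreSQL, I moved one month of data to another PostgreSQL server and attached it as a foreign table. When I run the query from the main server, it takes about 7 times longer than running it directly on the server that holds the data. I am not transferring much data over the network. My query looks like this: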

explain analyze
SELECT source, global_action, paid, organic, device, count(*) as count, sum(price) as sum
FROM "toys"
WHERE "toys"."container_id" = 857 AND (toys.created_at >= '2015-12-02 05:00:00.000000') AND
(toys.created_at <= '2015-12-30 04:59:59.999999') AND ("toys"."source" IS NOT NULL)
GROUP BY "toys"."source", "toys"."global_action", "toys"."paid", "toys"."organic", "toys"."device";

HashAggregate  (cost=1143634.94..1143649.10 rows=1133 width=15) (actual time=1556.894..1557.017 rows=372 loops=1)
   Group Key: toys.source, toys.global_action, toys.paid, toys.organic, toys.device
   ->  Append  (cost=0.00..1143585.38 rows=2832 width=15) (actual time=113.420..1507.373 rows=76593 loops=1)
         ->  Seq Scan on toys  (cost=0.00..0.00 rows=1 width=242) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: ((source IS NOT NULL) AND (created_at >= '2015-12-02 05:00:00'::timestamp without time zone) AND (created_at <= '2015-12-30 04:59:59.999999'::timestamp without time zone) AND (container_id = 857))
         ->  Foreign Scan on toys_201512_new  (cost=100.00..1143585.38 rows=2831 width=15) (actual time=113.419..1488.445 rows=76593 loops=1)
 Planning time: 2.990 ms
 Execution time: 1560.131 ms

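Does PostgreSQL use indexes on foreign tables? (I have indexes defined on the foreign table, i.e. on the underlying table on the remote server.) If I run the query directly on that server, it takes 200 ms.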

This is the parent table definition:

Table "public.toys"
id bigint
job_reference character varying(100)
container_id integer
user_token character varying(1000)
user_ip character varying(100)
user_zip character varying(10)
user_agent character varying(2000)
url_referrer character varying(2000)
page_url character varying(2000)
source character varying(100)
action integer
created_at timestamp without time zone
cpa numeric(9,3) not null default 0.0
duplicate boolean not null default false
fingerprint character varying(255)
email character varying(1000)
mobile_email_apply boolean
country character varying(255)
country_matched boolean
device integer
organic boolean
job_seeker_id character varying(255)
applicant_status integer
ats_applicant_status character varying(255)
ats_applicant_source character varying(255)
price numeric(9,4)
job_group_id integer
analytic_source character varying(255)
global_action integer
paid_organic integer
paid boolean
meta text
params character varying(2000)
analytic_associated_click_id bigint
external_id character varying(100)
associated_click_id bigint
cpc numeric(9,3)
Indexes:
    "job_stats_master_pkey1" PRIMARY KEY, btree (id)

The child table has a check constraint:

"toys_201512_new_created_at_check" CHECK (
    created_at >= '2015-11-30 19:00:00'::timestamp without time zone AND
    created_at <  '2015-12-31 19:00:00'::timestamp without time zone)
Inherits: toys

And indexes (on the table on the remote server):

"toys_201512_new_analytic_source" btree (analytic_source)
"toys_201512_new_country" btree (country)
"toys_201512_new_created_at" btree (created_at)
"toys_201512_new_duplicate" btree (duplicate) WHERE duplicate = false
"toys_201512_new_container_id" btree (container_id)
"toys_201512_new_container_id_created_at" btree (container_id, created_at)
"toys_201512_new_fingerprint" btree (fingerprint)
"toys_201512_new_global_action" btree (global_action)
"toys_201512_new_id" btree (id)
"toys_201512_new_job_group_id" btree (job_group_id)
"toys_201512_new_job_reference" btree (job_reference)
"toys_201512_new_on_country_matched" btree (country_matched) WHERE country_matched = true
"toys_201512_new_on_cpa" btree (cpa) WHERE cpa <> 0::numeric
"toys_201512_new_on_duplicate_and_country_matched" btree (duplicate, country_matched) WHERE duplicate = false AND country_matched = true
"toys_201512_new_on_mobile_email_apply" btree (mobile_email_apply) WHERE mobile_email_apply = true
"toys_201512_new_source" btree (source)
"toys_201512_new_user_ip_user_agent" btree (user_ip, user_agent)
"toys_201512_new_user_token" btree (user_token)
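For context, a foreign child table like this can be declared on the main server roughly as follows. This is a minimal sketch, not the asker's actual DDL: the server name `remote_srv`, the host, database, user and password are hypothetical placeholders, and it assumes the remote table is `public.toys_201512_new` with the same columns as `toys`.

```sql
-- Hypothetical setup on the main server. "remote_srv", the host, database,
-- user and password are placeholders, not values from the question.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'mydb');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_srv
    OPTIONS (user 'app_user', password 'secret');

-- No columns declared here: every column is inherited from the parent "toys".
-- The CHECK constraint is not enforced on a foreign table; the planner
-- simply trusts it (e.g. for constraint exclusion).
CREATE FOREIGN TABLE toys_201512_new (
    CONSTRAINT toys_201512_new_created_at_check CHECK (
            created_at >= '2015-11-30 19:00:00'::timestamp without time zone
        AND created_at <  '2015-12-31 19:00:00'::timestamp without time zone)
) INHERITS (toys)
  SERVER remote_srv
  OPTIONS (schema_name 'public', table_name 'toys_201512_new');
```

Note that no index can be created on the foreign table itself; the indexes listed above live on the remote table and are only visible to the remote server's planner.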

Erw*_*ter 5

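Postgres can use indexes on the foreign server. However, there are many more obstacles than with local tables. Read the chapter Remote Query Optimization in the manual.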

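The comments in the current source code of postgres_fdw.c are also instructive:

 * [...] For a foreign
 * table, we don't know what indexes are present on the remote side but
 * want to speculate about which ones we'd like to use if they existed.
...
 * A bit of care is needed here. It would certainly be nice to have local
 * caching of information about remote index definitions...
...
 * corresponds to SeqScan path of regular tables (though depending on what
 * baserestrict conditions we were able to send to remote, there might
 * actually be an indexscan happening there).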

Index

This index of yours looks good:

"toys_201512_new_container_id_created_at" btree (container_id, created_at)

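If you have many NULL values in source, you might even make it a partial index by appending WHERE source IS NOT NULL, which makes the index look better to the Postgres query planner.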
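A sketch of that partial index. It has to be created on the remote server, where the data and the existing indexes live; the index name is hypothetical.

```sql
-- Run on the REMOTE server. Index name is a hypothetical example.
-- The planner only considers a partial index when the query's WHERE clause
-- implies its predicate; this query contains "source IS NOT NULL".
CREATE INDEX toys_201512_new_container_id_created_at_src
    ON toys_201512_new (container_id, created_at)
    WHERE source IS NOT NULL;
```

It could replace "toys_201512_new_container_id_created_at" for this query, but only if no other queries rely on the full index for rows where source is NULL.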

Statistics for the query planner

Make sure the query planner can work with valid statistics. The numbers in your EXPLAIN output show a considerable mismatch:

Foreign Scan on toys_201512_new  (cost=100.00..1143585.38 rows=2831 width=15)
                                 (actual time=113.419..1488.445 rows=76593 loops=1)

The actual number of rows returned (76593) is 27 times what Postgres expected (2831). The manual:

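Running ANALYZE on the foreign table is the way to update the local statistics; this will perform a scan of the remote table and then calculate and store statistics just as though the table were local. Keeping local statistics can be a useful way to reduce per-query planning overhead for a remote table, but if the remote table is frequently updated, the local statistics will soon be obsolete.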

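Since accessing foreign tables can be expensive or delicate, this does not happen automatically: autovacuum does not cover foreign tables. The manual: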

Foreign tables are analyzed only when explicitly selected.
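In practice that means running ANALYZE on the foreign table by hand (or from a scheduled job) on the main server, and repeating it whenever the remote data has changed significantly. A minimal sketch; the pg_stats query is just one way to confirm that statistics now exist for the filtered columns.

```sql
-- Run on the MAIN server: postgres_fdw reads the remote table and the
-- statistics are computed and stored locally, as for a local table.
ANALYZE toys_201512_new;

-- Check that column statistics are now present for the filtered columns.
SELECT attname, null_frac, n_distinct
FROM   pg_stats
WHERE  tablename = 'toys_201512_new'
AND    attname IN ('container_id', 'created_at', 'source');
```

Afterwards, rerun EXPLAIN ANALYZE and compare the estimated rows of the Foreign Scan with the actual rows.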

If the remote table changes a lot, you may want to enable use_remote_estimate. The manual:

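This option, which can be specified for a foreign table or a foreign server, controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost estimates. A setting for a foreign table overrides any setting for its server, but only for that table. The default is false.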
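Enabling it could look like this. Use ADD the first time the option is set and SET to change an existing value. `remote_srv` is a hypothetical server name; the question does not show the real one.

```sql
-- Only for this foreign table (overrides the server-level setting):
ALTER FOREIGN TABLE toys_201512_new OPTIONS (ADD use_remote_estimate 'true');

-- Or for all foreign tables of a server ("remote_srv" is hypothetical):
ALTER SERVER remote_srv OPTIONS (ADD use_remote_estimate 'true');

-- Later changes use SET instead of ADD:
ALTER FOREIGN TABLE toys_201512_new OPTIONS (SET use_remote_estimate 'false');
```

The trade-off: every planning of a query on this table then costs an extra remote EXPLAIN round trip.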

Finally, test to see what is actually sent to the foreign server:

The query that is actually sent to the remote server for execution can be examined using EXPLAIN VERBOSE.
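For example, run the query with EXPLAIN VERBOSE on the main server (adding ANALYZE, as in the question, also reports actual times) and look at the "Remote SQL:" line under the Foreign Scan node. Whether the remote side then uses an index can be checked by running EXPLAIN ANALYZE on that Remote SQL directly on the remote server.

```sql
-- Run on the MAIN server; inspect the "Remote SQL:" line of the Foreign Scan.
EXPLAIN (ANALYZE, VERBOSE)
SELECT source, global_action, paid, organic, device
     , count(*) AS count, sum(price) AS sum
FROM   toys
WHERE  container_id = 857
AND    created_at >= '2015-12-02 05:00:00'
AND    created_at <  '2015-12-30 05:00:00'
AND    source IS NOT NULL
GROUP  BY source, global_action, paid, organic, device;
```

In 9.5, postgres_fdw does not push down aggregates (that arrived in PostgreSQL 10), so at best the WHERE conditions are sent to the remote server while all matching rows come back and are grouped locally.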

Query

Your query, cleaned up and formatted, with one small improvement:

SELECT source, global_action, paid, organic, device
     , count(*) AS count, sum(price) AS sum
FROM   toys
WHERE  container_id = 857
AND    created_at >= '2015-12-02 05:00:00'
AND    created_at <  '2015-12-30 05:00:00'
AND    source IS NOT NULL
GROUP  BY source, global_action, paid, organic, device;

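Simpler and cleaner, it also matches your CHECK constraint (which uses < for the upper bound) better and avoids possible corner-case problems.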