Postgres 查询优化，高成本函数

Question

Postgres 查询优化，高成本函数

mic*_*imo 3 postgresql optimization postgresql-performance

目前我正在使用一个看起来像这样的 postgres 表 (postgres12)


create table if not exists asset (
  id text,
  symbol text not null,
  name text not null
  primary key (id)
);
create table if not exists latest_value (
  timestamp bigint,
  asset text,
  price decimal null,
  market_cap decimal null,
  primary key (asset),
  foreign key (asset)
    references asset (id)
    on delete cascade
);
create table if not exists value_aggregation (
  context aggregation_context,
  timestamp bigint,
  asset text,
  price jsonb null,
  market_cap jsonb null,
  primary key (context, timestamp, asset),
  foreign key (asset)
      references asset (id)
      on delete cascade
) partition by list (context);

create table if not exists value_aggregation_hour
partition of value_aggregation
for values in ('hour');

create index if not exists value_aggregation_timestamp_index
  on value_aggregation using brin(timestamp)
  with (autosummarize=true);

Run Code Online (Sandbox Code Playgroud)

该表value_aggregation_hour有大约 200 万行。该price列由具有打开、关闭、avg 等属性的 jsonb 组成

现在的问题：

以下查询花费的时间太长。

WITH base_table AS
  (SELECT asset, timestamp, market_cap, price
  FROM latest_value
  ORDER BY market_cap DESC
  LIMIT 50
  OFFSET 0)
SELECT asset.name, asset.symbol, asset.id, asset.market_data, asset.meta_data, timestamp, market_cap, price, spark.sparkline
FROM base_table LEFT JOIN (
  SELECT asset, array_agg(CAST(price->>'open' AS decimal) ORDER BY timestamp ASC) AS sparkline
  FROM value_aggregation
  WHERE context = 'hour'
  AND timestamp > extract(epoch from (now() - INTERVAL '7d'))
  AND asset IN (
    SELECT asset
    FROM base_table)
GROUP BY asset
) spark ON base_table.asset = spark.asset
INNER JOIN asset ON base_table.asset = asset.id;

Run Code Online (Sandbox Code Playgroud)

生成的查询计划如下所示：

Merge Left Join  (cost=234610.64..234774.05 rows=494 width=1740) (actual time=9173.660..9176.986 rows=50 loops=1)
  Merge Cond: (base_table.asset = value_aggregation_hour.asset)
  CTE base_table
    ->  Limit  (cost=140.48..140.61 rows=50 width=71) (actual time=2.040..2.051 rows=50 loops=1)
          ->  Sort  (cost=140.48..145.48 rows=2001 width=71) (actual time=2.039..2.043 rows=50 loops=1)
                Sort Key: latest_value.market_cap DESC
                Sort Method: top-N heapsort  Memory: 36kB
                ->  Seq Scan on latest_value  (cost=0.00..74.01 rows=2001 width=71) (actual time=0.011..0.536 rows=2001 loops=1)
  ->  Sort  (cost=377.41..377.54 rows=50 width=1740) (actual time=2.582..2.660 rows=50 loops=1)
        Sort Key: base_table.asset
        Sort Method: quicksort  Memory: 127kB
        ->  Nested Loop  (cost=0.28..376.00 rows=50 width=1740) (actual time=2.071..2.434 rows=50 loops=1)
              ->  CTE Scan on base_table  (cost=0.00..1.00 rows=50 width=232) (actual time=2.042..2.068 rows=50 loops=1)
              ->  Index Scan using asset_pkey on asset  (cost=0.28..7.50 rows=1 width=1508) (actual time=0.006..0.006 rows=1 loops=50)
                    Index Cond: (id = base_table.asset)
  ->  GroupAggregate  (cost=234092.62..234226.12 rows=1977 width=54) (actual time=9171.070..9174.268 rows=15 loops=1)
        Group Key: value_aggregation_hour.asset
        ->  Sort  (cost=234092.62..234110.75 rows=7253 width=203) (actual time=9167.909..9168.235 rows=2501 loops=1)
              Sort Key: value_aggregation_hour.asset
              Sort Method: quicksort  Memory: 761kB
              ->  Hash Semi Join  (cost=1.62..233627.54 rows=7253 width=203) (actual time=8985.832..9163.859 rows=2501 loops=1)
                    Hash Cond: (value_aggregation_hour.asset = base_table_1.asset)
                    ->  Seq Scan on value_aggregation_hour  (cost=0.00..232792.39 rows=286795 width=203) (actual time=8983.255..9112.164 rows=304163 loops=1)
                          Filter: ((\"timestamp\" > '1597855853329'::bigint) AND (context = 'hour'::aggregation_context))
                          Rows Removed by Filter: 2228311
                    ->  Hash  (cost=1.00..1.00 rows=50 width=32) (actual time=0.032..0.032 rows=50 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 11kB
                          ->  CTE Scan on base_table base_table_1  (cost=0.00..1.00 rows=50 width=32) (actual time=0.004..0.014 rows=50 loops=1)
Planning Time: 1.203 ms
Execution Time: 9177.185 ms

Run Code Online (Sandbox Code Playgroud)

我注意到查询计划器没有使用创建的索引，value_aggregation_hour并且想知道为什么。经过一番谷歌搜索后，我在调试期间禁用了 seqscan，再次执行查询explain analyze，然后得出以下查询计划：

Merge Left Join  (cost=10000237612.82..10000237776.37 rows=494 width=1740) (actual time=212.122..215.857 rows=50 loops=1)
  Merge Cond: (base_table.asset = value_aggregation_hour.asset)
  CTE base_table
    ->  Limit  (cost=10000000140.48..10000000140.61 rows=50 width=71) (actual time=1.745..1.756 rows=50 loops=1)
          ->  Sort  (cost=10000000140.48..10000000145.48 rows=2001 width=71) (actual time=1.744..1.748 rows=50 loops=1)
                Sort Key: latest_value.market_cap DESC
                Sort Method: top-N heapsort  Memory: 36kB
                ->  Seq Scan on latest_value  (cost=10000000000.00..10000000074.01 rows=2001 width=71) (actual time=0.006..0.555 rows=2001 loops=1)
  ->  Sort  (cost=377.41..377.54 rows=50 width=1740) (actual time=2.240..2.250 rows=50 loops=1)
        Sort Key: base_table.asset
        Sort Method: quicksort  Memory: 127kB
        ->  Nested Loop  (cost=0.28..376.00 rows=50 width=1740) (actual time=1.771..2.090 rows=50 loops=1)
              ->  CTE Scan on base_table  (cost=0.00..1.00 rows=50 width=232) (actual time=1.746..1.773 rows=50 loops=1)
              ->  Index Scan using asset_pkey on asset  (cost=0.28..7.50 rows=1 width=1508) (actual time=0.006..0.006 rows=1 loops=50)
                    Index Cond: (id = base_table.asset)
  ->  GroupAggregate  (cost=237094.80..237228.44 rows=1977 width=54) (actual time=209.877..213.542 rows=15 loops=1)
        Group Key: value_aggregation_hour.asset
        ->  Sort  (cost=237094.80..237112.96 rows=7262 width=203) (actual time=209.618..210.065 rows=2501 loops=1)
              Sort Key: value_aggregation_hour.asset
              Sort Method: quicksort  Memory: 761kB
              ->  Hash Semi Join  (cost=111.95..236629.08 rows=7262 width=203) (actual time=0.868..206.008 rows=2501 loops=1)
                    Hash Cond: (value_aggregation_hour.asset = base_table_1.asset)
                    ->  Bitmap Heap Scan on value_aggregation_hour  (cost=110.32..235792.92 rows=287144 width=203) (actual time=0.758..155.291 rows=304163 loops=1)
                          Recheck Cond: (\"timestamp\" > '1597855085099'::bigint)
                          Rows Removed by Index Recheck: 215
                          Filter: (context = 'hour'::aggregation_context)
                          Heap Blocks: lossy=23414
                          ->  Bitmap Index Scan on value_aggregation_hour_timestamp_idx  (cost=0.00..38.54 rows=287851 width=0) (actual time=0.698..0.698 rows=234240 loops=1)
                                Index Cond: (\"timestamp\" > '1597855085099'::bigint)
                    ->  Hash  (cost=1.00..1.00 rows=50 width=32) (actual time=0.025..0.025 rows=50 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 11kB
                          ->  CTE Scan on base_table base_table_1  (cost=0.00..1.00 rows=50 width=32) (actual time=0.001..0.007 rows=50 loops=1)
Planning Time: 1.532 ms
Execution Time: 216.114 ms

Run Code Online (Sandbox Code Playgroud)

最终成本相当高，但我认为这是因为没有索引latest_value，他需要使用 seqscan（关闭 = 超高成本？）。
但是现在他使用了的索引value_aggregation_hour并且速度更快了。
由于禁用 seqscan 不是除调试之外的有效选项，我怎样才能使其正常工作？我可以优化查询吗？也许改变 BRIN 的某些东西，所以他使用它而不是 seqscan？
或者参数调整是否更合适，因此成本函数的计算方式不同？我正在使用具有默认配置的 RDS postgres 实例 db.t3.small。

更新 #1：
删除AND asset IN (...)（冗余？）子查询会使执行时间增加一秒（seqscan on），这是生成的查询计划：

Merge Left Join  (cost=285605.54..289542.19 rows=494 width=1589) (actual time=10213.724..10561.884 rows=50 loops=1)"
  Merge Cond: (latest_value.asset = value_aggregation_hour.asset)"
  ->  Sort  (cost=517.65..517.77 rows=50 width=1579) (actual time=2.315..2.347 rows=50 loops=1)"
        Sort Key: latest_value.asset"
        Sort Method: quicksort  Memory: 127kB"
        ->  Nested Loop  (cost=140.89..516.24 rows=50 width=1579) (actual time=1.646..2.160 rows=50 loops=1)"
              ->  Limit  (cost=140.61..140.74 rows=50 width=71) (actual time=1.623..1.634 rows=50 loops=1)"
                    ->  Sort  (cost=140.61..145.62 rows=2004 width=71) (actual time=1.622..1.626 rows=50 loops=1)"
                          Sort Key: latest_value.market_cap DESC"
                          Sort Method: top-N heapsort  Memory: 36kB"
                          ->  Seq Scan on latest_value  (cost=0.00..74.04 rows=2004 width=71) (actual time=0.006..0.507 rows=2004 loops=1)"
              ->  Index Scan using asset_pkey on asset  (cost=0.28..7.50 rows=1 width=1508) (actual time=0.010..0.010 rows=1 loops=50)"
                    Index Cond: (id = latest_value.asset)"
  ->  GroupAggregate  (cost=285087.89..288994.63 rows=1977 width=54) (actual time=10196.939..10558.723 rows=1795 loops=1)"
        Group Key: value_aggregation_hour.asset"
        ->  Sort  (cost=285087.89..285734.90 rows=258802 width=203) (actual time=10196.652..10291.799 rows=295051 loops=1)"
              Sort Key: value_aggregation_hour.asset"
              Sort Method: external merge  Disk: 66000kB"
              ->  Seq Scan on value_aggregation_hour  (cost=0.00..236164.67 rows=258802 width=203) (actual time=8901.696..9056.748 rows=304558 loops=1)"
                    Filter: ((\"timestamp\" > '1597925634239'::bigint) AND (context = 'hour'::aggregation_context))"
                    Rows Removed by Filter: 2264599"
Planning Time: 1.149 ms"
Execution Time: 10573.183 ms"

Run Code Online (Sandbox Code Playgroud)

更新#2：

将查询更改为 a_horse_with_no_name left join 横向建议导致：

Nested Loop Left Join  (cost=141.45..576626.74 rows=6550 width=1589) (actual time=68.291..1313.768 rows=50 loops=1)
  ->  Nested Loop  (cost=140.89..516.24 rows=50 width=1579) (actual time=3.897..5.104 rows=50 loops=1)
        ->  Limit  (cost=140.61..140.74 rows=50 width=71) (actual time=3.855..3.931 rows=50 loops=1)
              ->  Sort  (cost=140.61..145.62 rows=2004 width=71) (actual time=3.853..3.900 rows=50 loops=1)
                    Sort Key: latest_value.market_cap DESC
                    Sort Method: top-N heapsort  Memory: 37kB
                    ->  Seq Scan on latest_value  (cost=0.00..74.04 rows=2004 width=71) (actual time=0.016..0.915 rows=2004 loops=1)
        ->  Index Scan using asset_pkey on asset  (cost=0.28..7.50 rows=1 width=1508) (actual time=0.017..0.017 rows=1 loops=50)
              Index Cond: (id = latest_value.asset)
  ->  GroupAggregate  (cost=0.56..11519.59 rows=131 width=54) (actual time=26.169..26.169 rows=0 loops=50)
        Group Key: value_aggregation_hour.asset
        ->  Index Scan using value_aggregation_hour_pkey on value_aggregation_hour  (cost=0.56..11516.32 rows=131 width=203) (actual time=18.780..26.105 rows=50 loops=50)
              Index Cond: ((context = 'hour'::aggregation_context) AND (\"timestamp\" > '1597926623087'::bigint) AND (asset = latest_value.asset))
Planning Time: 1.066 ms
Execution Time: 1320.452 ms

Run Code Online (Sandbox Code Playgroud)

大改进，会很好用。但这仍然不如在初始查询中使用 BRIN 索引。

Answer 1

Lau*_*lbe 5

PostgreSQL 估计顺序扫描value_aggregation_hour比索引扫描稍微便宜（233000 对 236000），而实际上它要便宜得多。

行数估计很好，所以问题很可能是PostgreSQL对你的机器有错误的想法。你可以尝试改进：

设置effective_cache_size为可用于缓存数据的内存量（shared_buffers+ 文件系统缓存）。

较高的值会降低估计值。索引扫描的成本。
设置random_page_cost为较低的值。如果随机访问与存储系统上的顺序访问一样快，请使用值 1。

较低的值会降低估计值。索引扫描的成本。

归档时间：	5 年，2 月前
查看次数：	181 次
最近记录：	5 年，2 月前