Flo*_*ein · 5 · postgresql, cte, paging
Before coming here to bother you, I tried to research this topic on my own as much as I could, but here I am anyway.

We want to implement keyset pagination on this table:
create table api.subscription (
    subscription_id uuid primary key,
    token_id uuid not null,
    product_id uuid not null references api.product(product_id) deferrable,
    spid bigint null,
    attributes_snapshot jsonb not null,
    created_at timestamp not null,
    refreshed_at timestamp,
    enriched_at timestamp null,
    valid_until timestamp not null,
    is_cancelled boolean not null,
    has_been_expired boolean not null,
    has_quality_data boolean not null
);
For that, we use this query to prepare the pagination metadata:
with book as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
      and refreshed_at >= $2
      and valid_until >= now()
      and not is_cancelled
      and not has_been_expired
),
total as (
    select count(*) as total from book
),
page as (
    select * from book
    where (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over() as last_seen_created_at,
           last_value(subscription_id) over() as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) ids from page
)
select * from last_row, total, ids
This lets us fetch the total number of items the user is currently paging through, the last seen keys (for the next page), and the IDs of the current page (which we fetch later with an IN clause, or any, but that is another story).

The problem is how long this takes when the table contains 19 million rows:
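As a sketch of that follow-up lookup (assumed, not part of the original post), the returned ids array can be passed back as a single uuid[] parameter and matched with any, re-applying the keyset order:

```sql
-- Hypothetical follow-up query (an assumption, not from the original post):
-- fetch the full rows of the current page by passing the ids array
-- as one uuid[] parameter, keeping the same keyset ordering.
select *
from api.subscription
where subscription_id = any($1::uuid[])
order by created_at asc, subscription_id asc;
```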
Nested Loop  (cost=50000.14..50000.22 rows=1 width=64) (actual time=143.677..143.677 rows=0 loops=1)
  Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total, ids.ids
  Buffers: shared hit=395 read=24611
  CTE book
    ->  Seq Scan on api.subscription  (cost=0.00..50000.00 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
          Output: subscription.created_at, subscription.subscription_id
          Filter: ((NOT subscription.is_cancelled) AND (NOT subscription.has_been_expired) AND (subscription.token_id = '73759739-7af5-4d28-91bc-897c959bddcd'::uuid) AND (subscription.valid_until >= now()) AND (subscription.refreshed_at >= (now() - '4 days'::interval)))
          Rows Removed by Filter: 1000000
          Buffers: shared hit=389 read=24611
  CTE total
    ->  Aggregate  (cost=0.02..0.03 rows=1 width=8) (never executed)
          Output: count(*)
          ->  CTE Scan on book  (cost=0.00..0.02 rows=1 width=0) (never executed)
                Output: book.created_at, book.subscription_id
  CTE page
    ->  Limit  (cost=0.03..0.04 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
          Output: book_1.created_at, book_1.subscription_id
          Buffers: shared hit=395 read=24611
          ->  Sort  (cost=0.03..0.04 rows=1 width=24) (actual time=143.673..143.673 rows=0 loops=1)
                Output: book_1.created_at, book_1.subscription_id
                Sort Key: book_1.created_at, book_1.subscription_id
                Sort Method: quicksort  Memory: 25kB
                Buffers: shared hit=395 read=24611
                ->  CTE Scan on book book_1  (cost=0.00..0.02 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
                      Output: book_1.created_at, book_1.subscription_id
                      Buffers: shared hit=389 read=24611
  CTE last_row
    ->  WindowAgg  (cost=0.00..0.03 rows=1 width=24) (actual time=143.675..143.675 rows=0 loops=1)
          Output: last_value(page.created_at) OVER (?), last_value(page.subscription_id) OVER (?)
          Buffers: shared hit=395 read=24611
          ->  CTE Scan on page  (cost=0.00..0.02 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
                Output: page.created_at, page.subscription_id
                Buffers: shared hit=395 read=24611
  CTE ids
    ->  Aggregate  (cost=0.02..0.03 rows=1 width=32) (never executed)
          Output: array_agg(page_1.subscription_id)
          ->  CTE Scan on page page_1  (cost=0.00..0.02 rows=1 width=16) (never executed)
                Output: page_1.created_at, page_1.subscription_id
  ->  Nested Loop  (cost=0.00..0.05 rows=1 width=32) (actual time=143.677..143.677 rows=0 loops=1)
        Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total
        Buffers: shared hit=395 read=24611
        ->  CTE Scan on last_row  (cost=0.00..0.02 rows=1 width=24) (actual time=143.676..143.676 rows=0 loops=1)
              Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id
              Buffers: shared hit=395 read=24611
        ->  CTE Scan on total  (cost=0.00..0.02 rows=1 width=8) (never executed)
              Output: total.total
  ->  CTE Scan on ids  (cost=0.00..0.02 rows=1 width=32) (never executed)
        Output: ids.ids
Planning time: 0.580 ms
Execution time: 143.786 ms
Note: this is a sample with only 1M rows, but with 19M rows it takes more than 20 seconds.

We tried many different index setups, without luck:
Indexes:
    "subscription_pkey" PRIMARY KEY, btree (subscription_id)
    "has_been_expired_idx" btree (has_been_expired)
    "is_cancelled_idx" btree (is_cancelled)
    "pagination_idx" btree (token_id, refreshed_at, valid_until, is_cancelled, has_been_expired)
    "refreshed_at_idx" btree (refreshed_at)
    "token_and_created_at_idx" btree (token_id, created_at)
    "token_and_refreshed_at_idx" btree (token_id, refreshed_at)
    "token_id_idx" btree (token_id)
    "valid_until_idx" btree (valid_until)
Foreign-key constraints:
    "subscription_product_id_fkey" FOREIGN KEY (product_id) REFERENCES api.product(product_id) DEFERRABLE
So my question is: how can this keyset pagination query be made to perform on a table this size?

Any hints are welcome :) I hope my question makes sense.

Thanks for taking the time to read!
Without the total, it can be rewritten as:
with page as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
      and refreshed_at >= $2
      and valid_until >= now()
      and not is_cancelled
      and not has_been_expired
      and (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over() as last_seen_created_at,
           last_value(subscription_id) over() as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) ids from page
)
select * from last_row, ids
A composite index of token_id, is_cancelled, has_been_expired, and one of refreshed_at / valid_until / created_at would help.
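A minimal sketch of that suggestion, assuming created_at is chosen as the final column (an assumption: it also matches the order by, so the index can serve both the filter and the sort; the index name is made up):

```sql
-- Hypothetical index following the suggestion above: equality columns first,
-- then one range/ordering column. created_at is chosen here because it also
-- serves "order by created_at, subscription_id"; refreshed_at or valid_until
-- may be better depending on which range predicate is most selective.
create index token_flags_created_at_idx
    on api.subscription (token_id, is_cancelled, has_been_expired, created_at);
```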