寻找更简单的递归查询替代方案

sr3*_*r33 7 postgresql performance cte recursive postgresql-performance

实际查询更多,但我面临的问题可以归结为:

用于过滤单调递增整数行集的查询,以便 -在最终结果集中, row(n+1).value >= row(n).value + 5

对于我需要解决的实际问题,行集计数在 1000 秒内。

举几个例子来澄清:

  • 如果行是: 1,2,3,4,5 :那么查询应该返回:1
  • 如果行是: 1,5,7,10,11,12,13 :那么查询应该返回:1,7,12
  • 如果行是:6,8,11,16,20,23:那么查询应该返回:6,11,16,23
  • 如果行是:6,8,12,16,20,23:那么查询应该返回:6,12,20

我设法通过以下查询获得了所需的结果,但它似乎过于复杂。取消注释不同的“..with t(k)..”以尝试它们。

我正在寻找任何简化或替代方法来获得相同的结果。

with recursive r(n, pri) as (
    with t(k) as (values (1),(2),(3),(4),(5))   -- the data we want to filter
    -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13))
    -- with t(k) as (values (6),(8),(11),(16),(20),(23))
    -- with t(k) as (values (6),(8),(12),(16),(20),(23))
    select min(k), 1::bigint from t             -- bootstrap for recursive processing. 1 here represents rank().
    UNION
    select k, (rank() over(order by k)) rr      -- rank() is required just to filter out the rows we dont want from the final result set, and no other reason
    from r, t 
    where t.k >= r.n+5 and r.pri = 1            -- capture the rows we want, AND unfortunately a bunch of rows we dont want 
)
select n from r where pri = 1; 
Run Code Online (Sandbox Code Playgroud)

Eze*_*nay 2

这很难!我不知道这是否更简单,但至少它不使用窗口函数,也不产生需要过滤掉的行。

with recursive r(k, n) as (
    with t(k) as (values (1),(2),(3),(4),(5))   -- the data we want to filter
    -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13))
    -- with t(k) as (values (6),(8),(11),(16),(20),(23))
    -- with t(k) as (values (6),(8),(12),(16),(20),(23))
         ,t2(k,n) AS (select k, (select min(k) from t tt where k >= t.k+5) from t) -- precalculate what's next
    select * from (select * from t2 limit 1) x   -- limit 1 directly fails in a union!
    UNION ALL
    select t2.* from r, t2 where t2.k = r.n      -- on each iteration, keep only the value that matches the previous precalculated next one
)
select k from r
Run Code Online (Sandbox Code Playgroud)

测试

对于非常小的集合,这种替代方案似乎效率较低,但性能或多或少呈线性,而原始方案似乎呈指数级缓慢。

drop table if exists t;
create temp table t(k) AS
with recursive r(n) as (
  select floor(random()*10)::int + 1
  UNION ALL
  select n + floor(random()*10)::int + 1
  from r
  where n < 100000)        -- change to increase or reduce set
select * from r;           -- surprisingly fast! Go PG!
create index on t(k);

with recursive r(n, pri) as (
    select min(k), 1::bigint from t
    UNION
    select k, (rank() over(order by k)) rr
    from r, t 
    where t.k >= r.n+5 and r.pri = 1
)
select count(*) from r where pri = 1; -- I aborted it after waiting for a minute

with recursive r(k, n) as (
    with t2(k,n) AS (select k, (select min(k) from t tt where k >= t.k+5) from t)
    select * from (select * from t2 limit 1) x
    UNION ALL
    select t2.* from r, t2 where t2.k = r.n
)
select count(*) from r -- 26" in my server
Run Code Online (Sandbox Code Playgroud)