Replicating the LAG() function in T-SQL with a dynamic offset

jmi*_*738 6 sql t-sql apache-spark-sql

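I am trying to recreate the LAG() function without using LAG(), but with a dynamic offset that depends on a column. I want to port this code to SparkSQL.

Here is my sample data: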

if object_id('tempdb.dbo.#myTable') is not null drop table #myTable
create table #myTable (id int,dates int, flag char, FromToFlagType varchar(2),FromToCounter INT)

insert into #myTable values(1,  '20181031','V','VV',1)
insert into #myTable values(2,  '20181130','V','VV',2)
insert into #myTable values(3,  '20181231','V','VV',3)
insert into #myTable values(4,  '20190131','F','VF',1)
insert into #myTable values(5,  '20190228','F','FF',2)
insert into #myTable values(6,  '20190331','F','FF',3)
insert into #myTable values(7,  '20190430','F','FF',4)
insert into #myTable values(8,  '20190531','V','FV',1)
insert into #myTable values(9,  '20190630','V','VV',2)
insert into #myTable values(10, '20190731','V','VV',3)

id  dates     flag  FromToFlagType  FromToCounter
1   20181031  V     VV              1
2   20181130  V     VV              2
3   20181231  V     VV              3
4   20190131  F     VF              1
5   20190228  F     FF              2
6   20190331  F     FF              3
7   20190430  F     FF              4
8   20190531  V     FV              1
9   20190630  V     VV              2
10  20190731  V     VV              3

So what I want to do is reproduce the following result, but without using LAG() with a dynamic offset:

select
  *
  ,LAG(FromToFlagType,FromToCounter-1) OVER ( ORDER BY dates) AS FromToStage
from 
  #mytable

id  dates     flag  FromToFlagType  FromToCounter  FromToStage
1   20181031  V     VV              1              VV
2   20181130  V     VV              2              VV
3   20181231  V     VV              3              VV
4   20190131  F     VF              1              VF
5   20190228  F     FF              2              VF
6   20190331  F     FF              3              VF
7   20190430  F     FF              4              VF
8   20190531  V     FV              1              FV
9   20190630  V     VV              2              FV
10  20190731  V     VV              3              FV

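I know you can replicate LAG() using a CTE and a JOIN, but that only seems to work when you know in advance what the offset will be. I tried something like that here, but I just can't get the same result.
I found something similar here, but I am new to Spark and I need a solution that uses SparkSQL. I figured that if I can replicate the function in T-SQL, I can port it to Spark.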

WITH FromToStage AS(
select 
    *
    ,id-(FromToCounter-1) AS id_2 
    from #mytable
   --order by dates
)
SELECT
  a.*
  ,b.FromToFlagType as FromToStage
FROM 
  FromToStage a
    JOIN
  FromToStage b
  ON
  a.id = b.id_2
order by dates

Gor*_*off 1

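Based on your logic, you want first_value():

select t.*,
       -- rows in the same run of identical flags share (flag, seqnum - seqnum_2);
       -- flag is part of the key so runs of different flags cannot collide
       first_value(FromToFlagType) over (partition by flag, seqnum - seqnum_2 order by dates) as first_FromToFlagType
from (select t.*,
             row_number() over (order by dates) as seqnum,
             row_number() over (partition by flag order by dates) as seqnum_2
      from #mytable t
     ) t;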
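
If the offset itself has to stay dynamic (rather than always pointing at the first row of a run), the CTE-and-JOIN idea from the question can be made to work by numbering the rows in date order and joining each row to the one that sits FromToCounter - 1 positions earlier. The following is only a sketch under assumptions: the CTE name numbered and the column rn are made up for illustration, and #myTable is the temp table from the question. It relies only on a CTE, ROW_NUMBER() and a LEFT JOIN, which Spark SQL also provides, but in Spark the #myTable reference would have to be replaced by a table or temp view holding the same data.

```sql
-- Sketch: emulate LAG(FromToFlagType, FromToCounter - 1) OVER (ORDER BY dates)
-- with a self-join. "numbered" and "rn" are illustrative names, not from the question.
WITH numbered AS (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY dates) AS rn   -- position of each row in date order
    FROM #myTable t   -- assumption: in Spark SQL, substitute a table or temp view name
)
SELECT a.id,
       a.dates,
       a.flag,
       a.FromToFlagType,
       a.FromToCounter,
       b.FromToFlagType AS FromToStage
FROM numbered a
LEFT JOIN numbered b
       ON b.rn = a.rn - (a.FromToCounter - 1)   -- look back a per-row number of rows
ORDER BY a.dates;
```

The join condition puts the offset on the row that is looking back (a), so each row finds its own source row; the LEFT JOIN keeps rows whose offset would reach before the first row and leaves FromToStage NULL for them, which is how LAG() behaves without a default value. Numbering with ROW_NUMBER() instead of using id directly avoids depending on id being gap-free.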