SQL - Decay by time since an event, then restart at the next event

Cra*_*aig 4 sql missing-data snowflake-cloud-data-platform

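Many similar questions and answers have already been posted, but I could not find one with these two differences: 1) the NULL count restarts, and 2) a math function is applied to the replaced value.

An event either happens or it does not (1 or NULL), by date, per customer. You can assume a customer has only one row per date.

I want to replace each NULL with a decay function based on the number of consecutive NULLs (the time since the event). A customer can have the event every day, skip a day, or skip multiple days. But once the event occurs again, the decay starts over. Currently my decay divides by 2, but that is just an example.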

DT          CUSTOMER  EVENT  DESIRED
2022-01-01  A         1      1
2022-01-02  A         1      1
2022-01-03  A         1      1
2022-01-04  A         1      1
2022-01-05  A         1      1
2022-01-01  B         1      1
2022-01-02  B         NULL   0.5
2022-01-03  B         NULL   0.25
2022-01-04  B         1      1
2022-01-05  B         NULL   0.5
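
To pin down the intended rule independently of any SQL dialect, here is a minimal Python sketch for a single customer's events ordered by date. The names `decay_fill` and `decay` are hypothetical and not part of the post. The desired column is consistent both with halving per missed day (`0.5 ** n`) and with `0.5 / n`, the formula used in the query further down; the two only diverge from the third consecutive NULL onward (0.125 versus about 0.167). The sketch defaults to halving and also assumes that NULLs before the first event decay from the start, which the post does not specify.

```python
def decay_fill(events, decay=lambda n: 0.5 ** n):
    """Replace each None with decay(n), where n counts consecutive Nones.

    The counter resets to 0 whenever an event (a non-None value) occurs.
    """
    filled = []
    n = 0  # number of consecutive missing days so far
    for event in events:
        if event is None:
            n += 1
            filled.append(decay(n))
        else:
            n = 0  # the event happened, so the decay restarts
            filled.append(float(event))
    return filled


customer_b = [1, None, None, 1, None]
halving = decay_fill(customer_b)                             # 0.5 ** n
reciprocal = decay_fill(customer_b, decay=lambda n: 0.5 / n)  # 0.5 / n
print(halving)
print(reciprocal)
```

Passing a different `decay` callable is how another decay curve would be swapped in without touching the counting logic.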

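I can produce the desired result, but my approach is very clunky, so I am looking for a better way. It also needs to scale to multiple event columns.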

create or replace temporary table the_data (
  dt date,
  customer char(10),
  event int,
  desired float)
;
insert into the_data values ('2022-01-01', 'a', 1, 1);
insert into the_data values ('2022-01-02', 'a', 1, 1);
insert into the_data values ('2022-01-03', 'a', 1, 1);
insert into the_data values ('2022-01-04', 'a', 1, 1);
insert into the_data values ('2022-01-05', 'a', 1, 1);

insert into the_data values ('2022-01-01', 'b', 1, 1);
insert into the_data values ('2022-01-02', 'b', NULL, 0.5);
insert into the_data values ('2022-01-03', 'b', NULL, 0.25);
insert into the_data values ('2022-01-04', 'b', 1, 1);
insert into the_data values ('2022-01-05', 'b', NULL, 0.5);

with
    base as (
      select * from the_data
    ),
    find_nan as (
      select *, case when event is null then 1 else 0 end as event_is_nan from base
    ),
    find_nan_diff as (
      select *, event_is_nan - coalesce(lag(event_is_nan) over (partition by customer order by dt), 0) as event_is_nan_diff from find_nan
    ),
    find_nan_group as (
      select *, sum(case when event_is_nan_diff = -1 then 1 else 0 end) over (partition by customer order by dt) as nan_group from find_nan_diff
    ),
    consec_nans as (
      select *, sum(event_is_nan) over (partition by customer, nan_group order by dt) as n_consec_nans from find_nan_group
    ),
    decay as (
      select *, case when n_consec_nans > 0 then 0.5 / n_consec_nans else 1 end as decay_factor from consec_nans
    ),
    ffill as (
      select *, first_value(event) over (partition by customer order by dt) as ffill_value from decay
    ),
    final as (
      select *, ffill_value * decay_factor as the_answer from ffill
    )
select * from final
order by customer, dt
;  

Thanks

Luk*_*zda 6

The query can be simplified by using CONDITIONAL_CHANGE_EVENT to generate a subgrp helper column:

WITH cte AS (
  SELECT *, CONDITIONAL_CHANGE_EVENT(event IS NULL) OVER(PARTITION BY CUSTOMER 
                                                         ORDER BY DT) AS subgrp
  FROM the_data
)
SELECT *, COALESCE(EVENT, 0.5 / ROW_NUMBER() OVER(PARTITION BY CUSTOMER, SUBGRP 
                                                  ORDER BY DT)) AS computed_decay
FROM cte
ORDER BY CUSTOMER, DT;
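
For readers who have not met it before: CONDITIONAL_CHANGE_EVENT returns a per-partition counter that starts at 0 and increases by 1 whenever the expression's value differs from the previous row's. The following is a rough Python emulation of the query above for one customer ordered by date, meant only to show why the subgroups make ROW_NUMBER restart after each event. The helper names are hypothetical.

```python
def conditional_change_event(flags):
    """Counter that starts at 0 and increments whenever the value changes."""
    numbers = []
    counter = 0
    for i, flag in enumerate(flags):
        if i > 0 and flag != flags[i - 1]:
            counter += 1
        numbers.append(counter)
    return numbers


def computed_decay(events):
    """Mimic COALESCE(event, 0.5 / ROW_NUMBER() OVER (PARTITION BY subgrp))."""
    subgrp = conditional_change_event([e is None for e in events])
    result = []
    row_number = 0
    for i, event in enumerate(events):
        # ROW_NUMBER restarts at 1 on the first row of each subgroup.
        if i == 0 or subgrp[i] != subgrp[i - 1]:
            row_number = 1
        else:
            row_number += 1
        result.append(0.5 / row_number if event is None else event)
    return result


events_b = [1, None, None, 1, None]
subgroups_b = conditional_change_event([e is None for e in events_b])
decay_b = computed_decay(events_b)
print(subgroups_b)
print(decay_b)
```

Each run of NULLs gets its own subgroup number, so the row number inside a run is exactly the count of consecutive NULLs.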



Edit:

Without using CONDITIONAL_CHANGE_EVENT:

WITH cte AS (
  SELECT *, 
    CASE WHEN 
    event = LAG(event,1, event) OVER(PARTITION BY customer ORDER BY dt)
    OR (event IS NULL AND LAG(event) OVER(PARTITION BY customer ORDER BY dt) IS NULL)
    THEN 0 ELSE 1 END AS l
  FROM the_data

), cte2 AS (
  SELECT *, SUM(l) OVER(PARTITION BY customer ORDER BY dt) AS SUBGRP
  FROM cte
)
SELECT *, COALESCE(EVENT, 0.5 / ROW_NUMBER() OVER(PARTITION BY CUSTOMER, SUBGRP 
                                                  ORDER BY DT)) AS computed_decay
FROM cte2
ORDER BY CUSTOMER, DT;
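
One way to try this portable variant without a Snowflake account is SQLite through Python's `sqlite3` module. This is only a sketch under assumptions: it needs SQLite 3.25 or later for window functions, it stores `dt` as text, and it selects a subset of columns so that the computed value can be compared with the `desired` column from the question.

```python
import sqlite3

# Sample rows from the question: (dt, customer, event, desired).
rows = [
    ("2022-01-01", "a", 1, 1.0),
    ("2022-01-02", "a", 1, 1.0),
    ("2022-01-03", "a", 1, 1.0),
    ("2022-01-04", "a", 1, 1.0),
    ("2022-01-05", "a", 1, 1.0),
    ("2022-01-01", "b", 1, 1.0),
    ("2022-01-02", "b", None, 0.5),
    ("2022-01-03", "b", None, 0.25),
    ("2022-01-04", "b", 1, 1.0),
    ("2022-01-05", "b", None, 0.5),
]

QUERY = """
WITH cte AS (
  SELECT *,
    CASE WHEN
      event = LAG(event, 1, event) OVER (PARTITION BY customer ORDER BY dt)
      OR (event IS NULL
          AND LAG(event) OVER (PARTITION BY customer ORDER BY dt) IS NULL)
    THEN 0 ELSE 1 END AS l
  FROM the_data
), cte2 AS (
  SELECT *, SUM(l) OVER (PARTITION BY customer ORDER BY dt) AS subgrp
  FROM cte
)
SELECT dt, customer, desired,
       COALESCE(event, 0.5 / ROW_NUMBER() OVER (PARTITION BY customer, subgrp
                                                ORDER BY dt)) AS computed_decay
FROM cte2
ORDER BY customer, dt
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_data (dt TEXT, customer TEXT, event INTEGER, desired REAL)"
)
conn.executemany("INSERT INTO the_data VALUES (?, ?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()
conn.close()

for dt, customer, desired, computed in result:
    print(dt, customer, desired, computed)
```

On the sample data the computed column should line up with `desired` for every row; longer NULL runs would follow 0.5 / n rather than repeated halving.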

db<>fiddle demo

  • CONDITIONAL_CHANGE_EVENT. I did not know this function existed. You learn something new every day. Thanks (2 upvotes)