av *_*iek 8 hive lag moving-average window-functions
我有这样的产品数据
Product Date Sales Availbility
xyz 2017-12-31 724.5 6.0
xyz 2018-01-07 362.25 7.0
xyz 2018-01-14 281.75 7.0
xyz 2018-01-21 442.75 7.0
xyz 2018-01-28 442.75 6.0
xyz 2018-02-04 402.5 7.0
xyz 2018-02-11 201.25 3.0
xyz 2018-02-18 120.75 0.0
xyz 2018-02-25 40.25 0.0
xyz 2018-03-11 201.25 0.0
xyz 2018-03-18 483.0 5.0
xyz 2018-03-25 322.0 7.0
xyz 2018-04-01 241.5 7.0
xyz 2018-04-08 281.75 7.0
xyz 2018-04-15 523.25 7.0
xyz 2018-04-22 241.5 7.0
xyz 2018-04-29 362.25 7.0
Run Code Online (Sandbox Code Playgroud)
数据没有订购(一个小问题),我想要做的是,在可用性栏(第4栏)中我们有0,我想采取前3周(有完全可用性,即7)平均
如下所示:
xyz 2017-12-31 724.5 6.0 Null
xyz 2018-01-07 362.25 7.0 362.25 ( Same value for weeks with availbility = 7)
xyz 2018-01-14 281.75 7.0 281.75
xyz 2018-01-21 442.75 7.0 442.75
xyz 2018-01-28 442.75 6.0 361 (362 + 281 + 362/3)the prior fully availble week avg which is avilble)
xyz 2018-02-04 402.5 7.0 402
xyz 2018-02-11 201.25 3.0 375 (402 + 442 + 281 /3)
xyz 2018-02-18 120.75 0.0 375 ( Same since 375 is the most recent 4 fully availble average)
xyz 2018-02-25 40.25 0.0 375
xyz 2018-03-11 201.25 0.0 375
xyz 2018-03-18 483.0 5.0 375
xyz 2018-03-25 322.0 7.0 322
xyz 2018-04-01 241.5 7.0 241
xyz 2018-04-08 281.75 7.0 281
xyz 2018-04-15 523.25 7.0 523
xyz 2018-04-22 241.5 7.0 241
xyz 2018-04-29 362.25 7.0 362
Run Code Online (Sandbox Code Playgroud)
我通过试图找到仅有完全可用周数的3周平均值并将其与剩余的周数联合起来接近它,后来尝试使用滞后功能来检索最近的平均值.
select a.*,lag(case when a.Full_availble_sales >0 then a.Full_availble_sales end,1) over (partition by a.asin order by a.week_beginning) as Four_wk_avg from (select asin,week_beginning,avg(sales) as weekly_sales,sum(available_to_purchase) as weekly_availbility,0 as Full_availble_sales from t1 where asin = 'xyz' group by asin,week_beginning having sum(available_to_purchase) < 7
union all
select t.asin,t.week_beginning,t.weekly_sales,t.weekly_availbility,avg(t.weekly_sales) over (partition by t.asin order by t.week_beginning rows between 3 preceding and current row ) as Full_availble_sales from
(select asin,week_beginning,avg(sales) as weekly_sales,sum(available_to_purchase) as weekly_availbility from t1 where asin = 'xyz' group by asin,week_beginning having sum(available_to_purchase) = 7)t ) a order by a.week_beginning
Run Code Online (Sandbox Code Playgroud)
O/P是
xyz 2017-12-31 724.5 6.0 0.0 NULL
xyz 2018-01-07 362.25 7.0 362.25 NULL
xyz 2018-01-14 281.75 7.0 322.0 362.25
xyz 2018-01-21 442.75 7.0 362.25 322.0
xyz 2018-01-28 442.75 6.0 0.0 362.25
xyz 2018-02-04 402.5 7.0 372.3125 NULL
xyz 2018-02-11 201.25 3.0 0.0 372.3125
xyz 2018-02-18 120.75 0.0 0.0 NULL
xyz 2018-02-25 40.25 0.0 0.0 NULL
xyz 2018-03-11 201.25 0.0 0.0 NULL
xyz 2018-03-18 483.0 5.0 0.0 NULL
xyz 2018-03-25 322.0 7.0 362.25 NULL
xyz 2018-04-01 241.5 7.0 352.1875 362.25
xyz 2018-04-08 281.75 7.0 311.9375 352.1875
xyz 2018-04-15 523.25 7.0 342.125 311.9375
xyz 2018-04-22 241.5 7.0 322.0 342.125
xyz 2018-04-29 362.25 7.0 352.1875 322.0
Run Code Online (Sandbox Code Playgroud)
这不是我的意图.
这将完成这项工作(使用聚合函数avg并max_by在窗口上)
WITH
tt1 (Product,Date_week_beginning,Sales,Availbility) AS
( SELECT * FROM ( VALUES
('xyz','2017-12-31', 724.5 ,6.0),
('xyz','2018-01-07', 362.25 ,7.0),
('xyz','2018-01-14', 281.75 ,7.0),
('xyz','2018-01-21', 442.75 ,7.0),
('xyz','2018-01-28', 442.75 ,6.0),
('xyz','2018-02-04', 402.5 ,7.0),
('xyz','2018-02-11', 201.25 ,3.0),
('xyz','2018-02-18', 120.75 ,0.0),
('xyz','2018-02-25', 40.25 ,0.0),
('xyz','2018-03-11', 201.25 ,0.0),
('xyz','2018-03-18', 483.0 ,5.0),
('xyz','2018-03-25', 322.0 ,7.0),
('xyz','2018-04-01', 241.5 ,7.0),
('xyz','2018-04-08', 281.75 ,7.0),
('xyz','2018-04-15', 523.25 ,7.0),
('xyz','2018-04-22', 241.5 ,7.0),
('xyz','2018-04-29', 362.25 ,7.0) )
), tt2 AS (
SELECT *, avg(sales) OVER (partition by Product order by if(Availbility = 7.0,1),Date_week_beginning ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) avg3
FROM tt1
)
SELECT Product,Date_week_beginning,Sales,Availbility,
CASE WHEN Availbility = 7.0 THEN Sales
ELSE
max_by(if(Availbility = 7.0,avg3),(Availbility = 7.0, Date_week_beginning) ) OVER (partition by Product order by Date_week_beginning)
END new_col
FROM tt2
ORDER BY Product,Date_week_beginning
Run Code Online (Sandbox Code Playgroud)
结果与要求完全相同:
Product Date_week_beginning Sales Availbility new_col
xyz 2017-12-31 724.5 6.0 NULL
xyz 2018-01-07 362.25 7.0 362.25
xyz 2018-01-14 281.75 7.0 281.75
xyz 2018-01-21 442.75 7.0 442.75
xyz 2018-01-28 442.75 6.0 362.25
xyz 2018-02-04 402.5 7.0 402.5
xyz 2018-02-11 201.25 3.0 375.6666666666667
xyz 2018-02-18 120.75 0.0 375.6666666666667
xyz 2018-02-25 40.25 0.0 375.6666666666667
xyz 2018-03-11 201.25 0.0 375.6666666666667
xyz 2018-03-18 483.0 5.0 375.6666666666667
xyz 2018-03-25 322.0 7.0 322.0
xyz 2018-04-01 241.5 7.0 241.5
xyz 2018-04-08 281.75 7.0 281.75
xyz 2018-04-15 523.25 7.0 523.25
xyz 2018-04-22 241.5 7.0 241.5
xyz 2018-04-29 362.25 7.0 362.25
Run Code Online (Sandbox Code Playgroud)
我假设问题中有两个拼写错误(可以从示例中验证):
问题中的这一行是错误的:
361 (362 + 281 + 362/3)the prior fully availble week avg which is avilble)应该是361 (442 + 281 + 362/3)the prior fully availble week avg which is avilble)句子:“我想做的是,只要可用性列(第 4 列)中有 0,..”应该是“我想做的是,只要可用性列(第 4 列)没有7.0 ” ..”
| 归档时间: |
|
| 查看次数: |
120 次 |
| 最近记录: |