Asked by Dea*_*ari

How to compute a rolling 12-month sum of orders per customer, sliding by one month, in Spark

I'm new to Scala. I'm currently trying to aggregate order data in Spark over a 12-month period that slides forward one month at a time.

Below is a simple example of my data, formatted so that you can test it easily:

import spark.implicits._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._


// Sample orders: (customer id, order date as dd/MM/yyyy, order amount)
var sample = Seq(("C1","01/01/2016", 20), ("C1","02/01/2016", 5),
  ("C1","03/01/2016", 2),  ("C1","04/01/2016", 3), ("C1","05/01/2017", 5),
  ("C1","08/01/2017", 5), ("C1","01/02/2017", 10), ("C1","01/02/2017", 10),
  ("C1","01/03/2017", 10)).toDF("id", "order_date", "orders")

// Parse the dd/MM/yyyy strings into a proper DateType column
sample = sample.withColumn("order_date",
  to_date(unix_timestamp($"order_date", "dd/MM/yyyy").cast("timestamp")))

sample.show 
 +---+----------+------+
 | id|order_date|orders|
 +---+----------+------+
 | C1|2016-01-01|    20|
 | C1|2016-01-02|     5|
 | C1|2016-01-03|     2|
 | C1|2016-01-04|     3|
 | C1|2017-01-05|     5|
 | C1|2017-01-08|     5|
 | C1|2017-02-01|    10|
 | C1|2017-02-01|    10|
 | C1|2017-03-01|    10|
 +---+----------+------+
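(As an aside: on Spark 2.2+ the parsing step above can be written without the unix_timestamp round-trip, since to_date accepts a format string directly. This is just an equivalent alternative to the withColumn line above:)

// Spark 2.2+: parse the date strings in one call (same result as above)
sample = sample.withColumn("order_date", to_date($"order_date", "dd/MM/yyyy"))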

The result I'm expected to produce looks like this:

id      period_start    period_end  rolling
C1      2015-01-01      2016-01-01  30
C1      2016-01-01      2017-01-01 …
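For reference, here is a minimal sketch of one way to attack this (an illustration under my own assumptions, not a verified answer): pre-aggregate orders per calendar month, then sum the current month plus the preceding 11 with a range-based window over an integer month counter. The names (byMonth, month_nr) and the frame bounds are mine, and the bounds may need shifting depending on whether period_end is meant to be inclusive.

import org.apache.spark.sql.expressions.Window

// Sketch (illustrative): roll orders up to calendar months first,
// then take a 12-month range frame over an integer month counter.
val byMonth = sample
  .withColumn("month", trunc($"order_date", "month"))
  .groupBy($"id", $"month")
  .agg(sum($"orders").as("monthly_orders"))
  // integer month scale so rangeBetween counts months, not rows
  .withColumn("month_nr", year($"month") * 12 + month($"month"))

val w = Window.partitionBy($"id").orderBy($"month_nr").rangeBetween(-11, 0)

val rolling = byMonth
  .withColumn("rolling", sum($"monthly_orders").over(w))
  .withColumn("period_end", $"month")
  .withColumn("period_start", add_months($"month", -12))
  .select($"id", $"period_start", $"period_end", $"rolling")

rolling.show

The point of the range frame (rather than rowsBetween) is that a customer with no orders in some month still gets a true calendar-based 12-month total; a row-based frame would silently stretch across the gap.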

scala aggregation apache-spark apache-spark-sql

4 votes · 1 answer · 899 views