Given a DataFrame df
id | date
---------------
1 | 2015-09-01
2 | 2015-09-01
1 | 2015-09-03
1 | 2015-09-04
2 | 2015-09-04
I want to create a running counter (index) within each id, ordered by date, so that the result is:
id | date | counter
--------------------------
1 | 2015-09-01 | 1
1 | 2015-09-03 | 2
1 | 2015-09-04 | 3
2 | 2015-09-01 | 1
2 | 2015-09-04 | 2
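
To make the intended semantics concrete, here is a minimal sketch that uses plain Scala collections rather than Spark. The `Event` case class and the `withCounter` helper are hypothetical names introduced only for illustration; the sketch assumes the dates are ISO-formatted strings, so that sorting them as strings matches chronological order.

```scala
// Illustration only: plain Scala collections, no Spark involved.
// `Event` and `withCounter` are hypothetical names, not part of any library.
case class Event(id: Int, date: String)

def withCounter(events: Seq[Event]): Seq[(Int, String, Int)] =
  events
    .groupBy(_.id)            // one group per id (the "partition")
    .toSeq
    .sortBy(_._1)             // deterministic order of the ids
    .flatMap { case (_, group) =>
      group
        .sortBy(_.date)       // ISO dates sort chronologically as strings
        .zipWithIndex
        .map { case (e, i) => (e.id, e.date, i + 1) } // 1-based counter
    }

val events = Seq(
  Event(1, "2015-09-01"), Event(2, "2015-09-01"), Event(1, "2015-09-03"),
  Event(1, "2015-09-04"), Event(2, "2015-09-04"))

val result = withCounter(events)
```

This only mirrors what a per-id row number should produce on the sample data; it is not meant as a replacement for doing the computation on the DataFrame itself.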
This is something I can achieve with a window function, e.g.
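import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber
val w = Window.partitionBy("id").orderBy("date")
val resultDF = df.select(df("id"), df("date"), rowNumber().over(w).as("counter"))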
Unfortunately, Spark 1.4.1 does not support window functions for regular DataFrames:
org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'. Note that, …