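I am writing a Spark Structured Streaming program. I need to create an additional column that holds a lagged difference.
To reproduce my problem, I have provided a code snippet. The code uses a file, data.json, stored in the data folder: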
[
{"id": 77,"type": "person","timestamp": 1532609003},
{"id": 77,"type": "person","timestamp": 1532609005},
{"id": 78,"type": "crane","timestamp": 1532609005}
]
I get this error:
pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;;\nWindow [lag(timestamp#71L, 1, null) windowspecdefinition(host_id#68, timestamp#71L ASC NULLS FIRST, ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_timestamp#129L]
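For reference, here is what the rejected window expression would compute on the sample data if the data were a static (batch) DataFrame: `lag(timestamp, 1)` over a window partitioned by the key and ordered by `timestamp`. This is a plain-Python sketch. It assumes the partition key is `id`, because the error mentions `host_id` but the sample data contains no such field. The function name `add_prev_timestamp` and the `diff` column are hypothetical. The sketch only illustrates the expected result and does not make the window function work on a stream.

```python
from collections import defaultdict

# Sample records copied from data.json
records = [
    {"id": 77, "type": "person", "timestamp": 1532609003},
    {"id": 77, "type": "person", "timestamp": 1532609005},
    {"id": 78, "type": "crane", "timestamp": 1532609005},
]


def add_prev_timestamp(rows, key="id"):
    """Hypothetical helper emulating
    lag(timestamp, 1) OVER (PARTITION BY key ORDER BY timestamp)
    plus the lagged difference timestamp - prev_timestamp."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    out = []
    for k in sorted(groups):
        prev = None  # lag() yields null for the first row of each partition
        for row in sorted(groups[k], key=lambda r: r["timestamp"]):
            new_row = dict(row, prev_timestamp=prev)
            new_row["diff"] = None if prev is None else row["timestamp"] - prev
            out.append(new_row)
            prev = row["timestamp"]
    return out


for r in add_prev_timestamp(records):
    print(r["id"], r["timestamp"], r["prev_timestamp"], r["diff"])
```

On a stream, the "previous row" for a key may have arrived in an earlier micro-batch, so computing this column requires state that persists across batches. Spark rejects ordinary row-based window functions on streaming DataFrames for exactly this reason.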
apache-spark apache-spark-sql pyspark spark-structured-streaming