An example is as follows:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    (1, "2017-05-15 23:12:26", 2.5),
    (1, "2017-05-09 15:26:58", 3.5),
    (1, "2017-05-18 15:26:58", 3.6),
    (2, "2017-05-15 15:24:25", 4.8),
    (3, "2017-05-25 15:14:12", 4.6)],
    ["index", "time", "val"]).orderBy("index", "time")
df.show()
+-----+-------------------+---+
|index| time|val|
+-----+-------------------+---+
| 1|2017-05-09 15:26:58|3.5|
| 1|2017-05-15 23:12:26|2.5|
| 1|2017-05-18 15:26:58|3.6|
| 2|2017-05-15 15:24:25|4.8|
| 3|2017-05-25 15:14:12|4.6|
+-----+-------------------+---+
For the window function in pyspark.sql.functions:
window(timeColumn, windowDuration, slideDuration=None, startTime=None)
timeColumn: The time column must be of TimestampType.
windowDuration: Durations are provided as strings, e.g. '1 second', '1 day 12 hours', '2 minutes'. Valid
interval strings are 'week', 'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'.
slideDuration: If the 'slideDuration' is not provided, the windows …
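The sketch below shows one way window could be applied to the example DataFrame above. Since window requires a TimestampType column, the string time column is cast first; the 5-day window sliding every 1 day and the avg("val") aggregation are illustrative assumptions, not something specified in the original question.

from pyspark.sql import functions as F

# Cast the string column to TimestampType, then group by a sliding window.
result = (df
    .withColumn("ts", F.col("time").cast("timestamp"))
    .groupBy("index", F.window("ts", "5 days", "1 day"))
    .agg(F.avg("val").alias("avg_val")))

result.show(truncate=False)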