An example is as follows:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    (1, "2017-05-15 23:12:26", 2.5),
    (1, "2017-05-09 15:26:58", 3.5),
    (1, "2017-05-18 15:26:58", 3.6),
    (2, "2017-05-15 15:24:25", 4.8),
    (3, "2017-05-25 15:14:12", 4.6)],
    ["index", "time", "val"]).orderBy("index", "time")
df.show()
+-----+-------------------+---+
|index| time|val|
+-----+-------------------+---+
| 1|2017-05-09 15:26:58|3.5|
| 1|2017-05-15 23:12:26|2.5|
| 1|2017-05-18 15:26:58|3.6|
| 2|2017-05-15 15:24:25|4.8|
| 3|2017-05-25 15:14:12|4.6|
+-----+-------------------+---+
For the window function in pyspark.sql.functions:
window(timeColumn, windowDuration, slideDuration=None, startTime=None)
timeColumn: The time column must be of TimestampType.
windowDuration: Durations are provided as strings, e.g. '1 second', '1 day 12 hours', '2 minutes'. Valid
interval strings are 'week', 'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'.
slideDuration: If the 'slideDuration' is not provided, the windows …
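The sketch below shows one way window could be applied to the example DataFrame above. Since window requires a TimestampType column, the string time column is cast first; the 5-day window sliding every 1 day and the avg("val") aggregation are illustrative assumptions, not something specified in the original question.

from pyspark.sql import functions as F

# Cast the string column to TimestampType, then group by a sliding window.
result = (df
    .withColumn("ts", F.col("time").cast("timestamp"))
    .groupBy("index", F.window("ts", "5 days", "1 day"))
    .agg(F.avg("val").alias("avg_val")))

result.show(truncate=False)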