我有一个DF,我有bookingDt和arrivalDt列.我需要找到这两个日期之间的所有日期.
示例代码:
df = spark.sparkContext.parallelize(
[Row(vyge_id=1000, bookingDt='2018-01-01', arrivalDt='2018-01-05')]).toDF()
diffDaysDF = df.withColumn("diffDays", datediff('arrivalDt', 'bookingDt'))
diffDaysDF.show()
Run Code Online (Sandbox Code Playgroud)
代码输出:
+----------+----------+-------+--------+
| arrivalDt| bookingDt|vyge_id|diffDays|
+----------+----------+-------+--------+
|2018-01-05|2018-01-01| 1000| 4|
+----------+----------+-------+--------+
Run Code Online (Sandbox Code Playgroud)
我尝试的是找到两个日期之间的天数,并使用timedelta函数计算所有日期explode.
dateList = [str(bookingDt + timedelta(i)) for i in range(diffDays)]
Run Code Online (Sandbox Code Playgroud)
预期产量:
基本上,我需要建立一个DF与对之间的每个日的记录bookingDt和arrivalDt,包容性.
+----------+----------+-------+----------+
| arrivalDt| bookingDt|vyge_id|txnDt |
+----------+----------+-------+----------+
|2018-01-05|2018-01-01| 1000|2018-01-01|
+----------+----------+-------+----------+
|2018-01-05|2018-01-01| 1000|2018-01-02|
+----------+----------+-------+----------+
|2018-01-05|2018-01-01| 1000|2018-01-03|
+----------+----------+-------+----------+
|2018-01-05|2018-01-01| 1000|2018-01-04|
+----------+----------+-------+----------+
|2018-01-05|2018-01-01| 1000|2018-01-05|
+----------+----------+-------+----------+
Run Code Online (Sandbox Code Playgroud)