PySpark - Create a timestamp from Date and Hour columns

Sta*_*cks 1 python apache-spark pyspark

I have a Date column and an Hour column in a PySpark DataFrame. How can I combine them to get the Desired_Calculated_Result column?

df1 = sqlContext.createDataFrame(
  [
     ('2021-10-20','1300', '2021-10-20 13:00:00.000+0000')
    ,('2021-10-20','1400', '2021-10-20 14:00:00.000+0000')
    ,('2021-10-20','1500', '2021-10-20 15:00:00.000+0000')
  ]
  ,['Date', 'Hour', 'Desired_Calculated_Result']
)

I have tried:

df1.withColumn("TimeStamp", unix_timestamp(concat_ws(" ", df1.Date, df1.Hour), "yyyy-MM-dd HHmm").cast("timestamp")).show()

This returned all nulls in the TimeStamp column.

Lui*_*ola 5

from pyspark.sql.functions import concat, unix_timestamp

df1\
  .withColumn("TimeStamp", unix_timestamp(concat(df1.Date, df1.Hour), "yyyy-MM-ddHHmm")\
  .cast("timestamp"))\
  .show()
Run Code Online (Sandbox Code Playgroud)
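
To sanity-check the concatenate-then-parse logic on the sample values without a Spark session, the same idea can be sketched with Python's standard library. This is only an illustration, not part of the answer above: `combine_date_hour` is a hypothetical helper, `strptime` uses `%`-style directives rather than Spark's `yyyy-MM-dd` pattern letters, and treating the values as UTC is an assumption based on the `+0000` offset in Desired_Calculated_Result.

```python
from datetime import datetime, timezone


def combine_date_hour(date_str, hour_str):
    """Hypothetical helper: join 'YYYY-MM-DD' and 'HHMM', parse as a UTC datetime."""
    # Concatenate with no separator, mirroring concat(df1.Date, df1.Hour).
    combined = date_str + hour_str
    # "%Y-%m-%d%H%M" plays the role of the Spark pattern "yyyy-MM-ddHHmm".
    parsed = datetime.strptime(combined, "%Y-%m-%d%H%M")
    # Assumption: the values are UTC, as the +0000 in the desired column suggests.
    return parsed.replace(tzinfo=timezone.utc)


# Date and Hour values taken from the question's sample rows.
rows = [("2021-10-20", "1300"), ("2021-10-20", "1400"), ("2021-10-20", "1500")]
results = [combine_date_hour(d, h) for d, h in rows]

for ts in results:
    print(ts.isoformat())
```

If the plain-Python parse succeeds on the sample values, the concatenated string and the pattern agree with each other, which narrows any remaining nulls down to the Spark side (for example, the parser settings of the Spark version in use).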