Pyspark: Convert a column from string type to timestamp type

Ahm*_*man 3 casting unix-timestamp pyspark

I have been using pyspark 2.3. My dataframe has a "TIME" column that holds datetime values as strings. The column looks like this:

+---------------+
|           TIME|
+---------------+
| 2016/04/14 190|
| 2016/04/15 180|
|2016/04/14 1530|
|2016/04/16 1530|
| 2016/04/17 160|
+---------------+

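Here the first two digits of the time part (for example 19 in 190, or 15 in 1530) are the hour, and the remaining digits are the minutes. I tried to convert it to a timestamp type with the following line: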

df.withColumn('TIME_timestamp',fn.unix_timestamp('TIME','yyyy/MM/dd HHMM').cast(TimestampType()))

and:

df.withColumn('TIME_timestamp', fn.to_timestamp("TIME", 'yyyy/MM/dd HHMM'))

But the result is:

+---------------+-------------------+
|           TIME|     TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190|               null|
| 2016/04/15 180|               null|
|2016/04/14 1530|               null|
|2016/04/16 1530|               null|
| 2016/04/17 160|               null|
+---------------+-------------------+

So the desired dataframe should look like this:

+---------------+
| TIME_timestamp|
+---------------+
| 16-04-15 19:00|
| 16-04-15 18:00|
| 16-04-15 15:30|
| 16-04-15 15:30|
| 16-04-15 16:00|
+---------------+
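To pin down the encoding described in the question (the first two digits of the time part are the hour, the remaining digits are the minutes), here is a plain-Python sketch with no Spark involved. The helper `parse_time` is hypothetical, and it assumes the minutes are the remaining one or two digits read as an integer, so `190` becomes 19:00 and `1530` becomes 15:30.

```python
from datetime import datetime


def parse_time(value: str) -> datetime:
    """Hypothetical helper for strings like '2016/04/14 190' or '2016/04/14 1530'.

    Assumption: the first two digits of the time part are the hour and the
    remaining digits are the minutes, read as an integer.
    """
    date_part, time_part = value.strip().split(" ")
    year, month, day = (int(x) for x in date_part.split("/"))
    hour = int(time_part[:2])    # "19" from "190", "15" from "1530"
    minute = int(time_part[2:])  # "0" from "190", "30" from "1530"
    return datetime(year, month, day, hour, minute)


for raw in ["2016/04/14 190", "2016/04/15 180", "2016/04/14 1530"]:
    print(raw, "->", parse_time(raw))
```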

Flo*_*ian 5

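You are using uppercase M for both the month and the minutes; minutes must be written with lowercase m (Spark 2.x follows the Java SimpleDateFormat pattern letters). An example using to_timestamp is given below, hope this helps!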

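import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Reuses the active session (e.g. in the pyspark shell) or creates a new one.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
     ('2016/04/14 190',),
     ('2016/04/15 180',),
     ('2016/04/14 1530',),
     ('2016/04/16 1530',),
     ('2016/04/17 160',)
    ],
    ("TIME",)
)

# MM = month, mm = minute, HH = hour of day (00-23)
df.withColumn('TIME_timestamp', F.to_timestamp("TIME", "yyyy/MM/dd HHmm")).show()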

Output:

+---------------+-------------------+
|           TIME|     TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190|2016-04-14 19:00:00|
| 2016/04/15 180|2016-04-15 18:00:00|
|2016/04/14 1530|2016-04-14 15:30:00|
|2016/04/16 1530|2016-04-16 15:30:00|
| 2016/04/17 160|2016-04-17 16:00:00|
+---------------+-------------------+
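The mix-up is easy to make because the case convention is the reverse of Python's `datetime.strptime` directives: in the pattern letters Spark 2.x uses, `MM` is the month and `mm` the minute, while in Python `%m` is the month and `%M` the minute. The plain-Python sketch below is only an analogy, not Spark code; it parses the same sample strings with the Python equivalent of `yyyy/MM/dd HHmm`.

```python
from datetime import datetime

# Case is swapped between the two pattern languages:
#   Spark 2.x (SimpleDateFormat)     Python strptime
#   yyyy  4-digit year               %Y
#   MM    month                      %m
#   dd    day of month               %d
#   HH    hour of day (00-23)        %H
#   mm    minute                     %M
PYTHON_PATTERN = "%Y/%m/%d %H%M"  # analogue of Spark's "yyyy/MM/dd HHmm"

for raw in ["2016/04/14 190", "2016/04/14 1530", "2016/04/17 160"]:
    # Python's %M also accepts a single digit, so "190" is read as 19:00.
    print(raw, "->", datetime.strptime(raw, PYTHON_PATTERN))
```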