python apache-spark pyspark
Suppose I have a datetime column like the one shown below. I want to convert the string column to a datetime type so that I can extract the month, day, year, and so on.
+---+------------+
|agg| datetime|
+---+------------+
| A|1/2/17 12:00|
| B| null|
| C|1/4/17 15:00|
+---+------------+
I have tried the code below, but it returns null in the datetime column, and I don't currently understand why:
from pyspark.sql.types import DateType

df.select(df['datetime'].cast(DateType())).show()
I also tried the following code:
df = df.withColumn('datetime2', from_unixtime(unix_timestamp(df['datetime']), 'dd/MM/yy HH:mm'))
However, both attempts produce the following dataframe:
+---+------------+---------+
|agg|    datetime|datetime2|
+---+------------+---------+
|  A|1/2/17 12:00|     null|
|  B|        null|     null|
|  C|1/4/17 15:00|     null|
+---+------------+---------+
I have read and tried the solution given in this post, but to no avail: PySpark dataframe convert unusual string format to timestamp
// imports
import org.apache.spark.sql.functions.{dayofmonth, from_unixtime, month, unix_timestamp, year}

// Assuming the datetime column is a string, parse it with an explicit
// format and create a timestamp-formatted column, datetime2
val df2 = df.withColumn("datetime2", from_unixtime(unix_timestamp(df("datetime"), "dd/MM/yy HH:mm")))
+---+------------+-------------------+
|agg| datetime| datetime2|
+---+------------+-------------------+
| A|1/2/17 12:00|2017-02-01 12:00:00|
| B| null| null|
| C|1/4/17 15:00|2017-04-01 15:00:00|
+---+------------+-------------------+
// extract month, year, and day information
val df3 = df2.withColumn("month", month(df2("datetime2")))
.withColumn("year", year(df2("datetime2")))
.withColumn("day", dayofmonth(df2("datetime2")))
+---+------------+-------------------+-----+----+----+
|agg| datetime| datetime2|month|year| day|
+---+------------+-------------------+-----+----+----+
| A|1/2/17 12:00|2017-02-01 12:00:00| 2|2017| 1|
| B| null| null| null|null|null|
| C|1/4/17 15:00|2017-04-01 15:00:00| 4|2017| 1|
+---+------------+-------------------+-----+----+----+
Thanks!
Viewed: 7445 times