Ais*_*afi 7 timestamp scala apache-spark apache-spark-sql
I have a DataFrame named train with the following schema:
root
|-- date_time: string (nullable = true)
|-- site_name: integer (nullable = true)
|-- posa_continent: integer (nullable = true)
I want to cast the date_time column to timestamp and create a new column holding the year extracted from date_time.
For clarity, I have the following DataFrame:
+-------------------+---------+--------------+
| date_time|site_name|posa_continent|
+-------------------+---------+--------------+
|2014-08-11 07:46:59| 2| 3|
|2014-08-11 08:22:12| 2| 3|
|2015-08-11 08:24:33| 2| 3|
|2016-08-09 18:05:16| 2| 3|
|2011-08-09 18:08:18| 2| 3|
|2009-08-09 18:13:12| 2| 3|
|2014-07-16 09:42:23| 2| 3|
+-------------------+---------+--------------+
I would like to obtain the following DataFrame:
+-------------------+---------+--------------+--------+
| date_time|site_name|posa_continent|year |
+-------------------+---------+--------------+--------+
|2014-08-11 07:46:59| 2| 3|2014 |
|2014-08-11 08:22:12| 2| 3|2014 |
|2015-08-11 08:24:33| 2| 3|2015 |
|2016-08-09 18:05:16| 2| 3|2016 |
|2011-08-09 18:08:18| 2| 3|2011 |
|2009-08-09 18:13:12| 2| 3|2009 |
|2014-07-16 09:42:23| 2| 3|2014 |
+-------------------+---------+--------------+--------+
zer*_*323 12
Well, if you want to cast the date_time column to timestamp and create a new column with the year value, just do this:
import org.apache.spark.sql.functions.year
df
.withColumn("date_time", $"date_time".cast("timestamp")) // cast to timestamp
.withColumn("year", year($"date_time")) // add year column
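For completeness, here is a minimal self-contained sketch of the same transformation. The local[*] session and the sample rows are illustrative stand-ins for the asker's train DataFrame, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.year

object YearColumnExample extends App {
  // Illustrative local session; in practice any existing SparkSession works.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("year-column-sketch")
    .getOrCreate()
  import spark.implicits._

  // A small stand-in for the `train` DataFrame from the question.
  val df = Seq(
    ("2014-08-11 07:46:59", 2, 3),
    ("2015-08-11 08:24:33", 2, 3),
    ("2009-08-09 18:13:12", 2, 3)
  ).toDF("date_time", "site_name", "posa_continent")

  val result = df
    .withColumn("date_time", $"date_time".cast("timestamp")) // string -> timestamp
    .withColumn("year", year($"date_time"))                  // integer year column

  result.show()
  spark.stop()
}
```

Note that the cast to timestamp is not strictly required for year to work on an ISO-formatted string, but it is needed if you want date_time itself to be a proper timestamp column afterwards.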