Sha*_*kar 24 apache-spark apache-spark-sql
I have a DataFrame with a Timestamp column, and I need to convert it to Date format.
Are there any Spark SQL functions available for this?
Dan*_*ula 51
You can cast the column to a date:
Scala:
import org.apache.spark.sql.types.DateType
val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))
Pyspark:
df = df.withColumn('dateColumn', df['timestampColumn'].cast('date'))
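For intuition: the cast simply drops the time-of-day portion of each timestamp. A plain-Python sketch of the same semantics (no Spark required; the sample timestamp is taken from the output shown below):

```python
from datetime import datetime

# A timestamp like the ones stored in the Spark column
ts = datetime(2017, 11, 21, 16, 37, 15, 828000)

# Casting timestamp -> date keeps only the calendar date
d = ts.date()
print(d)  # 2017-11-21
```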
dsl*_*ack 14
In Spark SQL:
SELECT
CAST(the_ts AS DATE) AS the_date
FROM the_table
Consider the following input:
import org.apache.spark.sql.functions.current_timestamp

val dataIn = spark.createDataFrame(Seq(
    (1, "some data"),
    (2, "more data")))
  .toDF("id", "stuff")
  .withColumn("ts", current_timestamp())
dataIn.printSchema
root
|-- id: integer (nullable = false)
|-- stuff: string (nullable = true)
|-- ts: timestamp (nullable = false)
You can use the to_date function:
import org.apache.spark.sql.functions.to_date
import spark.implicits._

val dataOut = dataIn.withColumn("date", to_date($"ts"))
dataOut.printSchema
root
|-- id: integer (nullable = false)
|-- stuff: string (nullable = true)
|-- ts: timestamp (nullable = false)
|-- date: date (nullable = false)
dataOut.show(false)
+---+---------+-----------------------+----------+
|id |stuff |ts |date |
+---+---------+-----------------------+----------+
|1 |some data|2017-11-21 16:37:15.828|2017-11-21|
|2 |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+
I would recommend preferring these methods over casting and plain SQL.
Viewed: 40,653 times