Converting the date pattern in a Spark DataFrame

Rah*_*hvi 4 scala apache-spark spark-dataframe

I have a String-typed column in a Spark DataFrame (dates in the yyyy-MM-dd pattern) and I want to display the column values in the MM/dd/yyyy pattern.

My data is:

val df = sc.parallelize(Array(
  ("steak", "1990-01-01", "2000-01-01", 150),
  ("steak", "2000-01-02", "2001-01-13", 180),
  ("fish",  "1990-01-01", "2001-01-01", 100)
)).toDF("name", "startDate", "endDate", "price")

df.show()

+-----+----------+----------+-----+
| name| startDate|   endDate|price|
+-----+----------+----------+-----+
|steak|1990-01-01|2000-01-01|  150|
|steak|2000-01-02|2001-01-13|  180|
| fish|1990-01-01|2001-01-01|  100|
+-----+----------+----------+-----+

root
 |-- name: string (nullable = true)
 |-- startDate: string (nullable = true)
 |-- endDate: string (nullable = true)
 |-- price: integer (nullable = false)

I want to display endDate in the MM/dd/yyyy pattern. All I have managed so far is casting the column from String to DateType:

import org.apache.spark.sql.types.DateType

val df2 = df.select($"endDate".cast(DateType).alias("endDate"))

df2.show()

+----------+
|   endDate|
+----------+
|2000-01-01|
|2001-01-13|
|2001-01-01|
+----------+

df2.printSchema()

root
 |-- endDate: date (nullable = true)

I still want to display endDate in the MM/dd/yyyy pattern. FYI, the answer I found does not solve this problem.
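For reference, yyyy-MM-dd and MM/dd/yyyy use the same pattern letters as Java's date formatters, so the intended conversion can be sketched for a single value in plain Scala with java.time (no Spark required); this is only an illustration of the pattern mapping, not a DataFrame solution:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Parse with the source pattern, re-render with the target pattern
val inFmt  = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val outFmt = DateTimeFormatter.ofPattern("MM/dd/yyyy")
val converted = LocalDate.parse("2001-01-13", inFmt).format(outFmt)
println(converted) // 01/13/2001
```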

San*_*ver 7

You can use the date_format function:

  import sqlContext.implicits._
  import org.apache.spark.sql.functions._

  val df = sc.parallelize(Array(
    ("steak", "1990-01-01", "2000-01-01", 150),
    ("steak", "2000-01-02", "2001-01-13", 180),
    ("fish", "1990-01-01", "2001-01-01", 100))).toDF("name", "startDate", "endDate", "price")

  df.show()

  df.select(date_format(col("endDate"), "MM/dd/yyyy")).show

Output:

+-------------------------------+
|date_format(endDate,MM/dd/yyyy)|
+-------------------------------+
|                     01/01/2000|
|                     01/13/2001|
|                     01/01/2001|
+-------------------------------+
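Note that the generated column name `date_format(endDate,MM/dd/yyyy)` is awkward to work with downstream. Assuming you want to keep the original column name, a minimal sketch (using the same df as above; requires a running Spark session):

```scala
import org.apache.spark.sql.functions.{col, date_format}

// withColumn replaces endDate in place with the reformatted string;
// alternatively, date_format(...).alias("endDate") inside a select does the same.
val formatted = df.withColumn("endDate", date_format(col("endDate"), "MM/dd/yyyy"))
formatted.show()
```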