scala apache-spark apache-spark-sql
I'm trying to create a calendar file with columns such as day, month, and so on. The following code works fine, but I can't find a clean way to extract the week of the year (1-52). In Spark 3.0+, this line no longer works: .withColumn("week_of_year", date_format(col("day_id"), "W"))
I know I could create a view/table and then run a SQL query against it to extract week_of_year, but is there a better way to do this?
df.withColumn("day_id", to_date(col("day_id"), date_fmt))
.withColumn("week_day", date_format(col("day_id"), "EEEE"))
.withColumn("month_of_year", date_format(col("day_id"), "M"))
.withColumn("year", date_format(col("day_id"), "y"))
.withColumn("day_of_month", date_format(col("day_id"), "d"))
.withColumn("quarter_of_year", date_format(col("day_id"), "Q"))
Spark 3+ no longer seems to support these patterns:
Caused by: java.lang.IllegalArgumentException: All week-based patterns are unsupported since Spark 3.0, detected: w, Please use the SQL function EXTRACT instead
You can use this:
import org.apache.spark.sql.functions._
df.withColumn("week_of_year", weekofyear($"date"))
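Alternatively, as the exception itself suggests, Spark 3's SQL EXTRACT function can pull the week directly. A minimal sketch (the weekOfYear helper and the local SparkSession are illustrative, not part of the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, to_date}

// Sketch: extract week-of-year with SQL EXTRACT instead of date_format("w").
def weekOfYear(dates: Seq[String]): Seq[Int] = {
  val spark = SparkSession.builder().master("local[1]").appName("weekOfYear").getOrCreate()
  import spark.implicits._
  val weeks = dates.toDF("day_id")
    .withColumn("day_id", to_date(col("day_id"), "yyyy-MM-dd"))
    // EXTRACT(week FROM date) follows ISO-8601 numbering, like weekofyear()
    .withColumn("week_of_year", expr("extract(week FROM day_id)"))
    .select("week_of_year")
    .collect()
    .map(_.getInt(0))
    .toSeq
  spark.stop()
  weeks
}
```

Both weekofyear and EXTRACT(week FROM ...) return the same ISO week number, so either fits the calendar-file use case.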
Test
Input
val df = List("2021-05-15", "1985-10-05")
.toDF("date")
.withColumn("date", to_date($"date", "yyyy-MM-dd"))
df.show
+----------+
| date|
+----------+
|2021-05-15|
|1985-10-05|
+----------+
Output
df.withColumn("week_of_year", weekofyear($"date")).show
+----------+------------+
| date|week_of_year|
+----------+------------+
|2021-05-15| 19|
|1985-10-05| 40|
+----------+------------+
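One caveat: weekofyear follows ISO 8601 (weeks start on Monday; week 1 is the week containing January 4), so its results can differ from the old SimpleDateFormat "w" pattern near year boundaries. If the pre-3.0 semantics are genuinely required, Spark exposes a legacy switch (a config sketch; note it changes date parsing/formatting for the whole session, not just this pattern):

```scala
// Restores pre-Spark-3.0 SimpleDateFormat semantics for datetime patterns,
// which lets date_format(col("day_id"), "w") run again. Session-wide effect.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df.withColumn("week_of_year", date_format(col("day_id"), "w")).show()
```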