vis*_*raj 5 apache-spark apache-spark-sql
Input DF:
+-------------------+---------+
|VALUES |Delimiter|
+-------------------+---------+
|50000.0#0#0# |# |
|0@1000.0@ |@ |
|1$ |$ |
|1000.00^Test_string|^ |
+-------------------+---------+
Expected Output DF:
+-------------------+---------+----------------------+
|VALUES |Delimiter|SPLITED_VALUES |
+-------------------+---------+----------------------+
|50000.0#0#0# |# |[50000.0, 0, 0] |
|0@1000.0@ |@ |[0, 1000.0] |
|1$ |$ |[1] |
|1000.00^Test_string|^ |[1000.00, Test_string]|
+-------------------+---------+----------------------+
Code:
import sparkSession.sqlContext.implicits._
val dept = Seq(("50000.0#0#0#", "#"),("0@1000.0@", "@"),("1$", "$"),("1000.00^Test_string", "^")).toDF("VALUES", "Delimiter")
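I am very new to Spark and am trying to split the values of the "VALUES" column using the delimiter stored in another column.
I tried the Spark split function like this: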
val dept2 = dept.withColumn("SPLITED_VALUES", split(col("VALUES"), "#"))
But here the split function takes the delimiter as a constant value, and I cannot pass it a column like this:
val dept2 = dept.withColumn("SPLITED_VALUES", split(col("VALUES"), col("Delimiter")))
Can anyone suggest a better approach for this?
Check the code below.
scala> df
.withColumn("delimiter",concat(lit("\\"),$"delimiter"))
.withColumn("split_values",expr("split(values,delimiter)"))
.show(false)
+-------------------+---------+----------------------+
|values |delimiter|split_values |
+-------------------+---------+----------------------+
|50000.0#0#0# |\# |[50000.0, 0, 0, ] |
|0@1000.0@ |\@ |[0, 1000.0, ] |
|1$ |\$ |[1, ] |
|1000.00^Test_string|\^ |[1000.00, Test_string]|
+-------------------+---------+----------------------+
Run Code Online (Sandbox Code Playgroud)
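The backslash is prepended because the second argument of split is a regular expression, and `$` and `^` are regex anchors rather than literal characters. The following is a minimal plain-Scala sketch of that behaviour, with no Spark involved. It assumes that `String.split` with a limit of -1 is a fair stand-in for Spark's split, since both use Java regular expressions and both keep the trailing empty string seen in the output above. `RegexSplitDemo` and `splitKeepEmpty` are hypothetical names used only for illustration.

```scala
object RegexSplitDemo {
  // limit = -1 keeps trailing empty strings, which matches the
  // "[1, ]" style of output shown in the answer.
  def splitKeepEmpty(value: String, regex: String): Array[String] =
    value.split(regex, -1)

  private def show(parts: Array[String]): String =
    parts.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    // Escaped delimiters are matched as literal characters.
    println(show(splitKeepEmpty("50000.0#0#0#", "\\#"))) // [50000.0, 0, 0, ]
    println(show(splitKeepEmpty("1$", "\\$")))           // [1, ]

    // Unescaped '^' is a start-of-input anchor, so the value is
    // not split at the caret character.
    println(show(splitKeepEmpty("1000.00^Test_string", "^")))
  }
}
```

Prefixing a single backslash works for one-character delimiters such as the ones in the question. A delimiter that is a letter or digit would turn into a regex escape sequence instead, so this trick should be checked against the actual delimiter set.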
Update: to drop the empty trailing elements, trim the value and wrap the split in array_remove.
scala> df
.withColumn("delimiter",concat(lit("\\"),$"delimiter"))
.withColumn("data",expr("array_remove(split(trim(values),delimiter),'')"))
.show(false)
+-------------------+---------+----------------------+
|values |delimiter|data |
+-------------------+---------+----------------------+
|50000.0#0#0# |\# |[50000.0, 0, 0] |
|0@1000.0@ |\@ |[0, 1000.0] |
|1$ |\$ |[1] |
|1000.00^Test_string|\^ |[1000.00, Test_string]|
+-------------------+---------+----------------------+
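For comparison, the same per-row logic can be sketched in plain Scala. This is not the code from the answer: it swaps the backslash prefix for `java.util.regex.Pattern.quote`, which treats the whole delimiter as a literal, and it uses `filter(_.nonEmpty)` in place of array_remove. `DelimiterSplit` and `splitByDelimiter` are hypothetical names introduced for this example.

```scala
import java.util.regex.Pattern

object DelimiterSplit {
  // Hypothetical helper, not part of Spark: splits `values` on a literal
  // delimiter and drops every empty piece, as array_remove(..., '') does.
  def splitByDelimiter(values: String, delimiter: String): Seq[String] =
    values.trim
      .split(Pattern.quote(delimiter), -1) // quote => delimiter is literal
      .filter(_.nonEmpty)                  // drop empty strings
      .toSeq

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ("50000.0#0#0#", "#"),
      ("0@1000.0@", "@"),
      ("1$", "$"),
      ("1000.00^Test_string", "^")
    )
    rows.foreach { case (values, delimiter) =>
      println(splitByDelimiter(values, delimiter).mkString("[", ", ", "]"))
    }
    // [50000.0, 0, 0]
    // [0, 1000.0]
    // [1]
    // [1000.00, Test_string]
  }
}
```

Note that removing empty strings also drops empty pieces in the middle of a value, so "a##b" with delimiter "#" yields [a, b]. If such a helper were wrapped with `udf` from `org.apache.spark.sql.functions`, it could be applied to the two columns directly, although the built-in functions in the answer are likely to be faster than a UDF.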