Combining columns into a nested array in Spark

Asked by Geo*_*ler (1) · Tags: apache-spark, apache-spark-sql

How can I combine columns in Spark into a nested array?

import spark.implicits._  // needed for .toDF (already in scope in the spark-shell)

val inputSmall = Seq(
    ("A", 0.3, "B", 0.25),
    ("A", 0.3, "g", 0.4),
    ("d", 0.0, "f", 0.1),
    ("d", 0.0, "d", 0.7),
    ("A", 0.3, "d", 0.7),
    ("d", 0.0, "g", 0.4),
    ("c", 0.2, "B", 0.25)).toDF("column1", "transformedCol1", "column2", "transformedCol2")

I would like to get something like this:

+-------+---------------+---------------+----------+
|column1|transformedCol1|transformedCol2|  combined|
+-------+---------------+---------------+----------+
|      A|            0.3|            0.3|[0.3, 0.3]|
+-------+---------------+---------------+----------+

Answered by Dan*_*ula · 14 votes

If you want to combine multiple columns into a new column of `ArrayType`, you can use the `array` function:

import org.apache.spark.sql.functions._
val result = inputSmall.withColumn("combined", array($"transformedCol1", $"transformedCol2"))
result.show()

+-------+---------------+-------+---------------+-----------+
|column1|transformedCol1|column2|transformedCol2|   combined|
+-------+---------------+-------+---------------+-----------+
|      A|            0.3|      B|           0.25|[0.3, 0.25]|
|      A|            0.3|      g|            0.4| [0.3, 0.4]|
|      d|            0.0|      f|            0.1| [0.0, 0.1]|
|      d|            0.0|      d|            0.7| [0.0, 0.7]|
|      A|            0.3|      d|            0.7| [0.3, 0.7]|
|      d|            0.0|      g|            0.4| [0.0, 0.4]|
|      c|            0.2|      B|           0.25|[0.2, 0.25]|
+-------+---------------+-------+---------------+-----------+

  • Ah, it accepts multiple Columns rather than multiple Strings, so this works: `val names = Seq("foo", "bar"); frame.withColumn("combined", array(names.map(frame(_)): _*))` (2 upvotes)
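The dynamic-column variant from the comment above can be sketched as a self-contained program. This is an illustrative sketch, not part of the original answer: the object name `CombineColumns`, the `local[*]` master, and the `names` sequence are all assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.array

object CombineColumns {
  def main(args: Array[String]): Unit = {
    // Assumed local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combine-columns")
      .getOrCreate()
    import spark.implicits._

    val inputSmall = Seq(
      ("A", 0.3, "B", 0.25),
      ("c", 0.2, "B", 0.25)
    ).toDF("column1", "transformedCol1", "column2", "transformedCol2")

    // Build the array column from a list of names instead of
    // hard-coding each column: map the names to Columns, then splat.
    val names = Seq("transformedCol1", "transformedCol2")
    val result = inputSmall.withColumn(
      "combined", array(names.map(inputSmall(_)): _*))
    result.show()

    spark.stop()
  }
}
```

Because `array` takes `Column*`, the `: _*` splat is what lets a `Seq[Column]` be passed where varargs are expected; the hard-coded `array($"a", $"b")` form in the answer is a special case of this.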