Jor*_*ado 17 dataframe apache-spark spark-dataframe
I have this DataFrame in Apache Spark, created with:

val df = Seq((1, Vector(2, 3, 4)), (1, Vector(2, 3, 4))).toDF("Col1", "Col2")

It looks like this:
+------+---------+
| Col1 | Col2 |
+------+---------+
| 1 |[2, 3, 4]|
| 1 |[2, 3, 4]|
+------+---------+
How can I transform it into this?
+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------+------+
| 1 | 2 | 3 | 4 |
| 1 | 2 | 3 | 4 |
+------+------+------+------+
sgv*_*gvd 21
A solution that does not convert to an RDD:
df.select($"Col1", $"Col2"(0) as "Col2", $"Col2"(1) as "Col3", $"Col2"(2) as "Col4")
Or, arguably nicer, without hardcoding each column:
val nElements = 3
df.select($"Col1" +: Range(0, nElements).map(idx => $"Col2"(idx) as s"Col${idx + 2}"): _*)
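The name-generation part of that select can be checked on its own in plain Scala, without Spark (a small illustrative sketch, not part of the original answer):

```scala
// Build the projected column names: "Col1" plus "Col2".."Col4"
// for nElements = 3, mirroring the select above.
val nElements = 3
val names = "Col1" +: Range(0, nElements).map(idx => s"Col${idx + 2}")
println(names.mkString(", "))  // Col1, Col2, Col3, Col4
```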
Note that in Spark an array column does not have a fixed size; for example, you could have:
+----+------------+
|Col1| Col2|
+----+------------+
| 1| [2, 3, 4]|
| 1|[2, 3, 4, 5]|
+----+------------+
So in general there is no way to know how many columns to create. If you know the size is always the same, you can set nElements like this:
val nElements = df.select("Col2").first.getList(0).size
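If the sizes do vary, one possible workaround (a sketch under assumptions, not from the original answer; it assumes the same `df` and an active `SparkSession` named `spark`) is to use the largest array length in the column as the column count; shorter rows simply get null in the trailing columns, since indexing an array past its end returns null rather than failing:

```scala
import org.apache.spark.sql.functions.{max, size}
import spark.implicits._  // assumes an active SparkSession named `spark`

// Use the largest array length found in Col2 as the column count.
val nElements = df.select(max(size($"Col2"))).first.getInt(0)

// Rows with shorter arrays yield null in the extra columns.
df.select($"Col1" +: Range(0, nElements).map(idx => $"Col2"(idx) as s"Col${idx + 2}"): _*)
```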