kat*_*tty 0 hadoop scala bigdata apache-spark
I want to select a few columns, add or divide some columns, fill one column with blank spaces, and store the results under new names as aliases. For example, in SQL it would look something like:
select " " as col1, b as b1, c+d as e from table
How can I achieve this in Spark?
小智 6
You can do exactly the same thing with Spark SQL.
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF outside spark-shell

// Build a sample DataFrame and register it as a temporary view
val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")
df1.createOrReplaceTempView("table")
df1.show()

// Run the same SQL query against the view
val df2 = spark.sql("select ' ' as col1, b as b1, c+d as e from table")
df2.show()
Input:
+---+---+---+---+
| a| b| c| d|
+---+---+---+---+
| A| 1| 5| 3|
| B| 3| 4| 2|
| C| 4| 6| 3|
| D| 5| 9| 1|
+---+---+---+---+
Output:
+----+---+---+
|col1| b1| e|
+----+---+---+
| | 1| 8|
| | 3| 6|
| | 4| 9|
| | 5| 10|
+----+---+---+
You can also use the native DataFrame functions. For example, given:
import org.apache.spark.sql.functions._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")
select the columns like this:
// lit(" ") creates a constant blank column; .as gives each column its alias
df1.select(lit(" ").as("col1"),
  col("b").as("b1"),
  (col("c") + col("d")).as("e"))
which gives you the expected result:
+----+---+---+
|col1| b1| e|
+----+---+---+
| | 1| 8|
| | 3| 6|
| | 4| 9|
| | 5| 10|
+----+---+---+
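For completeness, the same projection can also be written with selectExpr, which takes SQL expression strings directly and avoids registering a temp view. A minimal sketch, assuming the df1 built above is still in scope:

// selectExpr parses each argument as a SQL expression string
val df3 = df1.selectExpr("' ' as col1", "b as b1", "c + d as e")
df3.show()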