I have a Spark dataframe like the following:
id person age
1 naveen 24
I want to append a constant "del" to every column value except the last column of the dataframe, so that the result looks like this:
id person age
1del naveendel 24
Could someone help me with how to achieve this in a Spark DataFrame using Scala?
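A minimal sketch of how the input DataFrame might be built, assuming a SparkSession named spark is in scope and the columns are string-typed (the names follow the example above):
import spark.implicits._
val df = Seq(("1", "naveen", "24")).toDF("id", "person", "age")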
You can use the lit and concat functions:
import org.apache.spark.sql.functions._
// add the suffix to all but the last column (works for any number of columns):
val colsWithSuffix = df.columns.dropRight(1).map(c => concat(col(c), lit("del")) as c)
// keep the last column (here: age) unchanged
val result = df.select(colsWithSuffix :+ col(df.columns.last): _*)
result.show()
// +----+---------+---+
// |id |person |age|
// +----+---------+---+
// |1del|naveendel|24 |
// +----+---------+---+
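The same transformation can also be sketched with foldLeft and withColumn, which rewrites every column except the last one in place (result2 is an illustrative name, df is the input DataFrame above):
import org.apache.spark.sql.functions.{col, concat, lit}
// replace each column except the last with itself plus the "del" suffix
val result2 = df.columns.dropRight(1).foldLeft(df) { (acc, c) =>
  acc.withColumn(c, concat(col(c), lit("del")))
}
result2.show()  // same output as above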
EDIT: to also handle null values, wrap each column in coalesce before appending the suffix; replace the computation of colsWithSuffix above with:
// treat null as an empty string so concat does not return null
val colsWithSuffix = df.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)
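As a quick sanity check of the null-safe variant, a hypothetical row with a null person can be run through it (dfWithNull and nullSafeCols are illustrative names, and the imports above are assumed):
val dfWithNull = Seq(("1", null: String, "24")).toDF("id", "person", "age")
val nullSafeCols = dfWithNull.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)
// the null person becomes "del" instead of propagating null through concat
dfWithNull.select(nullSafeCols :+ col(dfWithNull.columns.last): _*).show()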