Ban*_*ddy 2 hadoop apache-spark apache-spark-sql
Given the following structure:
val df = Seq("Color", "Shape", "Range","Size").map(Tuple1.apply).toDF("color")
val df1 = df.withColumn("Success", when($"color"<=> "white", "Diamond").otherwise(0))
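I want to add one more WHEN condition to the one above: when size > 10 and the Shape column's value is Rhombus, the value "Diamond" should be inserted into the column, otherwise 0. I tried the following, but it failed: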
val df1 = df.withColumn("Success", when($"color" <=> "white", "Diamond").otherwise(0)).when($"size">10)
Please suggest a solution that uses only the Scala DataFrame API. Spark SQL with sqlContext does not help me.
Thanks!
You can chain `when` calls, similar to the example at https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#when-org.apache.spark.sql.Column-java.lang.Object-
Available since 1.4.0.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
Your example:
val df1 = df.withColumn("Success",
when($"color" <=> "white", "Diamond")
.when($"size" > 10 && $"shape" === "Rhombus", "Diamond")
.otherwise(0))
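For anyone who wants to try the chained `when` end to end, here is a minimal sketch. It assumes a DataFrame that actually has `color`, `shape` and `size` columns (the `df` in the question only defines `color`), and the sample rows, the object name `ChainedWhenExample` and the helper `withSuccess` are made up for illustration. It also uses the string "0" in `otherwise`, which is a choice made here so that both outcomes share one type rather than relying on an implicit cast.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ChainedWhenExample {
  // Branches are checked in order; the first condition that is true supplies the value.
  // If no branch matches, the value from otherwise(...) is used.
  def withSuccess(df: DataFrame): DataFrame =
    df.withColumn("Success",
      when(col("color") <=> "white", "Diamond")
        .when(col("size") > 10 && col("shape") === "Rhombus", "Diamond")
        .otherwise("0")) // string "0" instead of 0: an assumption, keeps one column type

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("chained-when")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; the question does not show real data.
    val df = Seq(
      ("white", "Square", 5),
      ("red", "Rhombus", 12),
      ("red", "Rhombus", 3),
      ("blue", "Circle", 20)
    ).toDF("color", "shape", "size")

    withSuccess(df).show()
    spark.stop()
  }
}
```

With these sample rows, the first row matches the color branch and the second matches the size/shape branch, so both get "Diamond"; the last two rows match neither and fall through to "0". Note that `.when` has to be chained on the Column before `.otherwise`, not on the result of `withColumn`, which is where the attempt in the question went wrong.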
Have you tried writing a UDF? Try something like this:
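import org.apache.spark.sql.functions.udf
// Define the UDF. size is declared as Int so that the comparison `size > 10` compiles;
// this assumes the size column is an integer column.
val isDiamond = udf((color: String, shape: String, size: Int) => {
if (color == "white" && shape == "Rhombus" && size > 10) "Diamond"
else ""
})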
val df2 = df.withColumn("Success", isDiamond($"color", $"shape", $"size"))
Regards.