Passing multiple conditions as a string to a where clause in Spark

Question


Dar*_*nek 1 scala apache-spark apache-spark-sql apache-spark-dataset apache-spark-2.0

I am writing the following code in Spark using the DataFrame API.

val cond = """col("firstValue") >= 0.5 & col("secondValue") >= 0.5 & col("thirdValue") >= 0.5"""
val Output1 = InputDF.where(cond)


I am passing all the conditions as a string from an external parameter, but it throws a parse error, because cond is expected to be of type Column.

For example:

col("firstValue") >= 0.5 && col("secondValue") >= 0.5 && col("thirdValue") >= 0.5


Since I want to pass multiple conditions dynamically, how can I convert a String into a Column?

Edit

Is there a way to read a list of conditions from outside the program as Columns? I haven't found anything in Scala that converts a String into a Column.

Answer 1

ste*_*ino 5

I believe you may want to do something like the following:

InputDF.where("firstValue >= 0.5 and secondValue >= 0.5 and thirdValue >= 0.5")

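If the individual conditions arrive from an external parameter, as in the question, the predicate string has to be assembled at runtime. One way to keep some safety is to check each condition before Spark ever sees it. This is a minimal sketch under stated assumptions: the "column op value" input format, the allowed column set, and the names ConditionParser, parse and parseAll are all hypothetical, not part of Spark.

```scala
import scala.util.Try

// Hypothetical pre-check for conditions that arrive from an external parameter.
// Assumed input format: "column op value", e.g. "firstValue >= 0.5".
object ConditionParser {
  // Assumption: the columns callers may filter on (in practice, e.g. InputDF.columns).
  val allowedColumns: Set[String] = Set("firstValue", "secondValue", "thirdValue")
  val allowedOps: Set[String] = Set(">=", ">", "<=", "<", "=", "!=")

  // Parse one condition into a SQL fragment, or return a readable error.
  def parse(raw: String): Either[String, String] =
    raw.trim.split("\\s+") match {
      case Array(column, op, value) =>
        val number = Try(value.toDouble).toOption.filter(d => java.lang.Double.isFinite(d))
        if (!allowedColumns.contains(column)) Left(s"unknown column: $column")
        else if (!allowedOps.contains(op)) Left(s"unsupported operator: $op")
        else number match {
          // Backticks quote the column name as a Spark SQL identifier.
          case Some(d) => Right(s"`$column` $op $d")
          case None    => Left(s"not a finite number: $value")
        }
      case _ => Left(s"expected 'column op value', got: '$raw'")
    }

  // Parse every condition and join them with "and"; stop at the first bad one.
  def parseAll(raws: Seq[String]): Either[String, String] =
    if (raws.isEmpty) Left("no conditions given")
    else
      raws
        .foldLeft[Either[String, List[String]]](Right(Nil)) { (acc, raw) =>
          for { done <- acc; cond <- parse(raw) } yield done :+ cond
        }
        .map(_.mkString(" and "))
}
```

A Right(predicate) can then be passed to InputDF.where(predicate) as above, or wrapped as org.apache.spark.sql.functions.expr(predicate) when a Column is needed; expr is Spark's built-in way to turn a SQL expression string into a Column. A Left reports the bad condition before any Spark job runs.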

The error you are facing is a parse error at runtime. If it were caused by passing the wrong type, the code would not even compile.

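As you can see in the official documentation (here, the one for Spark 2.3.0), the where method can take either a Column (as in your latter snippet) or a string representing a SQL predicate (as in my example).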

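The SQL predicate will be parsed and interpreted by Spark. However, I think it's worth mentioning that you may be better off composing Columns rather than concatenating strings, since the former approach shrinks the error surface by eliminating an entire class of possible errors (such as parse errors).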

You can achieve the same result with the following code:

InputDF.where(col("firstValue") >= 0.5 and col("secondValue") >= 0.5 and col("thirdValue") >= 0.5)


or, more concisely:

import spark.implicits._ // necessary for the $"" notation
InputDF.where($"firstValue" >= 0.5 and $"secondValue" >= 0.5 and $"thirdValue" >= 0.5)


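Columns are easy to compose and more robust than raw strings. If you have a set of conditions to apply, you can easily combine them with and inside a function, which the compiler can check before you even run the program: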

import org.apache.spark.sql.Column

def allSatisfied(condition: Column, conditions: Column*): Column =
    conditions.foldLeft(condition)(_ and _)

InputDF.where(allSatisfied($"firstValue" >= 0.5, $"secondValue" >= 0.5, $"thirdValue" >= 0.5))


Of course, you can achieve the same effect with strings, but the result ends up less robust:

def allSatisfied(condition: String, conditions: String*): String =
    conditions.foldLeft(condition)(_ + " and " + _)

InputDF.where(allSatisfied("firstValue >= 0.5", "secondValue >= 0.5", "thirdValue >= 0.5"))

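One concrete way the string version is less robust is operator precedence: in SQL, and binds more tightly than or. If one externally supplied condition is itself "firstValue >= 0.5 or secondValue >= 0.5", joining it with "thirdValue >= 0.5" yields "firstValue >= 0.5 or secondValue >= 0.5 and thirdValue >= 0.5", which Spark reads as "firstValue >= 0.5 or (secondValue >= 0.5 and thirdValue >= 0.5)". The sketch below parenthesizes each condition before joining; StringConditions and allSatisfiedNaive are hypothetical names, and allSatisfiedNaive just reproduces the foldLeft join above with mkString.

```scala
// Sketch comparing two ways of and-ing SQL predicate strings.
object StringConditions {
  // Hypothetical name: same result as the foldLeft-based string allSatisfied.
  def allSatisfiedNaive(condition: String, conditions: String*): String =
    (condition +: conditions).mkString(" and ")

  // Wraps each condition in parentheses so an "or" inside one condition
  // cannot regroup with its neighbours when the pieces are joined.
  def allSatisfied(condition: String, conditions: String*): String =
    (condition +: conditions).map(c => s"($c)").mkString(" and ")
}
```

The Column version does not need this step: each Column is already a complete expression tree, so and combines whole expressions rather than pieces of text.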

Archived:	7 years, 7 months ago
Views:	8795
Last activity:	6 years, 3 months ago