Spark DataFrame: creating a new column from 2 boolean conditions

Question

Sté*_*ier 1 apache-spark apache-spark-sql pyspark

I want to mutate my DataFrame based on 2 boolean conditions combined with a bitwise AND, as I would in R:

df %>% mutate(newVariable = ifelse(variable1 == "value1" & variable2 == "value2", variable3, NULL))

So I tried this in PySpark:

import pyspark.sql.functions as func

df.withColumn("newVariable", func.when( \
     func.col("variable1") == "value1" & func.col("variable2") == "value2", \
     func.col("variable3")))

But I get an error.

What is the correct way to create this kind of new variable with a Spark DataFrame?

Answer 1

zer*_*323 5

You have to keep operator precedence in mind. In Python, & has higher precedence than ==, so each equality check must be wrapped in parentheses:

(func.col("variable1") == "value1") & (func.col("variable2") == "value2")

Otherwise & is applied first, and the expression is evaluated roughly as:

(func.col("variable1") == ("value1" & func.col("variable2"))) == "value2"
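
The grouping can be checked without a Spark session by asking Python's own parser. The sketch below uses the placeholder names a, b, c, d (not from the question) in place of the columns and string literals. It suggests that the unparenthesized form is parsed as a single chained comparison whose middle operand is the & expression, which is why "value1" & func.col("variable2") ends up being evaluated.

```python
import ast

# a, b, c, d are placeholder names standing in for the columns and strings.
unparenthesized = ast.parse("a == b & c == d", mode="eval").body
parenthesized = ast.parse("(a == b) & (c == d)", mode="eval").body

# Without parentheses the root is one comparison chain; its middle operand is (b & c).
root_without = type(unparenthesized).__name__
middle_operand = type(unparenthesized.comparators[0]).__name__

# With parentheses the root is the & operation joining two comparisons.
root_with = type(parenthesized).__name__
operand_types = (
    type(parenthesized.left).__name__,
    type(parenthesized.right).__name__,
)

print(root_without, middle_operand)
print(root_with, operand_types)
```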
