How do I update a column in PySpark based on another column?

Cha*_*mar 1 apache-spark pyspark

I have a DataFrame with the columns "CUSTOMER_MAILID", "OFFER_NAME", and "OFFER_ISAPPLIED".

Sample data:

+--------------------+--------------------+---------------+
|     CUSTOMER_MAILID|          OFFER_NAME|OFFER_ISAPPLIED|
+--------------------+--------------------+---------------+
|pushpendrakaushik...|Jaipur Pink Panth...|              N|
|pushpendrakaushik...|Jaipur Pink Panth...|              N|
|dr.kshitijmathur@...|                    |              N|
|spdadhichassociat...|                    |              N|
|vinod.gogia@herom...|Jaipur Pink Panth...|              N|
|prerak0401@gmail.com|                    |              N|
| garhwalsp@gmail.com|                    |              N|
|muditsharma1985@g...|                    |              N|
|  amit1185@gmail.com|Jaipur Pink Panth...|              N|

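If the "OFFER_NAME" column has a value (that is, it is not empty), I want to set the "OFFER_ISAPPLIED" column to "Y" for that row.

How can I achieve this?

The output should look like this: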

+--------------------+--------------------+---------------+
|     CUSTOMER_MAILID|          OFFER_NAME|OFFER_ISAPPLIED|
+--------------------+--------------------+---------------+
|pushpendrakaushik...|Jaipur Pink Panth...|              Y|
|pushpendrakaushik...|Jaipur Pink Panth...|              Y|
|dr.kshitijmathur@...|                    |              N|
|spdadhichassociat...|                    |              N|
|vinod.gogia@herom...|Jaipur Pink Panth...|              Y|
|prerak0401@gmail.com|                    |              N|
| garhwalsp@gmail.com|                    |              N|
|muditsharma1985@g...|                    |              N|
|  amit1185@gmail.com|Jaipur Pink Panth...|              Y|

小智 6

Use `when` / `otherwise` inside `withColumn`:

from pyspark.sql.functions import col, when

# "N" where OFFER_NAME is null, "Y" otherwise; withColumn returns a new DataFrame
df = df.withColumn("OFFER_ISAPPLIED",
  when(col("OFFER_NAME").isNull(), "N").otherwise("Y"))