Posts by Cha*_*mar

How do I update a column in PySpark based on another column?

I have a DataFrame with the columns "CUSTOMER_MAILID", "OFFER_NAME", and "OFFER_ISAPPLIED".

Sample data:

+--------------------+--------------------+---------------+
|     CUSTOMER_MAILID|          OFFER_NAME|OFFER_ISAPPLIED|
+--------------------+--------------------+---------------+
|pushpendrakaushik...|Jaipur Pink Panth...|              N|
|pushpendrakaushik...|Jaipur Pink Panth...|              N|
|dr.kshitijmathur@...|                    |              N|
|spdadhichassociat...|                    |              N|
|vinod.gogia@herom...|Jaipur Pink Panth...|              N|
|prerak0401@gmail.com|                    |              N|
| garhwalsp@gmail.com|                    |              N|
|muditsharma1985@g...|                    |              N|
|  amit1185@gmail.com|Jaipur Pink Panth...|              N|

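Wherever the "OFFER_NAME" column holds a value (anything other than an empty value), I want to set the "OFFER_ISAPPLIED" column to "Y".

How can I achieve this?

The output should look like this: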

+--------------------+--------------------+---------------+
|     CUSTOMER_MAILID|          OFFER_NAME|OFFER_ISAPPLIED|
+--------------------+--------------------+---------------+
|pushpendrakaushik...|Jaipur Pink Panth...|              Y|
|pushpendrakaushik...|Jaipur Pink Panth...|              Y|
|dr.kshitijmathur@...|                    |              N|
|spdadhichassociat...|                    |              N|
|vinod.gogia@herom...|Jaipur Pink Panth...|              Y|
|prerak0401@gmail.com|                    |              N|
| garhwalsp@gmail.com|                    |              N|
|muditsharma1985@g...| …

apache-spark pyspark

1 vote, 1 answer, 5679 views
