Tags: apache-spark, apache-spark-sql, pyspark, pyspark-sql
import numpy as np

df = spark.createDataFrame(
    [(1, 1, None),
     (1, 2, float(5)),
     (1, 3, np.nan),
     (1, 4, None),
     (0, 5, float(10)),
     (1, 6, float('nan')),
     (0, 6, float('nan'))],
    ("session", "timestamp1", "id2"))
+-------+----------+----+
|session|timestamp1| id2|
+-------+----------+----+
| 1| 1|null|
| 1| 2| 5.0|
| 1| 3| NaN|
| 1| 4|null|
| 0| 5|10.0|
| 1| 6| NaN|
| 0| 6| NaN|
+-------+----------+----+
How can I replace the values in the timestamp1 column with the value 999 wherever session == 0?

Expected output:
+-------+----------+----+
|session|timestamp1| id2|
+-------+----------+----+
| 1| 1|null|
| 1| 2| 5.0|
| 1| 3| NaN|
| 1| 4|null|
| 0| 999|10.0|
| 1| 6| NaN|
| 0| 999| NaN|
+-------+----------+----+
Can replace() be used for this in PySpark?
Answer:
You should use the when function (together with otherwise):
from pyspark.sql.functions import when

targetDf = df.withColumn(
    "timestamp1",
    when(df["session"] == 0, 999).otherwise(df["timestamp1"]))
Viewed 30,275 times.