Posts by Jer*_*oen

PySpark: forward and backward fill within a column level

I am trying to fill in missing data in a PySpark DataFrame. The DataFrame looks like this:

+---------+---------+-------------------+----+
| latitude|longitude|      timestamplast|name|
+---------+---------+-------------------+----+
|         | 4.905615|2019-08-01 00:00:00|   1|
|51.819645|         |2019-08-01 00:00:00|   1|
| 51.81964| 4.961713|2019-08-01 00:00:00|   2|
|         |         |2019-08-01 00:00:00|   3|
| 51.82918| 4.911187|                   |   3|
| 51.82385| 4.901488|2019-08-01 00:00:03|   5|
+---------+---------+-------------------+----+

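Within each value of the `name` column, I want to forward fill or backward fill (whichever is necessary) only `latitude` and `longitude`; `timestamplast` should not be filled. How can I do this?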

The expected output would be:

+---------+---------+-------------------+----+
| latitude|longitude|      timestamplast|name|
+---------+---------+-------------------+----+
|51.819645| 4.905615|2019-08-01 00:00:00|   1|
|51.819645| 4.905615|2019-08-01 00:00:00|   1|
| 51.81964| 4.961713|2019-08-01 00:00:00|   2|
| 51.82918| 4.911187|2019-08-01 00:00:00|   3|
| 51.82918| 4.911187|                   |   3|
| 51.82385| 4.901488|2019-08-01 00:00:03|   5|
+---------+---------+-------------------+----+
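To pin down the intended semantics independently of Spark, here is a minimal plain-Python sketch. The helper name `fill_within_groups` is hypothetical, and the sketch assumes the blank cells in the tables above are `None` and that row order within each `name` group is the order shown.

```python
from collections import defaultdict

def fill_within_groups(rows, group_key, columns):
    """Return copies of `rows` where None values in `columns` are filled
    forward, then backward, using only rows that share the same group_key."""
    out = [dict(r) for r in rows]          # copy so the input is not mutated
    groups = defaultdict(list)
    for i, r in enumerate(out):
        groups[r[group_key]].append(i)     # row positions per group, in order

    for idxs in groups.values():
        for col in columns:
            last_seen = None
            for i in idxs:                 # forward pass
                if out[i][col] is None:
                    out[i][col] = last_seen
                else:
                    last_seen = out[i][col]
            next_seen = None
            for i in reversed(idxs):       # backward pass for leading gaps
                if out[i][col] is None:
                    out[i][col] = next_seen
                else:
                    next_seen = out[i][col]
    return out

# Same data as the input table above (blank cells assumed to be None).
rows = [
    {"latitude": None, "longitude": 4.905615, "timestamplast": "2019-08-01 00:00:00", "name": 1},
    {"latitude": 51.819645, "longitude": None, "timestamplast": "2019-08-01 00:00:00", "name": 1},
    {"latitude": 51.81964, "longitude": 4.961713, "timestamplast": "2019-08-01 00:00:00", "name": 2},
    {"latitude": None, "longitude": None, "timestamplast": "2019-08-01 00:00:00", "name": 3},
    {"latitude": 51.82918, "longitude": 4.911187, "timestamplast": None, "name": 3},
    {"latitude": 51.82385, "longitude": 4.901488, "timestamplast": "2019-08-01 00:00:03", "name": 5},
]
filled = fill_within_groups(rows, "name", ["latitude", "longitude"])
```

Only the listed columns are touched, so the missing `timestamplast` in the second `name` 3 row stays missing.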

In Pandas this would be done like this:

df = …

pyspark imputation

4
votes
1
answer
7860
views
