Suppose I have a PySpark dataframe like this:
KEY VALUE
--- -----
623 "cat"
245 "dog"
null "horse"
null "pig"
331 "narwhal"
null "snake"
How can I transform this dataframe so that any null values in the KEY column are replaced by an integer sequence starting from 1? The desired result looks like this:
KEY VALUE
--- -----
623 "cat"
245 "dog"
1 "horse"
2 "pig"
331 "narwhal"
3 "snake"
I know you asked for Python, but maybe the Scala equivalent will help. Basically, you want to use the Window function rank together with coalesce. First we define some test data:
val df = Seq(
  (Option(623), "cat"),
  (Option(245), "dog"),
  (None, "horse"),
  (None, "pig"),
  (Option(331), "narwhal"),
  (None, "snake")
).toDF("key", "value")
Then we rank the rows within each key, use coalesce to pick either the original key or the new rank, and finally drop the rank column we created, just to clean things up:
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._

// All null keys fall into the same partition, so rank numbers them 1, 2, 3, ...
val window = Window.partitionBy(col("key")).orderBy(col("value"))

df.withColumn("rank", rank.over(window))
  // Keep the original key when present, otherwise use the generated rank
  .withColumn("key", coalesce(col("key"), col("rank")))
  .drop("rank")
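Since the question asked for Python, here is a rough PySpark sketch of the same idea (rank over a window partitioned by key, then coalesce). It assumes the column names "key" and "value" from the test data above and is not a verbatim translation of the original answer:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(623, "cat"), (245, "dog"), (None, "horse"),
     (None, "pig"), (331, "narwhal"), (None, "snake")],
    "key int, value string",
)

# All null keys land in the same partition, so rank numbers them 1, 2, 3, ...
window = Window.partitionBy("key").orderBy("value")

result = (df
          .withColumn("rank", F.rank().over(window))
          # Keep the original key when present, otherwise use the generated rank
          .withColumn("key", F.coalesce(F.col("key"), F.col("rank")))
          .drop("rank"))

result.show()

Note that this fills the null keys with 1, 2, 3, ... in the order of the VALUE column, which matches the desired output above but does not guarantee uniqueness against the existing non-null keys.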