Replace dots with commas in a PySpark DataFrame

Mar*_*ias 2 python replace dataframe pyspark

I am using the code below to collect some information:

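from pyspark.sql.functions import col, date_format, date_trunc

df = (
    df
    .select(
        # Truncate reference_date to the first day of its month, formatted as yyyy-MM-dd
        date_format(date_trunc('month', col("reference_date")), 'yyyy-MM-dd').alias("month"),
        col("id"),
        col("name"),
        col("item_type"),
        col("sub_group"),
        col("latitude"),
        col("longitude")
    )
)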

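My latitude and longitude values contain dots, like this: -30.130307 -51.2060018, but I need to replace the dots with commas. I have already tried .replace() and .regexp_replace(), but neither worked. Can you help me?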

小智 6

Take the following DataFrame as an example.

df.show()
+-------------------+-------------------+
|           latitude|          longitude|
+-------------------+-------------------+
|  85.70708380916193| -68.05674981929877|
| 57.074495803252404|-42.648691976080215|
|  2.944303748172473| -62.66186439333423|
| 119.76923402031701|-114.41179457810185|
|-138.52573939229234|  54.38429596238362|
+-------------------+-------------------+

You should be able to use the Spark SQL functions, as follows:

from pyspark.sql import functions

df = df.withColumn("longitude", functions.regexp_replace('longitude',r'[.]',","))
df = df.withColumn("latitude", functions.regexp_replace('latitude',r'[.]',","))
df.show()
+-------------------+-------------------+
|           latitude|          longitude|
+-------------------+-------------------+
|  85,70708380916193| -68,05674981929877|
| 57,074495803252404|-42,648691976080215|
|  2,944303748172473| -62,66186439333423|
| 119,76923402031701|-114,41179457810185|
|-138,52573939229234|  54,38429596238362|
+-------------------+-------------------+
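A note on why the pattern is written as [.] rather than a bare dot: regexp_replace treats its second argument as a regular expression, and in a regex an unescaped . matches any character. The question does not show the pattern that was tried, so this is only a guess at what went wrong. The sketch below uses Python's re module rather than Spark (Spark uses Java regular expressions, which treat the dot the same way), and the variable names are just for illustration.

```python
import re

value = "-30.130307"

# An unescaped "." matches every character, so the whole string turns into commas.
replaced_any = re.sub(r".", ",", value)

# "[.]" (or the equivalent r"\.") matches only a literal dot.
replaced_dot = re.sub(r"[.]", ",", value)

print(replaced_any)
print(replaced_dot)
```

Also keep in mind that the result of regexp_replace is a string column, so latitude and longitude will no longer be numeric after the replacement.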