Is there a better way to convert Array<int> to Array<String> in PySpark?

Zha*_*ong 4 apache-spark apache-spark-sql pyspark spark-dataframe

I have a very large DataFrame with this schema:

root
 |-- id: string (nullable = true)
 |-- ext: array (nullable = true)
 |    |-- element: integer (containsNull = true)

So far I have tried to explode the data and then collect_list:

select
  id,
  collect_list(cast(item as string))
from default.dual
lateral view explode(ext) t as item
group by
  id

But this approach is too heavyweight.

Sil*_*vio 10

You can simply cast the ext column to an array of strings:

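# `source` is the DataFrame from the question (columns: id, ext)
df = source.withColumn("ext", source.ext.cast("array<string>"))
df.printSchema()  # ext should now be reported as array<string>
df.show()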