Posts by Wan*_*nda

PySpark: sorting an array of structs

Here is a dummy sample of my DataFrame:

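from pyspark.sql import SparkSession

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

data = [
    [3273, "city y", [["ids", 27], ["smf", 13], ["tlk", 35], ["thr", 24]]],
    [3213, "city x", [["smf", 23], ["tlk", 15], ["ids", 17], ["thr", 34]]],
]
# cel is an array of (carr, subs) structs
df = spark.createDataFrame(
    data, "city_id:long, city_name:string, cel:array<struct<carr:string, subs:int>>"
)
df.show(2, truncate=False)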

+-------+---------+--------------------------------------------+
|city_id|city_name|cel                                         |
+-------+---------+--------------------------------------------+
|3273   |city y   |[[ids, 27], [smf, 13], [tlk, 35], [thr, 24]]|
|3213   |city x   |[[smf, 23], [tlk, 15], [ids, 17], [thr, 34]]|
+-------+---------+--------------------------------------------+

I need to sort the array in the cel column in descending order by the subs value of each struct. The result should look like this:

+-------+---------+--------------------------------------------+
|city_id|city_name|cel                                         |
+-------+---------+--------------------------------------------+
|3273   |city y   |[[tlk, 35], [ids, 27], …
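One possible way to get this ordering is sketched below. It is not taken from the original thread. It assumes Spark 3.0 or later, where the SQL function array_sort accepts a comparator lambda, and it assumes the column and field names are plain identifiers that need no quoting. The helper names desc_sort_expr and sort_struct_array_desc are hypothetical, chosen only for this illustration.

```python
def desc_sort_expr(array_col: str, key: str) -> str:
    """Build a Spark SQL expression sorting an array of structs by `key`, descending.

    Assumption: Spark 3.0+ (array_sort with a comparator lambda) and plain
    identifiers for the column and field names; no quoting is applied.
    """
    # The comparator returns -1 when the left element should come first,
    # so a larger `key` on the left sorts earlier (descending order).
    return (
        f"array_sort({array_col}, (l, r) -> "
        f"CASE WHEN l.{key} > r.{key} THEN -1 "
        f"WHEN l.{key} < r.{key} THEN 1 ELSE 0 END)"
    )


def sort_struct_array_desc(df, array_col: str, key: str):
    """Return `df` with `array_col` sorted by `key` descending.

    All other columns are passed through unchanged and keep their position.
    """
    exprs = [
        f"{desc_sort_expr(c, key)} AS {c}" if c == array_col else c
        for c in df.columns
    ]
    return df.selectExpr(*exprs)


# Usage with the DataFrame from the question:
#   sort_struct_array_desc(df, "cel", "subs").show(2, False)
```

With the sample data, the row for city 3273 should then list tlk (35), ids (27), thr (24), smf (13) in that order. The comparator keeps the struct fields in their original order. On Spark 2.4, where array_sort takes no comparator, an alternative that may work is sort_array(transform(cel, x -> struct(x.subs AS subs, x.carr AS carr)), false), at the cost of moving subs to the first field of each struct.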

python apache-spark pyspark

5 votes, 1 answer, 7474 views
