Posts by Jac*_*son

PySpark: "exploding" a dictionary stored in a column

I have a column "true_recoms" in a Spark DataFrame:

-RECORD 17----------------------------------------------------------------- 
item        | 20380109                                                                                                                                                                  
true_recoms | {"5556867":1,"5801144":5,"7397596":21}          

I need to "explode" this column so that each key/value pair becomes its own row, like this:

item        | 20380109                                                                                                                                                                  
recom_item  | 5556867
recom_cnt   | 1
..............
item        | 20380109                                                                                                                                                                  
recom_item  | 5801144
recom_cnt   | 5
..............
item        | 20380109                                                                                                                                                                  
recom_item  | 7397596
recom_cnt   | 21
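
To make the target shape concrete, here is a minimal plain-Python sketch of the transformation for a single record. It only illustrates the desired result and is not Spark code; the helper name `explode_record` is hypothetical, and treating `item` as a string is an assumption.

```python
import json

# One record, copied from the example above (item assumed to be a string).
record = {
    "item": "20380109",
    "true_recoms": '{"5556867":1,"5801144":5,"7397596":21}',
}


def explode_record(rec):
    """Turn one record into one row per key/value pair of its JSON dictionary."""
    recoms = json.loads(rec["true_recoms"])
    return [
        {"item": rec["item"], "recom_item": key, "recom_cnt": cnt}
        for key, cnt in recoms.items()
    ]


rows = explode_record(record)
for row in rows:
    print(row)
```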

I tried using from_json, but it doesn't work (the parsed column comes back empty):

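    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    # Attempted schema: a struct with two fixed string fields
    schema_json = StructType(fields=[
        StructField("item", StringType()),
        StructField("recoms", StringType())
    ])
    df.select(col("true_recoms"), from_json(col("true_recoms"), schema_json)).show(5)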

+--------+--------------------+------+
|    item|         true_recoms|true_r|
+--------+--------------------+------+
|31746548|{"32731749":3,"31...|   [,]|
|17359322|{"17359392":1,"17...|   [,]|
|31480894|{"31480598":1,"31...|   [,]|
| 7265665|{"7265891":1,"503...|   [,]|
|31350949|{"32218698":1,"31...|   [,]|
+--------+--------------------+------+
only showing top 5 rows

explode apache-spark pyspark
