Utk*_*raf 0 apache-spark pyspark
I have created the following DataFrame:
+---+------+----+
|age|number|name|
+---+------+----+
| 16|    12|   A|
| 16|    13|   B|
| 17|    16|   E|
| 17|    17|   F|
+---+------+----+
How can I convert it to the following JSON?
{
    'age' : 16,
    'values' : [{'number': '12', 'name': 'A'}, {'number': '13', 'name': 'B'}]
},{
    'age' : 17,
    'values' : [{'number': '16', 'name': 'E'}, {'number': '17', 'name': 'F'}]
}
Assuming df is your DataFrame:
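from pyspark.sql import functions as F

# Pack number and name into one struct per row,
# then collect those structs into a list for each age.
new_df = df.select(
    "age",
    F.struct(
        F.col("number"),
        F.col("name"),
    ).alias("values")
).groupBy(
    "age"
).agg(
    F.collect_list("values").alias("values")
)

# toJSON() returns an RDD of JSON strings, one per age group
new_df.toJSON()
# or write the result out as JSON files
new_df.write.json(...)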
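To make the result of the struct and collect_list steps concrete, here is a minimal plain-Python sketch of the same grouping. It does not use Spark. The rows list is copied by hand from the table in the question, and the names rows, grouped and json_lines are only illustrative.

```python
import json
from collections import defaultdict

# Sample rows copied from the question's DataFrame: (age, number, name).
rows = [(16, 12, "A"), (16, 13, "B"), (17, 16, "E"), (17, 17, "F")]

# Group rows by age and collect one {"number", "name"} dict per row.
# This mirrors F.struct(...) followed by F.collect_list(...).
grouped = defaultdict(list)
for age, number, name in rows:
    grouped[age].append({"number": number, "name": name})

# Build one JSON string per age group, the same shape as one record per group.
json_lines = [
    json.dumps({"age": age, "values": values})
    for age, values in sorted(grouped.items())
]

for line in json_lines:
    print(line)
```

Two differences from the JSON in the question are worth keeping in mind. If number is an integer column, it is serialized as a number rather than a quoted string, so cast it to string first if you need '12'. Also, in Spark the order of elements produced by collect_list is not guaranteed after a shuffle, whereas this sketch keeps the input order.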