wwn*_*nde 5 json python-3.x pyspark databricks
my_data=[
{'stationCode': 'NB001',
'summaries': [{'period': {'year': 2017}, 'rainfall': 449},
{'period': {'year': 2018}, 'rainfall': 352.4},
{'period': {'year': 2019}, 'rainfall': 253.2},
{'period': {'year': 2020}, 'rainfall': 283},
{'period': {'year': 2021}, 'rainfall': 104.2}]},
{'stationCode': 'NA003',
'summaries': [{'period': {'year': 2019}, 'rainfall': 58.2},
{'period': {'year': 2020}, 'rainfall': 628.2},
{'period': {'year': 2021}, 'rainfall': 120}]}]
Run Code Online (Sandbox Code Playgroud)
在 Pandas 中我可以:
import pandas as pd
from pandas import json_normalize
pd.concat([json_normalize(entry, 'summaries', 'stationCode')
for entry in my_data])
Run Code Online (Sandbox Code Playgroud)
这会给我下表:
rainfall period.year stationCode
0 449.0 2017 NB001
1 352.4 2018 NB001
2 253.2 2019 NB001
3 283.0 2020 NB001
4 104.2 2021 NB001
0 58.2 2019 NA003
1 628.2 2020 NA003
2 120.0 2021 NA003
Run Code Online (Sandbox Code Playgroud)
pyspark中一行代码可以实现这一点吗?
我已经尝试过下面的代码,它给了我相同的结果。但是,它太长了,有办法缩短吗?
df=sc.parallelize(my_data)
df1=spark.read.json(df)
df1.select("stationCode","summaries.period.year","summaries.rainfall").display()
df1 = df1.withColumn("year_rainfall", F.arrays_zip("year", "rainfall"))
.withColumn("year_rainfall", F.explode("year_rainfall"))
.select("stationCode",
F.col("year_rainfall.rainfall").alias("Rainfall"),
F.col("year_rainfall.year").alias("Year"))
df1.display(20, False)
Run Code Online (Sandbox Code Playgroud)
向 pyspark 自我介绍,因此我们将非常感谢一些解释或良好的信息来源
你所拥有的对我来说看起来不错并且可读。不过你也可以直接压缩并爆炸:
out = (df1.select("stationCode",
F.explode(F.arrays_zip(*["summaries.period.year","summaries.rainfall"])))
.select("stationCode",F.col("col")['0'].alias("year"),F.col("col")['1'].alias("rainfall")))
Run Code Online (Sandbox Code Playgroud)
out.show()
+-----------+----+--------+
|stationCode|year|rainfall|
+-----------+----+--------+
| NB001|2017| 449.0|
| NB001|2018| 352.4|
| NB001|2019| 253.2|
| NB001|2020| 283.0|
| NB001|2021| 104.2|
| NA003|2019| 58.2|
| NA003|2020| 628.2|
| NA003|2021| 120.0|
+-----------+----+--------+
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
4294 次 |
| 最近记录: |