Posts by wwn*_*nde

Flatten JSON in PySpark

my_data=[
    {'stationCode': 'NB001',
       'summaries': [{'period': {'year': 2017}, 'rainfall': 449},
        {'period': {'year': 2018}, 'rainfall': 352.4},
        {'period': {'year': 2019}, 'rainfall': 253.2},
        {'period': {'year': 2020}, 'rainfall': 283},
        {'period': {'year': 2021}, 'rainfall': 104.2}]},
    {'stationCode': 'NA003',
       'summaries': [{'period': {'year': 2019}, 'rainfall': 58.2},
        {'period': {'year': 2020}, 'rainfall': 628.2},
        {'period': {'year': 2021}, 'rainfall': 120}]}]

In pandas I can do:

import pandas as pd
from pandas import json_normalize

# Expand each station's 'summaries' list into rows, carrying 'stationCode' along as metadata
pd.concat([json_normalize(entry, 'summaries', 'stationCode')
           for entry in my_data])

This gives me the following table:

    rainfall  period.year stationCode
0     449.0         2017       NB001
1     352.4         2018       NB001
2     253.2         2019 …

json python-3.x pyspark databricks
