Pandas: How to explode a dataframe with a JSON array column

Dan*_*Dan 3 python json dataframe pandas

How do I explode a pandas dataframe whose column holds lists of dicts?

Input df (posted as an image in the original question):

Desired output df:

+----------------+------+-----+------+
|level_2         | date | val | num  | 
+----------------+------+-----+------+
| name_1a        | 2020 |  1  | null |
| name_1b        | 2019 |  2  | null |
| name_1b        | 2020 |  3  | null |
| name_10000_xyz | 2018 |  4  | str  |
| name_10000_xyz | 2019 |  5  | null |
| name_10000_xyz | 2020 |  6  | str  |
+----------------+------+-----+------+

To reproduce the input df:

import pandas as pd
pd.set_option('display.max_colwidth', None)
data = {
    'level_2': {1: 'name_1a', 3: 'name_1b', 5: 'name_10000_xyz'},
    'value': {
        1: [{'date': '2020', 'val': 1}],
        3: [{'date': '2019', 'val': 2}, {'date': '2020', 'val': 3}],
        5: [{'date': '2018', 'val': 4, 'num': 'str'}, {'date': '2019', 'val': 5}, {'date': '2020', 'val': 6, 'num': 'str'}],
    },
}
df = pd.DataFrame(data)

Shu*_*rma 7

Explode the dataframe on the value column, then pop the value column and create a new dataframe from it, then join the new frame with the exploded frame.

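# one row per list element, with a fresh 0..n-1 index
s = df.explode('value', ignore_index=True)
# pop the dicts out of s, expand them into columns, and join back on the index
s.join(pd.DataFrame([*s.pop('value')], index=s.index))

Output: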
          level_2  date  val  num
0         name_1a  2020    1  NaN
1         name_1b  2019    2  NaN
2         name_1b  2020    3  NaN
3  name_10000_xyz  2018    4  str
4  name_10000_xyz  2019    5  NaN
5  name_10000_xyz  2020    6  str
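For readers who want to run the answer end to end, here is a minimal sketch that splits the explode / pop / join steps into named intermediate values. The helper name explode_records is hypothetical (it is not part of pandas or of the original answer), and the sketch assumes every list in the column is non-empty and contains only dicts. Note that ignore_index in DataFrame.explode requires pandas 1.1 or later.

```python
import pandas as pd


def explode_records(df, column):
    """Hypothetical helper: return one row per dict found in df[column]."""
    # Step 1: one row per list element. ignore_index=True gives a fresh
    # 0..n-1 index, so the join in step 4 matches rows one-to-one.
    exploded = df.explode(column, ignore_index=True)
    # Step 2: pop removes the column of dicts from `exploded` and returns it.
    records = exploded.pop(column)
    # Step 3: one column per dict key; keys missing from a dict become NaN.
    expanded = pd.DataFrame(records.tolist(), index=exploded.index)
    # Step 4: align the remaining columns with the expanded ones by index.
    return exploded.join(expanded)


data = {
    'level_2': {1: 'name_1a', 3: 'name_1b', 5: 'name_10000_xyz'},
    'value': {
        1: [{'date': '2020', 'val': 1}],
        3: [{'date': '2019', 'val': 2}, {'date': '2020', 'val': 3}],
        5: [{'date': '2018', 'val': 4, 'num': 'str'},
            {'date': '2019', 'val': 5},
            {'date': '2020', 'val': 6, 'num': 'str'}],
    },
}
df = pd.DataFrame(data)

result = explode_records(df, 'value')
print(result)
```

Because explode returns a new frame, the pop in step 2 does not modify the original df. Rows whose dict has no num key end up with NaN in that column, which corresponds to the null entries in the desired output.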