Dan*_*Dan 3 python json dataframe pandas
How do I explode a pandas DataFrame?
Input df:
Desired output df:
+----------------+------+-----+------+
| level_2        | date | val | num  |
+----------------+------+-----+------+
| name_1a        | 2020 | 1   | null |
| name_1b        | 2019 | 2   | null |
| name_1b        | 2020 | 3   | null |
| name_10000_xyz | 2018 | 4   | str  |
| name_10000_xyz | 2019 | 5   | null |
| name_10000_xyz | 2020 | 6   | str  |
+----------------+------+-----+------+
To reproduce the input df:
import pandas as pd
pd.set_option('display.max_colwidth', None)
data={'level_2':{1:'name_1a',3:'name_1b',5:'name_10000_xyz'},'value':{1:[{'date':'2020','val':1}],3:[{'date':'2019','val':2},{'date':'2020','val':3}],5:[{'date':'2018','val':4,'num':'str'},{'date':'2019','val':5},{'date':'2020','val':6,'num':'str'}]}}
df = pd.DataFrame(data)
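Explode the DataFrame on the value column, then pop the value column and build a new DataFrame from the dicts it holds, and finally join that new frame to the exploded frame.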
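# One row per list element; ignore_index=True resets the index to 0..n-1
s = df.explode('value', ignore_index=True)
# Pop the dicts, expand them into columns, and join back on the shared index
s.join(pd.DataFrame([*s.pop('value')], index=s.index))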
          level_2  date  val  num
0         name_1a  2020    1  NaN
1         name_1b  2019    2  NaN
2         name_1b  2020    3  NaN
3  name_10000_xyz  2018    4  str
4  name_10000_xyz  2019    5  NaN
5  name_10000_xyz  2020    6  str
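For anyone who finds the one-liner dense, here is the same explode / pop / join approach written out one step at a time. This is only a sketch: the intermediate names (exploded, dicts, expanded, result) are mine, not from the answer, and it assumes pandas 1.1.0 or newer, since that is when explode gained the ignore_index parameter.

```python
import pandas as pd

# Same input as in the question, reformatted for readability.
data = {
    'level_2': {1: 'name_1a', 3: 'name_1b', 5: 'name_10000_xyz'},
    'value': {
        1: [{'date': '2020', 'val': 1}],
        3: [{'date': '2019', 'val': 2}, {'date': '2020', 'val': 3}],
        5: [{'date': '2018', 'val': 4, 'num': 'str'},
            {'date': '2019', 'val': 5},
            {'date': '2020', 'val': 6, 'num': 'str'}],
    },
}
df = pd.DataFrame(data)

# Step 1: one row per list element; ignore_index=True renumbers rows 0..n-1.
exploded = df.explode('value', ignore_index=True)

# Step 2: remove the column of dicts from the frame and keep it as a Series.
dicts = exploded.pop('value')

# Step 3: expand each dict into columns; keys missing from a dict become NaN.
expanded = pd.DataFrame(dicts.tolist(), index=exploded.index)

# Step 4: join the two frames on their shared index.
result = exploded.join(expanded)
print(result)
```

Passing index=exploded.index in step 3 is what lets the join in step 4 line the rows up, so it matters that both frames carry the same renumbered index.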