Pandas - Split nested JSON into multiple rows

Question

My dataframe has the structure below. I want to split each row into several rows based on the nested values in the details column.

cust_id, name, details
101, Kevin, [{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]
102, Scott, [{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]

Expected output

cust_id, name, id, country, state
101, Kevin, 1001, US, OH
101, Kevin, 1002, US, GA
102, Scott, 2001, US, OH
102, Scott, 2002, US, GA
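If `details` actually holds JSON text rather than Python lists (an assumption; this is typically what you get after reading data like the above from a CSV file), `explode()` treats each string as a single scalar and leaves the row unchanged. A minimal sketch of parsing the strings first, using a hypothetical frame built from the question's sample:

```python
import json

import pandas as pd

# Hypothetical input: the question's data with `details` stored as JSON text.
df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        '[{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]',
        '[{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]',
    ],
})

# explode() only splits real list-likes, so turn each JSON string into a list of dicts.
df["details"] = df["details"].apply(json.loads)

print(type(df.loc[0, "details"]))     # <class 'list'>
print(df.loc[0, "details"][0]["id"])  # 1001
```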

Answer 1

小智 8

import pandas as pd

df = df.explode('details').reset_index(drop=True)  # one row per list element, fresh 0..n-1 index
df = df.merge(pd.json_normalize(df['details']), left_index=True, right_index=True).drop('details', axis=1)  # dict keys -> columns
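A self-contained run of the answer's two lines on the question's sample data, assuming `details` already holds Python lists of dicts (not JSON strings):

```python
import pandas as pd

# The question's sample data, with `details` as Python lists of dicts.
df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        [{"id": 1001, "country": "US", "state": "OH"}, {"id": 1002, "country": "US", "state": "GA"}],
        [{"id": 2001, "country": "US", "state": "OH"}, {"id": 2002, "country": "US", "state": "GA"}],
    ],
})

# One row per dict, then expand each dict's keys into columns.
df = df.explode("details").reset_index(drop=True)
df = df.merge(pd.json_normalize(df["details"]), left_index=True, right_index=True).drop("details", axis=1)

print(df)
```

The printed frame should match the expected output in the question: columns `cust_id, name, id, country, state` with four rows.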

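df.explode("details") repeats each row N times, where N is the number of items in that row's details list (a row whose list is empty yields a single row with NaN in details).
Because explode repeats rows, each new row keeps the index label of the row it came from, so the original indexes 0 and 1 become 0, 0, 1, 1, which breaks the index-based merge that follows. reset_index() replaces them with a fresh index starting from 0. drop=True is needed because by default pandas keeps the old index as a new column; drop=True discards it instead.
pd.json_normalize(df['details']) converts the column (each row of which holds one JSON object) into a new dataframe in which every unique key across all the objects becomes a column.
df.merge() joins that new dataframe onto the exploded one.
left_index=True and right_index=True tell pandas to join on the indexes, so row 0 of the new dataframe is matched with row 0 of the exploded one, row 1 with row 1, and so on.
.drop('details', axis=1) removes the old details column that still holds the original objects.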
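The steps described above can be traced on the sample data by checking the index after each one. This sketch passes `.tolist()` to `json_normalize` so the expanded frame always gets a fresh 0..n-1 index, regardless of how the installed pandas version treats a Series argument; that is a small deviation from the answer's code, not part of it.

```python
import pandas as pd

# The question's sample data (details already parsed into Python lists of dicts).
df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        [{"id": 1001, "country": "US", "state": "OH"}, {"id": 1002, "country": "US", "state": "GA"}],
        [{"id": 2001, "country": "US", "state": "OH"}, {"id": 2002, "country": "US", "state": "GA"}],
    ],
})

# Step 1: explode repeats each row once per list element, copying its index label.
exploded = df.explode("details")
print(exploded.index.tolist())  # [0, 0, 1, 1]

# Step 2: replace the duplicated labels with 0..n-1 and discard the old ones
# (without drop=True they would be inserted as an "index" column).
exploded = exploded.reset_index(drop=True)
print(exploded.index.tolist())  # [0, 1, 2, 3]

# Step 3: one row per dict, one column per key, in the same row order.
expanded = pd.json_normalize(exploded["details"].tolist())
print(expanded.columns.tolist())  # ['id', 'country', 'state']

# Step 4: index-on-index merge pairs row i of `exploded` with row i of `expanded`.
result = exploded.merge(expanded, left_index=True, right_index=True).drop("details", axis=1)
print(result.columns.tolist())  # ['cust_id', 'name', 'id', 'country', 'state']
```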

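Another route, not from the original answer: `pd.json_normalize` can do the whole split by itself when given the rows as a list of records. `record_path` names the nested list to expand and `meta` lists the parent fields to repeat on each resulting row. A sketch on the same sample data, again assuming `details` holds Python lists of dicts:

```python
import pandas as pd

df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        [{"id": 1001, "country": "US", "state": "OH"}, {"id": 1002, "country": "US", "state": "GA"}],
        [{"id": 2001, "country": "US", "state": "OH"}, {"id": 2002, "country": "US", "state": "GA"}],
    ],
})

# Expand each record's "details" list; copy cust_id and name onto every row.
result = pd.json_normalize(
    df.to_dict("records"),
    record_path="details",
    meta=["cust_id", "name"],
)

# json_normalize places the record columns before the meta columns, so reorder.
result = result[["cust_id", "name", "id", "country", "state"]]
print(result)
```

This avoids the index bookkeeping entirely, at the cost of round-tripping the frame through Python dicts.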