将 JSON 文件转换为 Pandas 数据框

Question

将 JSON 文件转换为 Pandas 数据框

Ale*_*man 5 python json dataframe pandas

我想将 JSON 转换为 Pandas 数据框。

我的 JSON 看起来像：像：

{ 
   "country1":{ 
      "AdUnit1":{ 
         "floor_price1":{ 
            "feature1":1111,
            "feature2":1112
         },
         "floor_price2":{ 
            "feature1":1121
         }
      },
      "AdUnit2":{ 
         "floor_price1":{ 
            "feature1":1211
         },
         "floor_price2":{ 
            "feature1":1221
         }
      }
   },
   "country2":{ 
      "AdUnit1":{ 
         "floor_price1":{ 
            "feature1":2111,
            "feature2":2112
         }
      }
   }
}

Run Code Online (Sandbox Code Playgroud)

我使用以下代码从 GCP 读取文件：

project = Context.default().project_id
sample_bucket_name = 'my_bucket'
sample_bucket_path = 'gs://' + sample_bucket_name
print('Object: ' + sample_bucket_path + '/json_output.json')

sample_bucket = storage.Bucket(sample_bucket_name)
sample_bucket.create()
sample_bucket.exists()

sample_object = sample_bucket.object('json_output.json')
list(sample_bucket.objects())
json = sample_object.read_stream()

Run Code Online (Sandbox Code Playgroud)

我的目标是获得如下所示的 Pandas 数据框：

我尝试使用json_normalize，但没有成功。

Answer 1

Luc*_*tin 5

嵌套的 JSON 总是很难正确处理。

几个月前，我想出了一个办法提供一个“万能答案”用文字优美flatten_json_iterative_solution从这里：该迭代解压一个给定的JSON的每个级别。

然后可以简单地将它转换为Pandas.Series然后Pandas.DataFrame像这样：

df = pd.Series(flatten_json_iterative_solution(dict(json_))).to_frame().reset_index()

Run Code Online (Sandbox Code Playgroud)

中间数据帧结果

可以轻松执行一些数据转换以拆分您要求的列名称中的索引：

df[["index", "col1", "col2", "col3", "col4"]] = df['index'].apply(lambda x: pd.Series(x.split('_')))

Run Code Online (Sandbox Code Playgroud)

最后结果

Answer 2

Zio*_*sof 1

你可以使用这个：

def flatten_dict(d):
    """ Returns list of lists from given dictionary """
    l = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            flatten_v = flatten_dict(v)
            for my_l in reversed(flatten_v):
                my_l.insert(0, k)

            l.extend(flatten_v)

        elif isinstance(v, list):
            for l_val in v:
                l.append([k, l_val])

        else:
            l.append([k, v])

    return l

Run Code Online (Sandbox Code Playgroud)

该函数接收一个字典（包括嵌套，其中值也可以是列表）并将其展平为列表列表。

然后，您可以简单地：

df = pd.DataFrame(flatten_dict(my_dict))

Run Code Online (Sandbox Code Playgroud)

my_dict你的 JSON 对象在哪里。以你的例子来说，你跑步时得到的print(df)是：

          0        1             2         3     4
0  country1  AdUnit1  floor_price1  feature1  1111
1  country1  AdUnit1  floor_price1  feature2  1112
2  country1  AdUnit1  floor_price2  feature1  1121
3  country1  AdUnit2  floor_price1  feature1  1211
4  country1  AdUnit2  floor_price2  feature1  1221
5  country2  AdUnit1  floor_price1  feature1  2111
6  country2  AdUnit1  floor_price1  feature2  2112

Run Code Online (Sandbox Code Playgroud)

创建数据框时，您可以命名列和索引

归档时间：	6 年，3 月前
查看次数：	6481 次
最近记录：	6 年，3 月前