Normalize a column containing JSON data in a Pandas DataFrame

Wil*_*mar 7 python json normalize dataframe pandas

I have a Pandas DataFrame in which one column contains JSON data (the JSON structure is simple: a single level, with no nested data):

ID,Date,attributes
9001,2020-07-01T00:00:06Z,"{"State":"FL","Source":"Android","Request":"0.001"}"
9002,2020-07-01T00:00:33Z,"{"State":"NY","Source":"Android","Request":"0.001"}"
9003,2020-07-01T00:07:19Z,"{"State":"FL","Source":"ios","Request":"0.001"}"
9004,2020-07-01T00:11:30Z,"{"State":"NY","Source":"windows","Request":"0.001"}"
9005,2020-07-01T00:15:23Z,"{"State":"FL","Source":"ios","Request":"0.001"}"


I want to normalize the JSON content in the attributes column so that each JSON attribute becomes its own column in the DataFrame:

ID,Date,attributes.State, attributes.Source, attributes.Request
9001,2020-07-01T00:00:06Z,FL,Android,0.001
9002,2020-07-01T00:00:33Z,NY,Android,0.001
9003,2020-07-01T00:07:19Z,FL,ios,0.001
9004,2020-07-01T00:11:30Z,NY,windows,0.001
9005,2020-07-01T00:15:23Z,FL,ios,0.001 


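I have been trying to use Pandas json_normalize, which expects a dictionary. So I figured I would convert the attributes column to a dictionary, but that does not quite work as expected, because the resulting dictionary maps each row index to a raw JSON string: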

df.attributes.to_dict()

{0: '{"State":"FL","Source":"Android","Request":"0.001"}',
 1: '{"State":"NY","Source":"Android","Request":"0.001"}',
 2: '{"State":"FL","Source":"ios","Request":"0.001"}',
 3: '{"State":"NY","Source":"windows","Request":"0.001"}',
 4: '{"State":"FL","Source":"ios","Request":"0.001"}'}


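The normalization then uses the keys (0, 1, 2, ...) as column names instead of the JSON keys.

I feel like I am very close, but I can't quite figure out how to do this correctly. Any ideas are welcome.

Thank you!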

小智 1

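You shouldn't need to convert the column to a dictionary first. If the column already holds parsed dict objects, you can pass it to json_normalize directly. If it holds JSON strings, as the to_dict() output in the question suggests, parse each value with json.loads first.

Try:

import json
import pandas as pd

# Parse each JSON string into a dict, then flatten the list of dicts
pd.json_normalize(df['attributes'].apply(json.loads).tolist())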
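To get all the way to the layout asked for in the question (ID and Date kept, JSON keys prefixed with attributes.), something like the following should work. This is only a minimal sketch: it rebuilds the first three sample rows by hand and assumes the attributes column holds valid JSON strings. The names parsed, flat and result are just illustrative.

```python
import json

import pandas as pd

# First three sample rows from the question; attributes holds JSON *strings*.
df = pd.DataFrame({
    "ID": [9001, 9002, 9003],
    "Date": [
        "2020-07-01T00:00:06Z",
        "2020-07-01T00:00:33Z",
        "2020-07-01T00:07:19Z",
    ],
    "attributes": [
        '{"State":"FL","Source":"Android","Request":"0.001"}',
        '{"State":"NY","Source":"Android","Request":"0.001"}',
        '{"State":"FL","Source":"ios","Request":"0.001"}',
    ],
})

# Parse each JSON string into a dict, then flatten the list of dicts.
parsed = df["attributes"].apply(json.loads).tolist()
flat = pd.json_normalize(parsed).add_prefix("attributes.")

# Reuse the original index so the join lines up row by row,
# even if df does not have a default 0..n-1 index.
flat.index = df.index

# Replace the raw JSON column with the flattened columns.
result = df.drop(columns="attributes").join(flat)
print(result)
```

Note that Request stays a string ("0.001") because it is quoted in the JSON; if numbers are needed, pd.to_numeric can be applied to that column afterwards.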