The*_*o75 5 python json dataframe pandas
I create a DataFrame manually:
import pandas as pd
df_articles1 = pd.DataFrame({'Id' : [4,5,8,9],
'Class':[
{'encourage': 1, 'contacting': 1},
{'cardinality': 16, 'subClassOf': 3},
{'get-13.5.1': 1},
{'cardinality': 12, 'encourage': 1}
]
})
I export it to a CSV file so that I can import it later and split it:
df_articles1.to_csv(f"""{path}articles_split.csv""", index = False, sep=";")
I can split the original DataFrame with pd.json_normalize():
df_articles1 = pd.json_normalize(df_articles1['Class'])
I import the CSV file into a DataFrame:
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
But normalizing the imported DataFrame fails:
pd.json_normalize(df_articles2['Class'])
AttributeError: 'str' object has no attribute 'values'
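This happens because to_csv() writes each dictionary in the 'Class' column as its string representation, so after loading the saved data the column holds strings, not dictionaries/JSON: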
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
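To see the type change for yourself, here is a minimal sketch that round-trips a small frame through an in-memory buffer. The io.StringIO buffer stands in for the CSV file and is an assumption made only so the snippet runs without a path:

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Id': [4, 5],
    'Class': [{'encourage': 1, 'contacting': 1},
              {'cardinality': 16, 'subClassOf': 3}],
})

# Write to an in-memory buffer instead of a file on disk (assumption for the demo).
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=';')
buffer.seek(0)

reloaded = pd.read_csv(buffer, sep=';')

original_type = type(df['Class'].iloc[0])        # element type before saving
reloaded_type = type(reloaded['Class'].iloc[0])  # element type after loading
first_cell = reloaded['Class'].iloc[0]

print(original_type, reloaded_type, repr(first_cell))
```

The reloaded cell is the text that str() produced for the dictionary, which is why pd.json_normalize() cannot call .values() on it.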
Then, to restore the column to its original form, apply eval() to each value with map():
df_articles2['Class']=df_articles2['Class'].map(eval)
Finally:
resultdf=pd.json_normalize(df_articles2['Class'])
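Now if you print resultdf you will get your desired output.
Note: yes, according to this thread, using eval is bad practice, but in some cases, when your data is messy, you are left with only one option: eval.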
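While the accepted answer works, using eval is bad practice.
To parse a string column that looks like JSON/dict, use one of the following options (the last one is best, if possible).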
ast.literal_eval (better)
import ast
objects = df2['Class'].apply(ast.literal_eval)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN
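The reason ast.literal_eval is safer than eval is that it only accepts Python literals (dicts, lists, strings, numbers and so on) and refuses anything that would execute code. A small sketch, where unsafe_cell is a made-up example of a cell you would not want evaluated:

```python
import ast

safe_cell = "{'cardinality': 16, 'subClassOf': 3}"
unsafe_cell = "__import__('os').getcwd()"  # hypothetical dirty cell; eval() would run this

# A plain dict literal parses without executing anything.
parsed = ast.literal_eval(safe_cell)

# A function call is not a literal, so literal_eval raises instead of running it.
try:
    ast.literal_eval(unsafe_cell)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(parsed, rejected)
```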
json.loads (even better)
import json
objects = df2['Class'].apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN
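json.loads only accepts valid JSON, so if you control the export step, one way to make it work directly is to serialize the column with json.dumps before writing the CSV. This is a sketch under that assumption, not part of the original workflow; the in-memory buffer again stands in for the file:

```python
import io
import json
import pandas as pd

df1 = pd.DataFrame({
    'Id': [4, 5],
    'Class': [{'encourage': 1, 'contacting': 1}, {'get-13.5.1': 1}],
})

# Store each dict as a JSON string (double-quoted keys) rather than its Python repr.
to_save = df1.assign(Class=df1['Class'].apply(json.dumps))

buffer = io.StringIO()  # stand-in for the CSV file (assumption for the demo)
to_save.to_csv(buffer, index=False, sep=';')
buffer.seek(0)

df2 = pd.read_csv(buffer, sep=';')
objects = df2['Class'].apply(json.loads)
normed = pd.json_normalize(objects.tolist())  # a plain list of dicts
result = df2[['Id']].join(normed)
print(result)
```

to_csv quotes the field and doubles the inner double quotes, and read_csv reverses that, so the JSON text survives the round trip unchanged.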
If the strings are single-quoted, use str.replace to convert them to double quotes (and thus valid JSON) before applying json.loads:
objects = df2['Class'].str.replace("'", '"').apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
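One caveat with the quote replacement: it breaks as soon as a value itself contains an apostrophe. A hedged sketch of a fallback, where parse_cell is a hypothetical helper (not a pandas or standard-library function) that tries strict JSON first and falls back to ast.literal_eval:

```python
import ast
import json

def parse_cell(text):
    """Hypothetical helper: parse strict JSON first, else a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return ast.literal_eval(text)

# The repr of a dict whose value contains an apostrophe mixes both quote styles.
tricky = str({'note': "it's fine", 'count': 2})

# The naive replacement turns the apostrophe into a stray double quote.
try:
    json.loads(tricky.replace("'", '"'))
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

parsed = parse_cell(tricky)
also_json = parse_cell('{"cardinality": 12}')
print(naive_ok, parsed, also_json)
```

It can be used the same way as the parsers above, for example df2['Class'].apply(parse_cell).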
pd.json_normalize before pd.to_csv (recommended)
If possible, when you originally save to CSV, just save the normalized JSON (not the raw JSON objects):
df1 = df1[['Id']].join(pd.json_normalize(df1['Class']))
df1.to_csv('df1_normalized.csv', index=False, sep=';')
# Id;encourage;contacting;cardinality;subClassOf;get-13.5.1
# 4;1.0;1.0;;;
# 5;;;16.0;3.0;
# 8;;;;;1.0
# 9;1.0;;12.0;;
This is a more natural CSV workflow (rather than storing/loading object blobs):
df2 = pd.read_csv('df1_normalized.csv', sep=';')
#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN
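As a quick check of this workflow, the sketch below normalizes first, round-trips through an in-memory buffer (an assumption, used in place of df1_normalized.csv), and confirms that the reloaded columns are already numeric with no string parsing step:

```python
import io
import pandas as pd

df1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1},
    ],
})

# Flatten the dict column before saving, so the CSV holds only scalar cells.
flat = df1[['Id']].join(pd.json_normalize(df1['Class'].tolist()))

buffer = io.StringIO()  # stand-in for the CSV file
flat.to_csv(buffer, index=False, sep=';')
buffer.seek(0)
df2 = pd.read_csv(buffer, sep=';')

columns_match = list(df2.columns) == list(flat.columns)
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df2.dtypes)
cardinality_sum = df2['cardinality'].sum()  # NaN cells are skipped by default
print(columns_match, all_numeric, cardinality_sum)
```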