pd.json_normalize() gives "'str' object has no attribute 'values'"

The*_*o75 5 python json dataframe pandas

I create a DataFrame manually:

import pandas as pd
df_articles1 = pd.DataFrame({'Id'   : [4,5,8,9],
                            'Class':[
                                        {'encourage': 1, 'contacting': 1},
                                        {'cardinality': 16, 'subClassOf': 3},
                                        {'get-13.5.1': 1},
                                        {'cardinality': 12, 'encourage': 1}
                                    ]
                            }) 

I export it to a CSV file so that I can import it later and split it:

df_articles1.to_csv(f"""{path}articles_split.csv""", index = False, sep=";")

I can split it with pd.json_normalize():

df_articles1 = pd.json_normalize(df_articles1['Class'])

I import the CSV file into a DataFrame:

df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";") 

But this fails:

pd.json_normalize(df_articles2['Class'])

AttributeError: 'str' object has no attribute 'values'

Anu*_*bas 9

This happens because when you save with to_csv(), the data in the 'Class' column is stored as a string, not as a dictionary/JSON. So after loading the saved data:

df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";") 
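
To make the cause visible, here is a minimal sketch of the same round trip. It assumes an in-memory io.StringIO buffer in place of the file on disk, and the names df, buf and reloaded are made up for illustration:

```python
import io
import pandas as pd

# Small stand-in for the question's DataFrame (hypothetical names).
df = pd.DataFrame({'Id': [4, 5],
                   'Class': [{'encourage': 1}, {'cardinality': 16}]})

# Write to an in-memory buffer instead of a file, then read it back.
buf = io.StringIO()
df.to_csv(buf, index=False, sep=';')
buf.seek(0)
reloaded = pd.read_csv(buf, sep=';')

before = type(df['Class'].iloc[0])        # dict: the original Python object
after = type(reloaded['Class'].iloc[0])   # str: the text that was written to the CSV
print(before, after)
print(repr(reloaded['Class'].iloc[0]))
```

CSV has no notion of nested objects, so each dict is written as its text representation and comes back as plain text.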

Then, to restore the original form, apply eval() to each value with map():

df_articles2['Class']=df_articles2['Class'].map(eval)

Finally:

resultdf=pd.json_normalize(df_articles2['Class'])

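Now if you print resultdf, you will get the desired output.

Note: Yes, according to this thread using eval is bad practice, but in some cases, when your data is messy, you are left with only one option: using eval.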


tdy*_*tdy 8

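While the accepted answer works, using eval is bad practice.

To parse a column of JSON/dict-like strings, use one of the following options (the last one is best, if possible). In the snippets below, df1 and df2 stand for the question's df_articles1 and df_articles2.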


ast.literal_eval (better)

import ast

objects = df2['Class'].apply(ast.literal_eval)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)

#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN
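
As a rough illustration of why ast.literal_eval is safer than eval, the sketch below feeds it one ordinary dict string and one string containing a function call. Both strings and all variable names are invented for the example:

```python
import ast

safe = "{'encourage': 1, 'contacting': 1}"
# Hypothetical hostile cell: eval() would execute this call.
unsafe = "__import__('os').getcwd()"

# Literals (dicts, lists, numbers, strings, ...) are parsed normally.
parsed = ast.literal_eval(safe)

# Anything that is not a plain literal is refused rather than executed.
try:
    ast.literal_eval(unsafe)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(parsed, rejected)
```

With eval, the second string would have been run as code, which is the main risk when the CSV comes from an untrusted source.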

json.loads (even better)

import json

objects = df2['Class'].apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)

#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN
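
json.loads only works when the column holds valid JSON (double-quoted keys and strings). If you control the export step, one possible approach is to serialize the column with json.dumps before calling to_csv. This is a sketch under that assumption, with an in-memory buffer standing in for the file and hypothetical names throughout:

```python
import io
import json
import pandas as pd

df = pd.DataFrame({'Id': [4, 5],
                   'Class': [{'encourage': 1, 'contacting': 1},
                             {'cardinality': 16, 'subClassOf': 3}]})

# Serialize each dict to a JSON string before writing the CSV.
out = df.assign(Class=df['Class'].apply(json.dumps))
buf = io.StringIO()  # stands in for a file on disk
out.to_csv(buf, index=False, sep=';')
buf.seek(0)

# On reload, every cell is valid JSON, so json.loads parses it directly.
reloaded = pd.read_csv(buf, sep=';')
objects = reloaded['Class'].apply(json.loads)
# Passing a plain list of dicts keeps the default 0..n-1 index for the join.
result = reloaded[['Id']].join(pd.json_normalize(objects.tolist()))
print(result)
```

The CSV writer quotes the JSON cells and read_csv unquotes them again, so no manual quote replacement is needed afterwards.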

If the strings use single quotes, use str.replace to convert them to double quotes (and thus valid JSON) before applying json.loads:

objects = df2['Class'].str.replace("'", '"').apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
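
One caveat with the quote replacement: it is a blind text substitution, so it can break when a value itself contains an apostrophe. The sketch below uses an invented cell value to show the failure, with ast.literal_eval as a fallback:

```python
import ast
import json

# Hypothetical cell as Python would write it: the value contains an apostrophe.
raw = str({'title': "it's", 'n': 1})

# Replacing every single quote also rewrites the apostrophe inside the value,
# which produces invalid JSON.
try:
    json.loads(raw.replace("'", '"'))
    failed = False
except json.JSONDecodeError:
    failed = True

# ast.literal_eval understands Python quoting and parses the same text.
parsed = ast.literal_eval(raw)
print(failed, parsed)
```

For data like the question's, where keys and values contain no quote characters, the str.replace approach works as shown above.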

pd.json_normalize before pd.to_csv (recommended)

If possible, when you originally save to CSV, just save the normalized JSON (rather than the raw JSON objects):

df1 = df1[['Id']].join(pd.json_normalize(df1['Class']))
df1.to_csv('df1_normalized.csv', index=False, sep=';')

# Id;encourage;contacting;cardinality;subClassOf;get-13.5.1
# 4;1.0;1.0;;;
# 5;;;16.0;3.0;
# 8;;;;;1.0
# 9;1.0;;12.0;;

This is a more natural CSV workflow (rather than storing/loading object blobs):

df2 = pd.read_csv('df1_normalized.csv', sep=';')

#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN