I create a DataFrame by hand:
import pandas as pd
df_articles1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1}
    ]
})
I export it to a CSV file so that I can import it again later and split it:
df_articles1.to_csv(f"""{path}articles_split.csv""", index = False, sep=";")
I can split the original DataFrame with pd.json_normalize():
df_articles1 = pd.json_normalize(df_articles1['Class'])
I then import the CSV file into a new DataFrame:
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
But splitting the re-imported DataFrame fails:
pd.json_normalize(df_articles2['Class'])
AttributeError: 'str' object has no attribute 'values'
I have a pandas DataFrame df:
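The error message suggests that, after the CSV round trip, each cell of the Class column is a string such as "{'encourage': 1, 'contacting': 1}" rather than a dict. A minimal sketch of one possible workaround follows, assuming the strings are plain Python dict literals: parse each cell with ast.literal_eval before normalizing. The in-memory buffer and the names parsed and df_split are illustrative assumptions, used here in place of the file on disk so that the example is self-contained.

```python
import ast
import io

import pandas as pd

df_articles1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1},
    ],
})

# An in-memory buffer stands in for articles_split.csv (assumption for this sketch).
buffer = io.StringIO()
df_articles1.to_csv(buffer, index=False, sep=";")
buffer.seek(0)

df_articles2 = pd.read_csv(buffer, sep=";")

# After the round trip the dicts have become their string representation.
assert isinstance(df_articles2['Class'].iloc[0], str)

# Turn each string back into a dict; literal_eval only accepts Python literals.
parsed = df_articles2['Class'].apply(ast.literal_eval)

# Normalize the list of dicts into one column per key (missing keys become NaN).
df_split = pd.json_normalize(parsed.tolist())
print(df_split)
```

ast.literal_eval is used here rather than json.loads because the exported strings use single quotes, which are not valid JSON.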
<bound method NDFrame.head of DAT_RUN DAT_FORECAST LIB_SOURCE MES_LONGITUDE MES_LATITUDE MES_TEMPERATURE MES_HUMIDITE MES_PLUIE MES_VITESSE_VENT MES_U_WIND MES_V_WIND
0 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 3.75 11.994824 72.0 0.0 2.653137 -2.402910 -1.124792
1 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 4.00 13.094824 74.3 0.0 2.976434 -2.972910 -0.144792
2 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 4.25 12.594824 75.3 0.0 3.128418 -2.702910 1.575208
3 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 4.50 12.094824 75.5 0.0 3.183418 -2.342910 2.155208
I convert the DAT_RUN and DAT_FORECAST columns to datetime format:
df["DAT_RUN"] = pd.to_datetime(df['DAT_RUN'], format="%Y-%m-%dT%H:%M:%SZ") # previously "%Y-%m-%d %H:%M:%S"
df["DAT_FORECAST"] = …
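As a sketch of an alternative, assuming the trailing Z in these timestamps denotes UTC and that timezone-aware values are acceptable, passing utc=True lets pandas parse the ISO 8601 strings without an explicit format string. The small frame below is a hypothetical stand-in for the first rows of df.

```python
import pandas as pd

# Hypothetical two-row sample with the same timestamp layout as df.
sample = pd.DataFrame({
    "DAT_RUN": ["2022-03-29T00:00:00Z", "2022-03-29T00:00:00Z"],
    "DAT_FORECAST": ["2022-03-29T01:00:00Z", "2022-03-29T01:00:00Z"],
})

# utc=True parses the strings and returns timezone-aware (UTC) datetimes.
for column in ["DAT_RUN", "DAT_FORECAST"]:
    sample[column] = pd.to_datetime(sample[column], utc=True)

print(sample.dtypes)
```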