Wil*_*iam 5 json normalize dataframe pandas
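Background: I am trying to normalize a JSON file and save it to a pandas dataframe, but I am having trouble navigating the JSON structure, and my code is not working as expected.
Expected dataframe output: Given the following sample JSON file (it uses random data, but the format is exactly the same as the real one), this is the output I am trying to produce: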
| New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Inception Date | Risk Target |
|---|---|---|---|---|---|---|---|
| **Portfolio_1** | | $260,786 | (44.55%) | (44.55%) | (44.55%) * | Apr 7, 2021 | N/A |
| The FW Irrev Family Tr | 9552252 | $260,786 | 0.00% | 0.00% | 0.00% * | Jan 11, 2022 | N/A |
| **Portfolio_2** | | $18,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
| FW DAF | 10946585 | $18,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
| **Portfolio_3** | | $60,143,818 | (4.42%) | (4.42%) | 7.75% * | Dec 17, 2020 | - |
| The FW Family Trust | 13014080 | $475,356 | (6.10%) | (6.10%) | (3.97%) * | Apr 9, 2021 | Aggressive |
| FW Liquid Fund LP | 13396796 | $52,899,527 | (4.15%) | (4.15%) | (4.15%) * | Dec 30, 2021 | Aggressive |
| FW Holdings No. 2 LLC | 8413655 | $6,768,937 | (0.77%) | (0.77%) | 11.84% * | Mar 5, 2021 | N/A |
| FW and FR Joint | 9957007 | ($1) | - | - | - * | Dec 21, 2021 | N/A |
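Actual dataframe output: Despite my best efforts, I have only been able to map the portfolio-level rows (the bold rows in the expected output) into the dataframe: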
| New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Inception Date | Risk Target |
|---|---|---|---|---|---|---|---|
| Portfolio_1 | | $260,786 | (44.55%) | (44.55%) | (44.55%) * | Apr 7, 2021 | N/A |
| Portfolio_2 | | $18,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
| Portfolio_3 | | $60,143,818 | (4.42%) | (4.42%) | 7.75% * | Dec 17, 2020 | - |
JSON file: This is the file I am trying to normalize and map into the dataframe:
{
"meta": {
"columns": [
{
"key": "node_id",
"display_name": "Entity ID",
"output_type": "Word"
},
{
"key": "value",
"display_name": "Adjusted Value (1/31/2022, No Div, USD)",
"output_type": "Number",
"currency": "USD"
},
{
"key": "time_weighted_return",
"display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_2",
"display_name": "Adjusted TWR (YTD, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_3",
"display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "inception_event_date",
"display_name": "Inception Date",
"output_type": "Date"
},
{
"key": "_custom_portfolio_target_347209",
"display_name": "Risk Target",
"output_type": "Word"
}
],
"groupings": [
{
"key": "_custom_new_entity_group_453577",
"display_name": "NEW Entity Group"
},
{
"key": "top_level_legal_entity",
"display_name": "Top Level Legal Entity"
}
]
},
"data": {
"type": "portfolio_views",
"attributes": {
"total": {
"name": "Total",
"columns": {
"time_weighted_return": -0.05001974888806926,
"inception_event_date": "2020-12-17",
"_custom_portfolio_target_347209": null,
"time_weighted_return_3": 0.0678647066340392,
"time_weighted_return_2": -0.05001974888806926,
"value": 7.880126780581851E7,
"node_id": null
},
"children": [
{
"name": "Portfolio_3",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.04420061615233983,
"inception_event_date": "2020-12-17",
"_custom_portfolio_target_347209": null,
"time_weighted_return_3": 0.07748325432684622,
"time_weighted_return_2": -0.04420061615233983,
"value": 6.014381761929752E7,
"node_id": null
},
"children": [
{
"entity_id": 9957007,
"name": "FW and FR Joint",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": null,
"inception_event_date": "2021-12-21",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": null,
"time_weighted_return_2": null,
"value": -1.44,
"node_id": "9957007"
},
"children": []
},
{
"entity_id": 8413655,
"name": "FW Holdings No. 2 LLC",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.0077309266066708515,
"inception_event_date": "2021-03-05",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": 0.11844843557716445,
"time_weighted_return_2": -0.0077309266066708515,
"value": 6768936.74,
"node_id": "8413655"
},
"children": []
},
{
"entity_id": 13396796,
"name": "FW Liquid Fund LP",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.04149769229150746,
"inception_event_date": "2021-12-30",
"_custom_portfolio_target_347209": "Aggressive",
"time_weighted_return_3": -0.041497430478377395,
"time_weighted_return_2": -0.04149769229150746,
"value": 5.289952672686747E7,
"node_id": "13396796"
},
"children": []
},
{
"entity_id": 13014080,
"name": "The FW Family Trust",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.06102013456998856,
"inception_event_date": "2021-04-09",
"_custom_portfolio_target_347209": "Aggressive",
"time_weighted_return_3": -0.039685671858585514,
"time_weighted_return_2": -0.06102013456998856,
"value": 475355.59242999996,
"node_id": "13014080"
},
"children": []
}
]
},
{
"name": "Portfolio_1",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.44554958179309,
"inception_event_date": "2021-04-07",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": -0.44554958179309,
"time_weighted_return_2": -0.44554958179309,
"value": 260786.03,
"node_id": null
},
"children": [
{
"entity_id": 9552252,
"name": "The FW Irrev Family Tr",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": 0.0,
"inception_event_date": "2022-01-11",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": 0.0,
"time_weighted_return_2": 0.0,
"value": 260786.03,
"node_id": "9552252"
},
"children": []
}
]
},
{
"name": "Portfolio_2",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.05780354507057972,
"inception_event_date": "2021-09-03",
"_custom_portfolio_target_347209": "Growth",
"time_weighted_return_3": -0.05470214863844658,
"time_weighted_return_2": -0.05780354507057972,
"value": 1.8396664156520825E7,
"node_id": null
},
"children": [
{
"entity_id": 10946585,
"name": "FW DAF",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.05780354507057972,
"inception_event_date": "2021-09-03",
"_custom_portfolio_target_347209": "Growth",
"time_weighted_return_3": -0.05470214863844658,
"time_weighted_return_2": -0.05780354507057972,
"value": 1.8396664156520832E7,
"node_id": "10946585"
},
"children": []
}
]
}
]
}
}
},
"included": []
}
My code: This is the function I built to try to normalize the JSON response and save it in a pandas dataframe:
import pandas as pd

def unpack_response():
    while True:
        try:
            # response_writer() is my own helper (not shown) that returns the parsed JSON
            api_response = response_writer()
            # Only the first level of children is normalized here
            df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
            df.columns = df.columns.str.replace(r'columns.', '', regex=False)
            column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
            df.rename(columns=column_name_mapper, inplace=True)
            break
        except KeyError:
            print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
    df.rename(columns={'name': 'New Entity Group'}, inplace=True)
    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
    df = df.reindex(columns=column_names)
    return df

unpack_response()
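Comments on my code:
It seems that only the first level of `children` data is saved to `df`. I believe this is because my code references `df = pd.json_normalize(api_response['data']['attributes']['total']['children'])`, so it only looks at those lists. I tried simply appending `['children']['children']` to the end of that snippet (given that there are three levels of `children`), but received `TypeError: list indices must be integers or slices, not str`. I would appreciate any advice on how to improve or extend my function so that I can reach the key/value pairs that sit two levels of `children` down.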
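To illustrate the `TypeError` above: each `children` value is a list of dicts, so it has to be iterated (or indexed by position) before the next `['children']` lookup can work. The sketch below shows one possible way to reach the second level with `pd.json_normalize`, using its `record_path`, `meta`, and `meta_prefix` parameters. The trimmed `children` sample and the `group_` prefix are assumptions made for illustration only; they are not part of the original code.

```python
import pandas as pd

# Trimmed sample mirroring api_response['data']['attributes']['total']['children']
children = [
    {
        "name": "Portfolio_1",
        "columns": {"value": 260786.03, "node_id": None},
        "children": [
            {
                "entity_id": 9552252,
                "name": "The FW Irrev Family Tr",
                "columns": {"value": 260786.03, "node_id": "9552252"},
                "children": [],
            }
        ],
    },
    {
        "name": "Portfolio_2",
        "columns": {"value": 18396664.16, "node_id": None},
        "children": [
            {
                "entity_id": 10946585,
                "name": "FW DAF",
                "columns": {"value": 18396664.16, "node_id": "10946585"},
                "children": [],
            }
        ],
    },
]

# `children` is a list, so indexing it with a string key fails
try:
    children["children"]
except TypeError as exc:
    error_message = str(exc)

# record_path walks into each portfolio's own "children" list;
# meta carries the parent's name along, and meta_prefix (an assumed
# prefix) avoids a clash with the child's own "name" field.
entities = pd.json_normalize(
    children,
    record_path="children",
    meta=["name"],
    meta_prefix="group_",
)
print(entities[["group_name", "name", "columns.value"]])
```

This gives one row per legal entity with the parent portfolio in `group_name`. The portfolio-level rows would still have to come from normalizing the top-level list and be combined with these rows separately.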
小智 1
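Personally, I wouldn't use `pd.json_normalize` for this case. Your JSON is quite complex, and unless you are really experienced with `json_normalize`, the following code will probably take the average developer less time to understand. In fact, you don't even need to look at the JSON to understand exactly what this code does (although it would certainly help ;).
First, we can extract the objects in the JSON (the portfolios and their children) into a list, using a series of steps to get them into the right shape and order: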
# `api_response` is the parsed JSON from the question (a dict).
def prep_obj(o):
    """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
    # The dict union operator `|` requires Python 3.9+
    return {
        'New Entity Group': o['name'],
    } | o['columns']


# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]

# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))

# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]

# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
# objects = []
# for group in groups:
#     for obj in group:
#         objects.append(obj)

Next, create the dataframe:
import pandas as pd

# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}

# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)

# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]

And format it:
def format_twr_col(col):
    # Note: .abs() discards the sign, so every non-zero return,
    # positive or negative, ends up wrapped in parentheses.
    return (
        col
        .abs()
        .mul(100)
        .round(2)
        # Non-zero, non-missing values become '(x%)':
        .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
        # Zero values become 'x%':
        .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
        # Missing values become '-':
        .fillna('-')
    )

def format_value_col(col):
    positive_mask = col.ge(0)

    # Non-negative values: '$1,234'
    col[positive_mask] = (
        col[positive_mask]
        .round()
        .astype(int)
        .map('${:,}'.format)
    )

    # Negative values: '($1,234)'
    col[~positive_mask] = (
        col[~positive_mask]
        .astype(float)
        .round()
        .astype(int)
        .abs()
        .map('(${:,})'.format)
    )

    return col

df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])

df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())

df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')

df['Entity ID'] = df['Entity ID'].fillna('')

And... voilà:
>>> pd.options.display.max_columns = None
>>> df
         New Entity Group  Entity ID  Adjusted Value (1/31/2022, No Div, USD)  Adjusted TWR (Current Quarter, No Div, USD)  Adjusted TWR (YTD, No Div, USD)  Annualized Adjusted TWR (Since Inception, No Div, USD)  Inception Date  Risk Target
0             Portfolio_1                                            $260,786                                     (44.55%)                         (44.55%)                                                (44.55%)    Apr 07, 2021          N/A
1  The FW Irrev Family Tr    9552252                                 $260,786                                         0.0%                             0.0%                                                    0.0%    Jan 11, 2022          N/A
2             Portfolio_2                                         $18,396,664                                      (5.78%)                          (5.78%)                                                 (5.47%)    Sep 03, 2021       Growth
3                  FW DAF   10946585                              $18,396,664                                      (5.78%)                          (5.78%)                                                 (5.47%)    Sep 03, 2021       Growth
4             Portfolio_3                                         $60,143,818                                      (4.42%)                          (4.42%)                                                 (7.75%)    Dec 17, 2020          NaN
5     The FW Family Trust   13014080                                 $475,356                                       (6.1%)                           (6.1%)                                                 (3.97%)    Apr 09, 2021   Aggressive
6       FW Liquid Fund LP   13396796                              $52,899,527                                      (4.15%)                          (4.15%)                                                 (4.15%)    Dec 30, 2021   Aggressive
7   FW Holdings No. 2 LLC    8413655                               $6,768,937                                      (0.77%)                          (0.77%)                                                (11.84%)    Mar 05, 2021          N/A
8         FW and FR Joint    9957007                                     ($1)                                            -                                -                                                       -    Dec 21, 2021          N/A
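Note that the expected output in the question shows positive returns without parentheses (for example 7.75%) and with two decimal places, whereas the output above wraps every non-zero return in parentheses. Below is a minimal sketch of a sign-aware alternative, assuming parentheses are meant for negative returns only. `format_twr` is a hypothetical helper name, not part of the code above.

```python
import pandas as pd

def format_twr(value):
    """Format a fractional return as a percentage string.

    Assumed convention: negatives in parentheses, missing values as '-'.
    """
    if pd.isna(value):
        return "-"
    pct = f"{abs(value) * 100:.2f}%"
    return f"({pct})" if value < 0 else pct

# Sample returns taken from the question's JSON, plus one missing value
returns = pd.Series([-0.44554958179309, 0.0, 0.07748325432684622, None])
formatted = returns.map(format_twr)
print(formatted.tolist())
```

Applied per column, for example `df[col].map(format_twr)` on the unformatted numeric column, this keeps the sign information that `.abs()` would otherwise drop.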