Python pandas: a fast way to flatten JSON into rows by a surrogate key

Fli*_*rPA 8 python flatten dataframe pandas

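My knowledge of packages such as pandas is fairly shallow, and I have been looking for a way to flatten data into rows. I have a list of dicts like the one below, with a surrogate key named entry_id: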

data = [
    {
        "id": 1,
        "entry_id": 123,
        "type": "ticker",
        "value": "IBM"
    },
    {
        "id": 2,
        "entry_id": 123,
        "type": "company_name",
        "value": "International Business Machines"
    },
    {
        "id": 3,
        "entry_id": 123,
        "type": "cusip",
        "value": "01234567"
    },
    {
        "id": 4,
        "entry_id": 321,
        "type": "ticker",
        "value": "AAPL"
    },
    {
        "id": 5,
        "entry_id": 321,
        "type": "permno",
        "value": "123456"
    },
    {
        "id": 6,
        "entry_id": 321,
        "type": "company_name",
        "value": "Apple, Inc."
    },
    {
        "id": 7,
        "entry_id": 321,
        "type": "formation_date",
        "value": "1976-04-01"
    }
]

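I want to flatten the data into rows grouped by the surrogate key entry_id, as shown below (whether missing values are empty strings or None does not matter):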

[
    {"entry_id": 123, "ticker": "IBM", "permno": "", "company_name": "International Business Machines", "cusip": "01234567", "formation_date": ""},
    {"entry_id": 321, "ticker": "AAPL", "permno": "123456", "company_name": "Apple, Inc.", "cusip": "", "formation_date": "1976-04-01"}
]

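I have tried DataFrame's groupby and json_normalize, but could not find the right incantation to get the desired result. I can process the data in pure Python, but I am sure that would not be a fast solution. I am not sure how to specify that type supplies the columns, value supplies the values, and entry_id is the aggregation key. I am also open to packages other than pandas.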

Shu*_*rma 11

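We can create a DataFrame from the given list of records, pivot it to reshape it, fillna the NaN values with empty strings, and then convert the pivoted frame to a list of dictionaries: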

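import pandas as pd
df = pd.DataFrame(data)
# Keyword arguments and 'records' are required in pandas 2.0+ (positional args and the 'r' alias were removed)
df.pivot(index='entry_id', columns='type', values='value').fillna('').reset_index().to_dict('records')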
[{'entry_id': 123,
  'company_name': 'International Business Machines',
  'cusip': '01234567',
  'formation_date': '',
  'permno': '',
  'ticker': 'IBM'},
 {'entry_id': 321,
  'company_name': 'Apple, Inc.',
  'cusip': '',
  'formation_date': '1976-04-01',
  'permno': '123456',
  'ticker': 'AAPL'}]
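
One caveat: pivot raises a ValueError if the same (entry_id, type) pair appears more than once. The sample data in the question has no such duplicates, so the following is only a sketch for that case, using pivot_table with aggfunc="first" to keep the first value seen. The duplicate record and the value "IBM.N" are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical data: entry_id 123 has two "ticker" records (not in the question's data).
data = [
    {"id": 1, "entry_id": 123, "type": "ticker", "value": "IBM"},
    {"id": 2, "entry_id": 123, "type": "ticker", "value": "IBM.N"},
    {"id": 3, "entry_id": 321, "type": "ticker", "value": "AAPL"},
    {"id": 4, "entry_id": 321, "type": "permno", "value": "123456"},
]

df = pd.DataFrame(data)

# pivot_table aggregates duplicates instead of raising; "first" keeps the
# first non-null value per (entry_id, type) pair in the original row order.
rows = (
    df.pivot_table(index="entry_id", columns="type", values="value", aggfunc="first")
      .fillna("")          # missing types become empty strings
      .reset_index()       # turn entry_id back into a regular column
      .to_dict("records")  # one dict per entry_id
)

print(rows)
```

If duplicates should be treated as an error rather than silently collapsed, plain pivot is the safer choice, since it fails loudly.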