How to update a Pandas DataFrame from multiple API calls

Question

Mal*_*ath 6 python python-3.x pandas json-normalize

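I need to write a Python script that does the following:

1. Read a csv file with the columns (person_id, name, flag). The file has 3000 rows.
2. For each person_id in the csv file, call a URL, passing the person_id, e.g. GET http://api.myendpoint.intranet/get-data/1234. The URL returns some information about that person_id, as in the response example below. I need to take all of the rents objects and save them in my csv. My attempt so far is below, followed by the output I need.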

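import pandas as pd
import requests

path = '.'  # assumption: directory that contains data.csv
ids = pd.read_csv(f"{path}/data.csv", delimiter=';')

rows = []
# iterate over the rows of the frame (iterating the frame itself yields column names)
for person in ids.itertuples(index=False):
    response = requests.get(f'http://api.myendpoint.intranet/get-data/{person.person_id}')
    data = response.json()
    # one output row per rent object in the response
    for rent in data['rents']:
        rows.append([person.person_id, rent['carId'], rent['price'], rent['rentStatus']])

person_rents = pd.DataFrame(rows, columns=['person_id', 'carId', 'price', 'rentStatus'])
# attach name and flag from the input file, then write the combined result
# (output.csv is an assumed file name)
ids.merge(person_rents, on='person_id').to_csv(f"{path}/output.csv", sep=';', index=False)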


person_id;name;flag;carId;price;rentStatus
1000;Joseph;1;6638;1000;active
1000;Joseph;1;5566;2000;active


Response example

{
    "active": false,
    "ctodx": false,
    "rents": [{
            "carId": 6638,
            "price": 1000,
            "rentStatus": "active"
        }, {
            "carId": 5566,
            "price": 2000,
            "rentStatus": "active"
        }
    ],
    "responseCode": "OK",
    "status": [{
            "request": 345,
            "requestStatus": "F"
        }, {
            "requestId": 678,
            "requestStatus": "P"
        }
    ],
    "transaction": false
}


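After saving the additional data from the response to the csv, I need to fetch data from another endpoint, using the carId in the URL. The mileage result must be saved in the same csv. http://api.myendpoint.intranet/get-mileage/6638 http://api.myendpoint.intranet/get-mileage/5566

The response from each call will look like this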

{"mileage":1000.0000}
{"mileage":550.0000}


The final output must be

person_id;name;flag;carId;price;rentStatus;mileage
1000;Joseph;1;6638;1000;active;1000.0000
1000;Joseph;1;5566;2000;active;550.0000


Can someone help me write this script? It can use pandas or any Python 3 lib.

Answer 1

Tre*_*ney 4

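Code explanation

Create a dataframe, df, with pd.read_csv.
- All of the values in 'person_id' are expected to be unique.
Use .apply on 'person_id' to call prepare_data.
- prepare_data expects 'person_id' to be a str or an int, as indicated by the type annotation, Union[int, str].
call_api calls the API and returns a dict to the prepare_data function.
Convert the 'rents' key of that dict into a dataframe, data, with pd.json_normalize.
Use .apply on 'carId' to call the API again and extract 'mileage', which is added to the dataframe data as a column.
Add 'person_id' to data, so that it can be used to merge df with s.
Convert the pd.Series, s, into a dataframe with pd.concat, then merge df and s on person_id.
Save to a csv, in the desired form, with df.to_csv.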

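Potential issues

If there is a problem, it is most likely to occur in the call_api function.
As long as call_api returns a dict, like the response shown in the question, the rest of the code will work correctly and produce the desired output.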
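
If call_api does turn out to be the weak point, one option is to make it fail loudly. The following is only a sketch of a stricter variant, not part of the original answer: the timeout default of 10 seconds, the optional session parameter, and the dict check are all assumptions added for illustration.

```python
import requests


def call_api(url: str, timeout: float = 10.0, session=None) -> dict:
    """Hypothetical stricter call_api: raise instead of passing bad data along."""
    # `session` is an assumed extra parameter; it lets a caller reuse a requests.Session
    http = session if session is not None else requests
    r = http.get(url, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    payload = r.json()    # raises a ValueError subclass if the body is not valid JSON
    if not isinstance(payload, dict):
        raise TypeError(f'expected a JSON object from {url}, got {type(payload).__name__}')
    return payload
```

With this variant, a failing person_id stops the run with an exception that names the URL or the HTTP status, rather than surfacing later as a KeyError on 'rents'.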

import pandas as pd
import requests
from typing import Union

def call_api(url: str) -> dict:
    r = requests.get(url)
    return r.json()

def prepare_data(uid: Union[int, str]) -> pd.DataFrame:
    
    d_url = f'http://api.myendpoint.intranet/get-data/{uid}'
    m_url = 'http://api.myendpoint.intranet/get-mileage/'
    
    # get the rent data from the api call
    rents = call_api(d_url)['rents']
    # normalize rents into a dataframe
    data = pd.json_normalize(rents)
    
    # get the mileage data from the api call and add it to data as a column
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{m_url}{cid}')['mileage'])
    # add person_id as a column to data, which will be used to merge data to df
    data['person_id'] = uid
    
    return data
    

# read data from file
df = pd.read_csv('file.csv', sep=';')

# call prepare_data
s = df.person_id.apply(prepare_data)

# s is a Series of DataFrames, which can be combined with pd.concat
s = pd.concat([v for v in s])

# join df with s, on person_id
df = df.merge(s, on='person_id')

# save to csv
df.to_csv('output.csv', sep=';', index=False)
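
The code above requests the mileage once per rent row. If the same carId can show up under several people, the lookups could be done once per distinct carId instead. The following is a sketch under that assumption; add_mileage and fetch_mileage are hypothetical names, and the helper would be applied to the concatenated rents rather than inside prepare_data.

```python
import pandas as pd


def add_mileage(data: pd.DataFrame, fetch_mileage) -> pd.DataFrame:
    """Return a copy of data with a 'mileage' column, calling
    fetch_mileage (a hypothetical callable) once per distinct carId."""
    lookup = {cid: fetch_mileage(cid) for cid in data['carId'].unique()}
    out = data.copy()
    out['mileage'] = out['carId'].map(lookup)
    return out


# Stand-in for the real mileage call; it records how often it is used
calls = []

def fake_fetch(cid):
    calls.append(cid)
    return {6638: 1000.0, 5566: 550.0}[cid]


rents = pd.DataFrame({'person_id': [1000, 1000, 400, 400],
                      'carId': [6638, 5566, 6638, 5566]})
with_mileage = add_mileage(rents, fake_fetch)
print(with_mileage)
print(len(calls))
```

In real use, fetch_mileage could wrap the answer's call_api, for example lambda cid: call_api(f'{m_url}{cid}')['mileage'].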


If you get any errors when running this code:
1. Leave a comment to let me know.
2. Edit your question and paste the entire TraceBack, as text, into a code block.

Example

# given the following start dataframe
   person_id    name  flag
0       1000  Joseph     1
1        400     Sam     1

# resulting dataframe using the same data for both id 1000 and 400
   person_id    name  flag  carId  price rentStatus  mileage
0       1000  Joseph     1   6638   1000     active   1000.0
1       1000  Joseph     1   5566   2000     active   1000.0
2        400     Sam     1   6638   1000     active   1000.0
3        400     Sam     1   5566   2000     active   1000.0
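
The example can be reproduced offline by swapping call_api for a stub. The following is a sketch, not the answer's code verbatim: FAKE_RESPONSES and the stub are assumptions built from the sample values in the question, and a list comprehension stands in for .apply. Because the stub returns the question's two distinct mileage values, the mileage column will differ from the table above, which used the same mileage for every row.

```python
import pandas as pd

# Canned responses standing in for the intranet API (values from the question)
FAKE_RESPONSES = {
    'get-data': {'rents': [
        {'carId': 6638, 'price': 1000, 'rentStatus': 'active'},
        {'carId': 5566, 'price': 2000, 'rentStatus': 'active'},
    ]},
    'get-mileage/6638': {'mileage': 1000.0},
    'get-mileage/5566': {'mileage': 550.0},
}


def call_api(url: str) -> dict:
    """Stub: return a canned dict instead of making a network request."""
    for fragment, payload in FAKE_RESPONSES.items():
        if fragment in url:
            return payload
    raise KeyError(url)


def prepare_data(uid) -> pd.DataFrame:
    # Same steps as prepare_data in the answer
    d_url = f'http://api.myendpoint.intranet/get-data/{uid}'
    m_url = 'http://api.myendpoint.intranet/get-mileage/'
    data = pd.json_normalize(call_api(d_url)['rents'])
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{m_url}{cid}')['mileage'])
    data['person_id'] = uid
    return data


# Start dataframe from the example
df = pd.DataFrame({'person_id': [1000, 400], 'name': ['Joseph', 'Sam'], 'flag': [1, 1]})

# One dataframe per person, combined and merged back onto df
frames = [prepare_data(uid) for uid in df.person_id]
s = pd.concat(frames, ignore_index=True)
result = df.merge(s, on='person_id')

print(result.to_csv(sep=';', index=False))
```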

