Create/insert JSON in Postgres with requests and psycopg2

Mic*_*son 1 python postgresql json

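I have just started a project with PostgreSQL. I want to move from Excel to a database, but I am stuck on the create and insert steps. Once I get this running, I believe I will have to switch it to an update so that I don't keep writing over the current data. I know my connection works, but I get the following error.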

My error is: TypeError: not all arguments converted during string formatting

#!/usr/bin/env python
import requests
import psycopg2

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')

req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018') 
data = req.json()['data']

my_data = []
for item in data:
    season = item['seasonId']
    player = item['playerName']
    first_name = item['playerFirstName']
    last_Name = item['playerLastName']
    playerId = item['playerId']
    height = item['playerHeight']
    pos = item['playerPositionCode']
    handed = item['playerShootsCatches']
    city = item['playerBirthCity']
    country = item['playerBirthCountry']   
    state = item['playerBirthStateProvince']
    dob = item['playerBirthDate']
    draft_year = item['playerDraftYear']
    draft_round = item['playerDraftRoundNo']
    draft_overall = item['playerDraftOverallPickNo']
    my_data.append([playerId, player, first_name, last_Name, height, pos, handed, city, country, state, dob, draft_year, draft_round, draft_overall, season])

cur = conn.cursor()
cur.execute("CREATE TABLE t_skaters (data json);")
cur.executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,))

Sample data:

[[8468493, 'Ron Hainsey', 'Ron', 'Hainsey', 75, 'D', 'L', 'Bolton', 'USA', 'CT', '1981-03-24', 2000, 1, 13, 20172018], [8471339, 'Ryan Callahan', 'Ryan', 'Callahan', 70, 'R', 'R', 'Rochester', 'USA', 'NY', '1985-03-21', 2004, 4, 127, 20172018]]

pau*_*ult 5

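It looks like you want to create a table with a single column named "data". The type of this column is JSON. (I would recommend creating one column per field, but that's up to you.)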

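In this case, the variable data (read from the request) is a list of dicts. As I mentioned in my comment, you can loop over data and insert one item at a time, since executemany() is no faster than multiple calls to execute().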
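For context on the original error: the call executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,)) passes a sequence that contains a single parameter set, and that one set is the whole list of rows, so many values compete for one %s. The sketch below reproduces the same message with plain Python % formatting as an analogy. psycopg2 does its own parameter binding, so treat this as an illustration of the mismatch rather than of the driver's internals. The helper try_format is hypothetical, and rows is a shortened version of the sample data from the question.

```python
query = "INSERT INTO t_skaters VALUES (%s)"

# Shortened sample rows from the question (first four values of each row)
rows = [
    [8468493, 'Ron Hainsey', 'Ron', 'Hainsey'],
    [8471339, 'Ryan Callahan', 'Ryan', 'Callahan'],
]


def try_format(template, params):
    """Hypothetical helper: format the template, or return the TypeError message."""
    try:
        return template % tuple(params)
    except TypeError as exc:
        return str(exc)


# Two values (two whole rows) for a single placeholder: this is the mismatch
mismatch = try_format(query, rows)

# Exactly one value for the single placeholder: this formats cleanly
ok = try_format(query, ['{"playerId": 8468493}'])

print(mismatch)
print(ok)
```

With one placeholder, each parameter set has to carry exactly one value, which is why the code further down passes a single JSON string per execute() call.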

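What I did is the following:

  1. Create a list of the fields that you care about.
  2. Loop over the elements of data.
  3. For each item in data, extract the fields into my_data.
  4. Call execute() and pass in json.dumps(my_data) (this converts my_data from a dict into a JSON string).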

Try this:

#!/usr/bin/env python
import requests
import psycopg2
import json

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')

req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018') 

# data here is a list of dicts
data = req.json()['data']

cur = conn.cursor()
# create a table with one column of type JSON
cur.execute("CREATE TABLE t_skaters (data json);")

fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]

for item in data:
    my_data = {field: item[field] for field in fields}
    cur.execute("INSERT INTO t_skaters VALUES (%s)", (json.dumps(my_data),))


# commit changes
conn.commit()
# Close the connection
conn.close()

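I'm not 100% sure that all of the postgres syntax is correct here (I don't have access to a PG database to test), but I believe this logic should work for what you're trying to do.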
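Since the code above was not run against a live database, the data-preparation part can at least be checked on its own. The following is a minimal sketch under stated assumptions: build_params is a hypothetical helper (not part of psycopg2), FIELDS is a shortened field list, and the 'extra' key in the sample item is made up to show that unlisted fields are dropped.

```python
import json

# Shortened field list for illustration; the real list is shown above
FIELDS = ['playerId', 'playerName', 'seasonId']


def build_params(data, fields):
    """Hypothetical helper: one single-element tuple per item, holding a JSON string."""
    return [(json.dumps({f: item[f] for f in fields}),) for item in data]


# Values taken from the sample data in the question; 'extra' is a made-up key
sample = [
    {'playerId': 8468493, 'playerName': 'Ron Hainsey',
     'seasonId': 20172018, 'extra': 'ignored'},
]

params = build_params(sample, FIELDS)
decoded = json.loads(params[0][0])
print(params[0])
print(decoded)
```

Each tuple in params has the shape that execute() expects for a query with a single %s placeholder.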

Update for separate columns

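You can modify your create statement to handle multiple columns, but that requires knowing the data type of each column. Here is some pseudocode you can follow: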

# same boilerplate code from above
cur = conn.cursor()
# create a table with one column per field
cur.execute(
"""CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
)

fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]

for item in data:
    my_data = [item[field] for field in fields]
    # need a placeholder (%s) for each variable 
    # refer to postgres docs on INSERT statement on how to specify order
    cur.execute("INSERT INTO t_skaters VALUES (%s, %s, ...)", tuple(my_data))


# commit changes
conn.commit()
# Close the connection
conn.close()

Replace the ... with the appropriate values for your data.
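One way to avoid typing the placeholders by hand is to generate them from the field list, so the number of %s markers always matches the number of values. This is only a sketch: build_insert is a hypothetical helper, the field list is shortened to three entries, and it assumes the table has columns named after those fields. The table and column names are interpolated directly into the string, which is only reasonable for hard-coded, trusted names; psycopg2 also ships an sql module intended for composing identifiers more safely.

```python
# Shortened field list for illustration
fields = ['seasonId', 'playerName', 'playerId']


def build_insert(table, columns):
    """Hypothetical helper: INSERT statement with one %s per column."""
    placeholders = ", ".join(["%s"] * len(columns))
    column_list = ", ".join(columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


query = build_insert("t_skaters", fields)

# One item shaped like the API response, using values from the question's sample
item = {'seasonId': 20172018, 'playerName': 'Ron Hainsey', 'playerId': 8468493}
params = tuple(item[field] for field in fields)

print(query)
print(params)
# With a live connection this pair would be passed as cur.execute(query, params)
```

Naming the columns explicitly in the INSERT also means the order of values no longer depends on the order in which the columns were declared in CREATE TABLE.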