Am I using aiohttp together with psycopg2 correctly?

Question

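I'm new to asyncio/aiohttp, but I have a Python script that reads a batch of URLs from a Postgres table, downloads them, runs a processing function on each download (not relevant to the question), and saves the processing results back to the table.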

In simplified form it looks like this:

import asyncio
import psycopg2
from aiohttp import ClientSession, TCPConnector

BATCH_SIZE = 100

def _get_pgconn():
    return psycopg2.connect()

def db_conn(func):
    def _db_conn(*args, **kwargs):
        with _get_pgconn() as conn:
            with conn.cursor() as cur:
                return func(cur, *args, **kwargs)
            conn.commit()
    return _db_conn

async def run():
    async with ClientSession(connector=TCPConnector(ssl=False, limit=100)) as session:
        while True:
            count = await run_batch(session)
            if count == 0:
                break

async def run_batch(session):
    tasks = []
    for url in get_batch():
        task = asyncio.ensure_future(process_url(url, session))
        tasks.append(task)

    await asyncio.gather(*tasks)
    results = [task.result() for task in tasks]
    save_batch_result(results)
    return len(results)

async def process_url(url, session):
    try:
        async with session.get(url, timeout=15) as response:
            body = await response.read()
            return process_body(body)
    except:
        return {...}

@db_conn
def get_batch(cur):
    sql = "SELECT id, url FROM db.urls WHERE processed IS NULL LIMIT %s"
    cur.execute(sql, (BATCH_SIZE,))
    return cur.fetchall()


@db_conn
def save_batch_result(cur, results):
    sql = "UPDATE db.urls SET a = %(a)s, processed = true WHERE id = %(id)s"
    cur.executemany(sql, tuple(results))


loop = asyncio.get_event_loop()
loop.run_until_complete(run())

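But I have a feeling I must be missing something here. The script runs, but it seems to get slower with every batch. In particular, calls to the process_url function seem to get slower over time. Memory usage also keeps growing, so I'm guessing there is something I'm not cleaning up properly between runs?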

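I also have trouble increasing the batch size much: if I go above 200, I seem to run into problems with the calls to session.get. I have tried the limit parameter of TCPConnector, setting it both higher and lower, but I can't see that it helps much. I have also tried running the script on a few different servers, with the same result. Is there a way to reason about how to set these values more effectively?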

I'd be grateful if someone could point out what I might be doing wrong here!

Answer 1

And*_*lov 5

The problem with your code is that it mixes the asynchronous aiohttp library with the synchronous psycopg2 client.

As a result, every call to the DB blocks the event loop, which stalls all other tasks that are running concurrently.
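
The effect of a blocking call on the event loop can be illustrated with a minimal sketch that uses only the standard library. Everything here is an assumption made for illustration: `blocking_query` is a hypothetical stand-in for a synchronous psycopg2 call (it only sleeps, no database is involved), and `heartbeat` and `measure` are made-up helper names. The sketch measures the longest pause seen by a concurrently running task, first when the blocking function is called directly, then when it is handed to a worker thread with `loop.run_in_executor`. The executor variant is a general workaround for blocking code, not the fix recommended in this answer.

```python
import asyncio
import time


def blocking_query(seconds):
    # Hypothetical stand-in for a synchronous psycopg2 call: it only sleeps.
    time.sleep(seconds)
    return "rows"


async def heartbeat(ticks, interval=0.01):
    # Wakes up every `interval` seconds and records the real gap between wake-ups.
    gaps = []
    last = time.perf_counter()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now
    return gaps


async def measure(offload):
    # Run the heartbeat concurrently with one "database call".
    hb = asyncio.create_task(heartbeat(20))
    await asyncio.sleep(0.05)  # let the heartbeat tick a few times first
    if offload:
        # Worker thread: the event loop stays free to run other tasks.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_query, 0.3)
    else:
        # Direct call: nothing else on the loop can run until it returns.
        blocking_query(0.3)
    gaps = await hb
    return max(gaps)


if __name__ == "__main__":
    print("longest stall, direct call:   %.3f s" % asyncio.run(measure(offload=False)))
    print("longest stall, worker thread: %.3f s" % asyncio.run(measure(offload=True)))
```

With the direct call, the longest stall should be at least as long as the simulated query, because the heartbeat cannot be scheduled while the loop is blocked. In the script from the question, the in-flight downloads are in the heartbeat's position every time a psycopg2 call runs.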

To fix this, you need to use an asynchronous database client: aiopg (a wrapper around psycopg2's asynchronous mode) or asyncpg (it has a different API but is faster).
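
As a rough sketch of what the asyncpg route might look like for the two database functions in the question: the table and column names are taken from the question, while the `process_url(row_id, url)` signature, the shape of its return value (a dict with `id` and `a` keys) and the `dsn` argument are assumptions made for illustration. Note that asyncpg uses PostgreSQL's native `$1`, `$2` placeholders rather than psycopg2's `%s`. This has not been run against a real database, so treat it as a starting point rather than a drop-in replacement.

```python
import asyncio

BATCH_SIZE = 100

# asyncpg takes positional $n placeholders instead of psycopg2's %s / %(name)s.
SELECT_SQL = "SELECT id, url FROM db.urls WHERE processed IS NULL LIMIT $1"
UPDATE_SQL = "UPDATE db.urls SET a = $1, processed = true WHERE id = $2"


async def get_batch(pool):
    # Awaiting the query lets other tasks run while Postgres is working.
    return await pool.fetch(SELECT_SQL, BATCH_SIZE)


async def save_batch_result(pool, results):
    # Assumption: each result is a dict with "a" and "id" keys.
    args = [(r["a"], r["id"]) for r in results]
    await pool.executemany(UPDATE_SQL, args)


async def run_batch(pool, process_url):
    rows = await get_batch(pool)
    # process_url is a caller-supplied coroutine function (hypothetical signature).
    results = await asyncio.gather(
        *(process_url(row["id"], row["url"]) for row in rows)
    )
    if results:
        await save_batch_result(pool, results)
    return len(results)


async def main(dsn, process_url):
    import asyncpg  # third-party package, imported here so the helpers above stay importable

    async with asyncpg.create_pool(dsn) as pool:
        # Keep fetching batches until no unprocessed rows are left.
        while await run_batch(pool, process_url):
            pass
```

In this shape the pool is created once and passed around, instead of opening a new connection for every decorated call as `db_conn` does in the question.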
