How to download files from an S3 bucket using asyncio

Question

I use the following code to download all the files in an S3 bucket:

import os

import boto3

def main(bucket_name, destination_dir):
    bucket = boto3.resource('s3').Bucket(bucket_name)
    for obj in bucket.objects.all():
        # Keys ending in '/' are folder placeholders, not files.
        if obj.key.endswith('/'):
            continue
        # Save under destination_dir, mirroring the key hierarchy.
        destination = '%s/%s' % (destination_dir, obj.key)
        if not os.path.exists(destination):
            os.makedirs(os.path.dirname(destination), exist_ok=True)
        bucket.download_file(obj.key, destination)


If possible, I would like to know how to make this asynchronous.

Thanks in advance.

Answer 1

Den*_*sky 5

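You can use the generate_presigned_url method of the S3 client to get a URL signed with your AWS credentials (see the documentation), and then send the download requests through an asynchronous HTTP client such as aiohttp.

aiohttp applies URL normalization, which can cause problems if the key contains spaces or non-ASCII characters. Passing URL(..., encoded=True) avoids this.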

import asyncio

import boto3
from aiohttp import client
from yarl import URL

bucket = 'some-bucket-name'

s3_client = boto3.client('s3')
# list_objects returns at most 1000 keys per call.
s3_objs = s3_client.list_objects(Bucket=bucket)['Contents']

async def download_s3_obj(key: str, aiohttp_session: client.ClientSession):
    # Signing happens locally, so this call does not block on the network.
    request_url = s3_client.generate_presigned_url('get_object', Params={
        'Bucket': bucket,
        'Key': key
    })

    # encoded=True keeps the signed URL exactly as generated.
    async with aiohttp_session.get(URL(request_url, encoded=True)) as response:
        response.raise_for_status()
        # The local folder must already exist.
        file_path = 'some-local-folder-name/' + key.split('/')[-1]

        with open(file_path, 'wb') as file:
            file.write(await response.read())

async def main():
    # The session is closed automatically when the block exits.
    async with client.ClientSession() as session:
        # Skip folder placeholder keys, which have no file name.
        keys = [f['Key'] for f in s3_objs if not f['Key'].endswith('/')]
        await asyncio.gather(*(download_s3_obj(key, session) for key in keys))

asyncio.run(main())

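Note that the snippet above keeps only the last path component of each key (key.split('/')[-1]), so two keys such as a/x.txt and b/x.txt would end up in the same local file. A minimal sketch of mirroring the key hierarchy instead, as the code in the question does, might look like this. The helper name local_path_for and its root argument are hypothetical, not part of boto3 or aiohttp.

```python
from pathlib import Path
from typing import Optional


def local_path_for(key: str, root: str) -> Optional[Path]:
    """Map an S3 key to a path under root (hypothetical helper).

    Returns None for folder placeholder keys, which have nothing to download.
    """
    if key.endswith('/'):
        return None
    # S3 keys use '/' as the separator regardless of the local OS.
    # Empty, '.' and '..' components are dropped so the result stays under root.
    parts = [p for p in key.split('/') if p not in ('', '.', '..')]
    if not parts:
        return None
    return Path(root).joinpath(*parts)
```

Before writing a file, the caller would still need to create the parent directory, for example with path.parent.mkdir(parents=True, exist_ok=True).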

Answer 2

小智 1

boto3 does not support asyncio out of the box. There is a tracking issue for this that offers some workarounds; they may or may not work for your use case.
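One workaround that is often used, and not necessarily one taken from that tracking issue, is to keep the blocking boto3 calls and run them in worker threads from asyncio. The sketch below assumes Python 3.9+ for asyncio.to_thread. The names download_all, download_one and max_concurrency are hypothetical, and download_one stands for any blocking function that downloads a single key.

```python
import asyncio
from typing import Callable, Iterable, List


async def download_all(
    keys: Iterable[str],
    download_one: Callable[[str], None],
    max_concurrency: int = 8,
) -> List[str]:
    """Run a blocking per-key download in worker threads, a few at a time."""
    # The semaphore caps how many downloads are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(key: str) -> str:
        async with semaphore:
            # to_thread keeps the event loop free while the blocking
            # call runs in a thread.
            await asyncio.to_thread(download_one, key)
        return key

    # gather returns the results in the same order as the input keys.
    return await asyncio.gather(*(worker(k) for k in keys))
```

With boto3, download_one could wrap a call such as s3_client.download_file(bucket, key, filename). Sharing a low-level client across threads is generally considered safe, whereas resource objects such as Bucket should not be shared across threads.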

Archived:	8 years, 2 months ago
Viewed:	12,079 times
Last activity:	2 years, 9 months ago