Is there a faster way to download multiple files from S3 to a local folder?

Question

Jot*_*thi 9 amazon-s3 amazon-web-services boto3 jupyter-notebook python-3.6

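I am trying to download 12,000 files from an S3 bucket using a Jupyter notebook, and the download is estimated to take 21 hours. This is because the files are downloaded one at a time. Can multiple downloads run in parallel to speed up the process?

Currently, I am using the following code to download all the files: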

import os

# Note: df and download_full_resolution_images are defined elsewhere in the notebook

### Get unique full-resolution image basenames
images = df['full_resolution_image_basename'].unique()
print(f'No. of unique full-resolution images: {len(images)}')

### Create a folder for full-resolution images
images_dir = './images/'
os.makedirs(images_dir, exist_ok=True)

### Download images
images_str = "','".join(images)
limiting_clause = f"CONTAINS(ARRAY['{images_str}'], full_resolution_image_basename)"
_ = download_full_resolution_images(images_dir, limiting_clause=limiting_clause)


Answer 1

Die*_*ing 17

See the code below. It only works on Python 3.6+ because it uses f-strings (PEP 498). For older versions of Python, use a different string-formatting method.
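For older interpreters, the f-string in the final print call could be swapped for str.format. A minimal sketch, in which the values of key and result are made-up samples rather than real download results:

```python
key = 'images/photo_001.jpg'           # made-up sample key
result = '/tmp/images/photo_001.jpg'   # made-up sample result

# Python 3.6+ spelling used in the answer:
#     print(f'key: {key}  result: {result}')

# Equivalent with str.format, which also runs on older versions
message = 'key: {}  result: {}'.format(key, result)
print(message)
```

The same substitution applies to any other f-string in the code.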

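Provide relative_path, bucket_name and s3_object_keys. max_workers is optional; if it is not provided, the executor defaults to 5 times the number of processors on the machine (on Python 3.8 and later, the default is min(32, os.cpu_count() + 4) instead).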
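To see what the default would be on a given machine, the documented formulas can be evaluated directly. This is only a sketch of the defaults described in the concurrent.futures documentation (CPU count times 5 on Python 3.5 to 3.7, min(32, CPU count + 4) from 3.8 on); the helper name default_workers is made up for illustration and is not part of any library.

```python
import os
import sys


def default_workers(version=sys.version_info, cpus=None):
    """Return the documented ThreadPoolExecutor default for max_workers."""
    cpus = cpus or os.cpu_count() or 1
    if version >= (3, 8):
        # Python 3.8+: capped so that many-core machines do not spawn too many threads
        return min(32, cpus + 4)
    # Python 3.5 to 3.7: five threads per processor
    return cpus * 5


print(default_workers())
```

Since downloads are I/O-bound, a value above the CPU count is usually reasonable, but the best setting depends on bandwidth and file sizes and is worth measuring.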

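Most of the code in this answer comes from an answer to "How to create an async generator in Python?", which in turn is based on an example documented in the library.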

import os
from concurrent import futures

import boto3


relative_path = './images'
bucket_name = 'bucket_name'
s3_object_keys = []  # List of S3 object keys
max_workers = 5  # Set to None to use the executor's default

abs_path = os.path.abspath(relative_path)
s3 = boto3.client('s3')


def fetch(key):
    """Download one object and return the local path it was written to."""
    file = os.path.join(abs_path, key)
    # Create the parent directory, not a directory named after the file itself
    os.makedirs(os.path.dirname(file), exist_ok=True)
    with open(file, 'wb') as data:
        s3.download_fileobj(bucket_name, key, data)
    return file


def fetch_all(keys):
    """Download keys in parallel, yielding (key, local path or exception)."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_key = {executor.submit(fetch, key): key for key in keys}

        print("All keys submitted.")

        for future in futures.as_completed(future_to_key):
            key = future_to_key[future]
            exception = future.exception()

            if not exception:
                yield key, future.result()
            else:
                yield key, exception


for key, result in fetch_all(s3_object_keys):
    print(f'key: {key}  result: {result}')

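Because fetch_all yields an exception object in place of a path when a download fails, a caller may want to separate the two cases, for example to retry the failed keys. A minimal sketch; split_results is a hypothetical helper, and the sample pairs stand in for real fetch_all output.

```python
def split_results(results):
    """Separate (key, result) pairs into downloaded paths and failures."""
    downloaded, failed = {}, {}
    for key, result in results:
        if isinstance(result, BaseException):
            failed[key] = result
        else:
            downloaded[key] = result
    return downloaded, failed


# Stand-in for the output of fetch_all(); keys and paths are made up
sample = [
    ('a.jpg', '/tmp/images/a.jpg'),
    ('b.jpg', OSError('simulated failure')),
]

downloaded, failed = split_results(sample)
print(f'{len(downloaded)} downloaded, {len(failed)} failed')
```

With the real generator, this would be called as split_results(fetch_all(s3_object_keys)), and list(failed) could then be passed to fetch_all again for a second attempt.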
