bjc*_*bjc 26
A slightly less dirty modification of the accepted answer from Konstantinos Katsantonis:
import os
import boto3

# Assumes credentials & configuration are handled outside python,
# in the .aws directory or via environment variables
s3 = boto3.resource('s3')

def download_s3_folder(bucket_name, s3_folder, local_dir=None):
    """
    Download the contents of a folder directory
    Args:
        bucket_name: the name of the s3 bucket
        s3_folder: the folder path in the s3 bucket
        local_dir: a relative or absolute directory path in the local file system
    """
    bucket = s3.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=s3_folder):
        target = obj.key if local_dir is None \
            else os.path.join(local_dir, os.path.relpath(obj.key, s3_folder))
        # guard against an empty dirname for top-level keys
        target_dir = os.path.dirname(target)
        if target_dir and not os.path.exists(target_dir):
            os.makedirs(target_dir)
        if obj.key[-1] == '/':
            continue
        bucket.download_file(obj.key, target)
This also downloads nested sub-directories. I was able to download a directory with more than 3000 files in it. You'll find other solutions at Boto3 to download all files from a S3 Bucket, but I don't know if they're any better.
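The target-path computation is the subtle part of the snippet above; here is a minimal sketch of what os.path.relpath produces for a nested key (the key, prefix, and local directory names are illustrative):

```python
import os

# A key nested under the prefix keeps its sub-directory structure
# once the prefix itself is stripped off by relpath.
key = 'foo/bar/sub/file.txt'
prefix = 'foo/bar'
target = os.path.join('local_dir', os.path.relpath(key, prefix))
print(target)  # local_dir/sub/file.txt (on POSIX)
```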
hum*_*ume 19
For S3 you can also use cloudpathlib, which wraps boto3. For your use case it's quite simple:
from cloudpathlib import CloudPath
cp = CloudPath("s3://bucket/folder/folder2/")
cp.download_to("local_folder")
Quick and dirty, but it works:
import os
import boto3

def downloadDirectoryFroms3(bucketName, remoteDirectoryName):
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(bucketName)
    for obj in bucket.objects.filter(Prefix=remoteDirectoryName):
        # guard against an empty dirname for top-level keys
        dirname = os.path.dirname(obj.key)
        if dirname and not os.path.exists(dirname):
            os.makedirs(dirname)
        bucket.download_file(obj.key, obj.key)
Assuming you want to download the directory foo/bar from s3, the for loop iterates over all files whose path starts with Prefix=foo/bar.
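Worth noting: Prefix is a plain string-prefix match on the key, so the filter also returns a folder-marker key itself if one exists. A small sketch with made-up keys:

```python
# Hypothetical keys in a bucket; Prefix filtering is a pure
# startswith match on the full object key.
keys = ['foo/bar/', 'foo/bar/a.txt', 'foo/bar/baz/b.txt', 'foo/other.txt']
matched = [k for k in keys if k.startswith('foo/bar')]
print(matched)  # ['foo/bar/', 'foo/bar/a.txt', 'foo/bar/baz/b.txt']
```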
Using boto3 you can set aws credentials and download a dataset from S3:
import boto3
import os

# set aws credentials
s3r = boto3.resource('s3', aws_access_key_id='xxxxxxxxxxxxxxxxx',
                     aws_secret_access_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
bucket = s3r.Bucket('bucket_name')

# downloading folder
prefix = 'dirname'
for obj in bucket.objects.filter(Prefix=prefix):
    if obj.key == prefix:
        os.makedirs(os.path.dirname(obj.key), exist_ok=True)
        continue
    bucket.download_file(obj.key, obj.key)
If you cannot find your access_key and secret_access_key, refer to this page.
Hope it helps.
Thanks.
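Rather than hardcoding the keys in source, you can export them as environment variables; boto3 reads the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY names automatically (the values below are placeholders):

```python
import os

# Placeholder values -- substitute your real keys, or better,
# set these in the shell so they never appear in code at all.
os.environ['AWS_ACCESS_KEY_ID'] = 'xxxxxxxxxxxxxxxxx'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

# boto3.resource('s3') would now pick the credentials up without
# passing aws_access_key_id / aws_secret_access_key explicitly.
```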
Another approach, building on the answer from @bjc, that leverages the built-in Path library and parses the s3 uri for you:
import boto3
from pathlib import Path
from urllib.parse import urlparse

def download_s3_folder(s3_uri, local_dir=None):
    """
    Download the contents of a folder directory
    Args:
        s3_uri: the s3 uri to the top level of the files you wish to download
        local_dir: a relative or absolute directory path in the local file system
    """
    s3 = boto3.resource("s3")
    bucket = s3.Bucket(urlparse(s3_uri).hostname)
    s3_path = urlparse(s3_uri).path.lstrip('/')
    if local_dir is not None:
        local_dir = Path(local_dir)
    for obj in bucket.objects.filter(Prefix=s3_path):
        # wrap in Path so .parent works in the local_dir=None case too
        target = Path(obj.key) if local_dir is None \
            else local_dir / Path(obj.key).relative_to(s3_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        if obj.key[-1] == '/':
            continue
        bucket.download_file(obj.key, str(target))
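The uri parsing in the function above can be checked in isolation: urlparse treats the bucket as the hostname and the rest of the uri as the path (the bucket and folder names below are illustrative):

```python
from urllib.parse import urlparse

uri = 's3://my-bucket/path/to/folder/'
parsed = urlparse(uri)
print(parsed.hostname)          # my-bucket
print(parsed.path.lstrip('/'))  # path/to/folder/
```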