How to list the files in an S3 bucket folder using Python

Question

Car*_*los 6 python amazon-s3 amazon-web-services

I'm trying to list all the files in a bucket. Here is my code:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_project')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)


This works: I get the names of all the files. However, when I try to do the same for a folder, the code raises an error:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_project/data/') # add the folder name

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)


Here is the error:

botocore.exceptions.ParamValidationError: Parameter validation failed:

Invalid bucket name "carlos-cryptocurrency-research-project/data/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"


I'm sure the folder name is correct. I also tried replacing it with the Amazon Resource Name (ARN) and with the S3 URI, but I still get the error.

Answer 1

jar*_*mod 13

You cannot specify a prefix/folder in the Bucket constructor. Instead, use the client-level API and call list_objects_v2, like this:

import boto3

client = boto3.client('s3')

response = client.list_objects_v2(
    Bucket='my_bucket',
    Prefix='data/')

for content in response.get('Contents', []):
    print(content['Key'])

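The bucket name and the key prefix are always passed as two separate arguments, which is why a combined string such as 'my_project/data/' fails the bucket-name validation. If you start from a combined string or an S3 URI, one option is to split it first. The helper below is only a sketch; split_s3_location is a hypothetical name, not part of boto3, and it assumes the first path segment is the bucket name.

```python
def split_s3_location(location):
    """Split 's3://bucket/prefix/' or 'bucket/prefix/' into (bucket, prefix)."""
    # Drop an optional s3:// scheme so both input forms are handled the same way.
    if location.startswith("s3://"):
        location = location[len("s3://"):]
    # Everything before the first '/' is the bucket; the rest is the key prefix.
    bucket, _, prefix = location.partition("/")
    return bucket, prefix


bucket, prefix = split_s3_location("my_project/data/")
print(bucket, prefix)
```

The two values can then be passed as Bucket=bucket and Prefix=prefix to list_objects_v2.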

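Note that list_objects_v2 returns at most 1,000 S3 objects per call. If you need more, you can use a paginator, or consider the higher-level Bucket resource and its objects collection, which handles pagination for you, as shown in the other answer to this question.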
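A minimal sketch of the paginator approach, in case it helps. iter_keys and main are hypothetical names, and the bucket and prefix values are the placeholders from the example above; the boto3 calls used are get_paginator('list_objects_v2') and paginate().

```python
def iter_keys(client, bucket, prefix):
    """Yield every key under `prefix`, following pagination past 1,000 objects."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no 'Contents' entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def main():
    # Imported here so iter_keys can be tried with a stub client as well.
    import boto3

    client = boto3.client("s3")
    for key in iter_keys(client, "my_bucket", "data/"):
        print(key)
```

Calling main() runs it against a real bucket and needs AWS credentials to be configured.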

Note that this only lists one page (1,000 objects). (2 upvotes)

Answer 2

thr*_*dhn 5

Get a list of all the files in a specific folder of an S3 bucket:

import boto3

s3 = boto3.resource('s3')
myBucket = s3.Bucket('bucketName')

for object_summary in myBucket.objects.filter(Prefix="path/"):
    print(object_summary.key)


The code here will return only 1,000 keys, even if more keys match the filter. (2 upvotes)
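On the 1,000-key concern: as Answer 1 notes, the Bucket resource's objects collection handles pagination for you, so iterating objects.filter(...) should continue past the first page. A separate thing to watch for is that a folder created through the S3 console is stored as a zero-byte object whose key ends in '/', so a prefix filter can return the folder key itself alongside the files. A minimal sketch that skips those keys follows; file_keys and main are hypothetical names, and the bucket and prefix are the placeholders from this answer.

```python
def file_keys(bucket, prefix):
    """Return keys under `prefix`, skipping keys that end in '/' (folder placeholders)."""
    return [
        obj.key
        for obj in bucket.objects.filter(Prefix=prefix)
        if not obj.key.endswith("/")
    ]


def main():
    # Imported here so file_keys can be tried with a stub bucket as well.
    import boto3

    bucket = boto3.resource("s3").Bucket("bucketName")
    for key in file_keys(bucket, "path/"):
        print(key)
```

Calling main() runs it against a real bucket and needs AWS credentials to be configured.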

Archived:	3 years, 11 months ago
Views:	26,603
Last activity:	2 years, 10 months ago