Fly*_*kle -2 amazon-s3 amazon-web-services jupyter
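I have an S3 bucket in which the files sit under a folder structure, like folder1/folder2.
I want to list only the files under that folder structure and iterate over them in a SageMaker Jupyter notebook.
How can I do this? I tried the instructions in "Listing contents of a bucket with boto3", but that only lists everything recursively from the top level, whereas I want to list at the folder level only.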
I also tried the code snippet below:
```python
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('bucketname/folder1/folder2')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)
```
and got the following error:
```
ParamValidationError: Parameter validation failed:
Invalid bucket name...
```
I'm currently using Python 3.9. Thanks!
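There are a couple of things to note here: bucketname is the bucket name, and folder1/folder2/ is the key prefix. s3.Bucket() expects only the bucket name, so passing 'bucketname/folder1/folder2' fails boto3's bucket-name validation, which is the ParamValidationError you saw. Try: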
```python
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucketname')
for object_summary in bucket.objects.filter(Prefix='folder1/folder2/'):
    print(object_summary)
```
This prints a series of ObjectSummary values, for example:
```
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/abc.csv')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/def.csv')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/xyz.png')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/folder3/')
```
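Note that this includes every object whose key starts with folder1/folder2/, regardless of file extension. The listing is recursive, so objects inside logical subfolders (for example under folder1/folder2/folder3/) are returned as well, and it may include a zero-byte marker for the folder itself (folder1/folder2/) and for any logical subfolders, such as folder1/folder2/folder3/.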
You can get the corresponding Object from each object summary, as follows:
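```python
for object_summary in bucket.objects.filter(Prefix="folder1/folder2/"):
    # .Object() returns the full s3.Object resource; object_summary.key
    # on its own already holds the same key string.
    print(object_summary.Object().key)
```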
This prints a list of object keys, for example:
```
folder1/folder2/
folder1/folder2/abc.csv
folder1/folder2/def.csv
folder1/folder2/xyz.png
folder1/folder2/folder3/
```
You can filter these as needed, for example to keep only the CSV files:
```python
summaries = bucket.objects.filter(Prefix="folder1/folder2/")
csvs = [x for x in summaries if x.Object().key.endswith(".csv")]
for objectsummary in csvs:
    print(objectsummary.Object().key)
```
This results in:
```
folder1/folder2/abc.csv
folder1/folder2/def.csv
```
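The endswith(".csv") check above is case-sensitive, so a key like DEF.CSV would be skipped. A small sketch of a filter that works on plain key strings and accepts several extensions case-insensitively might look like this; filter_keys_by_suffix is a hypothetical name, and the suffix tuple is an assumption you would adapt.

```python
from typing import Iterable, List, Sequence


def filter_keys_by_suffix(keys: Iterable[str], suffixes: Sequence[str] = (".csv",)) -> List[str]:
    """Keep keys ending with any of `suffixes`, ignoring case (hypothetical helper)."""
    # str.endswith accepts a tuple, so one call checks every suffix.
    wanted = tuple(s.lower() for s in suffixes)
    return [key for key in keys if key.lower().endswith(wanted)]
```

Usage against the bucket from the answer would be filter_keys_by_suffix(s.key for s in bucket.objects.filter(Prefix="folder1/folder2/")), which reads the key straight from each summary.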
You can extract the actual file name like this:
```python
for objectsummary in csvs:
    print(objectsummary.Object().key.split("/")[-1])
```
This results in:
```
abc.csv
def.csv
```
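To actually iterate over the file contents in the notebook, each ObjectSummary has a get() action whose "Body" is a file-like stream. The sketch below assumes a boto3 Bucket resource and UTF-8 encoded CSVs, and uses only the standard csv module; read_csv_files is a hypothetical name. Results are keyed by file name, so same-named files in different subfolders would overwrite each other; key by summary.key instead if that matters.

```python
import csv
import io
import posixpath
from typing import Any, Dict, List


def read_csv_files(bucket: Any, prefix: str) -> Dict[str, List[List[str]]]:
    """Map each CSV file name under `prefix` to its parsed rows.

    Hypothetical helper. `bucket` is assumed to be a boto3 Bucket resource,
    e.g. boto3.resource("s3").Bucket("bucketname").
    """
    result: Dict[str, List[List[str]]] = {}
    for summary in bucket.objects.filter(Prefix=prefix):
        # Skip folder markers and non-CSV files without downloading them.
        if not summary.key.lower().endswith(".csv"):
            continue
        # posixpath.basename gives the same result as key.split("/")[-1].
        name = posixpath.basename(summary.key)
        # get() downloads the object; "Body" is a streaming, file-like object.
        body = summary.get()["Body"].read()
        # Assumes UTF-8 content; change the encoding if your files differ.
        result[name] = list(csv.reader(io.StringIO(body.decode("utf-8"))))
    return result
```

If pandas is installed in your kernel, pd.read_csv(summary.get()["Body"]) can replace the csv module and give you a DataFrame per file.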