How to download the latest file of an S3 bucket using Boto3?

Question

jz2*_*z22 5 python amazon-s3 boto amazon-web-services boto3

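The other questions I could find refer to an older version of Boto. I would like to download the latest file of an S3 bucket. In the documentation I found that there is a method list_object_versions() that gets you a boolean IsLatest. Unfortunately, I have only managed to set up a connection and download a file. Could you please show me how I can extend my code to get the latest file of the bucket? Thank you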

import boto3
from botocore.client import Config

conn = boto3.client('s3',
                    region_name="eu-west-1",
                    endpoint_url="customendpoint",
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))

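From here I don't know how to get the most recently added file from a bucket called mytestbucket. There are various csv files in the bucket, but of course they all have different names.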

Update:

import boto3
from botocore.client import Config

s3 = boto3.resource('s3', region_name="eu-west-1", endpoint_url="custom endpoint", aws_access_key_id = '1234', aws_secret_access_key = '1234', config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
my_bucket = s3.Bucket('mytestbucket22')
unsorted = []
for file in my_bucket.objects.filter():
   unsorted.append(file)

files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]

This gives me the following error:

NameError: name 'get_last_modified' is not defined

Answer 1

mar*_*dev 23

This handles the case where the S3 bucket holds more than 1000 objects. It is basically @SaadK's answer, using the newer list_objects_v2 and max() rather than sorting each page.

Edit: fixed the issue identified by @Timothée-Jeannin. The latest object is now determined across all pages.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Paginator.ListObjectsV2

import boto3

def get_most_recent_s3_object(bucket_name, prefix):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator( "list_objects_v2" )
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
    latest = None
    for page in page_iterator:
        if "Contents" in page:
            latest2 = max(page['Contents'], key=lambda x: x['LastModified'])
            if latest is None or latest2['LastModified'] > latest['LastModified']:
                latest = latest2
    return latest

latest = get_most_recent_s3_object(bucket_name, prefix)

latest['Key']  # -->   'prefix/objectname'

So that the next person doesn't have to read through the edit history: the issue has been fixed. (10 upvotes)
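
The function above returns the listing entry but stops short of the download that the question asks for. A minimal sketch of that last step, assuming the pages have the shape produced by the list_objects_v2 paginator. `newest_entry` and `download_latest` are hypothetical helper names, and boto3 is imported lazily so the selection logic can be tried without AWS access:

```python
from datetime import datetime, timezone


def newest_entry(pages):
    """Return the entry with the greatest LastModified across all pages, or None."""
    latest = None
    for page in pages:
        # Pages with no matching keys carry no "Contents" entry.
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
    return latest


def download_latest(bucket_name, prefix, dest_path):
    """Download the newest object under prefix to dest_path; return its key or None."""
    import boto3  # imported here so newest_entry can be used without boto3 installed

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    latest = newest_entry(paginator.paginate(Bucket=bucket_name, Prefix=prefix))
    if latest is None:
        return None
    s3.download_file(bucket_name, latest["Key"], dest_path)
    return latest["Key"]


# Hand-made pages standing in for a real listing (hypothetical keys and dates).
sample_pages = [
    {"Contents": [{"Key": "a.csv", "LastModified": datetime(2020, 1, 1, tzinfo=timezone.utc)}]},
    {},
    {"Contents": [{"Key": "b.csv", "LastModified": datetime(2021, 6, 1, tzinfo=timezone.utc)}]},
]
print(newest_entry(sample_pages)["Key"])  # b.csv
```

Against a real bucket, the call would look like download_latest('mytestbucket', '', 'latest.csv'), where the destination file name is only an example.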

Answer 2

hel*_*loV 8

A variation of the answer I provided for: Boto3 S3, sort bucket by last modified. You can modify the code to suit your needs.

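import boto3

# LastModified is a datetime, so it sorts directly (strftime('%s') is not portable)
get_last_modified = lambda obj: obj['LastModified']

s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']  # at most 1000 objects per call
# ascending sort: [0] is the oldest object; use [-1] or reverse=True for the newest
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][0]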

If you want to reverse the sort:

[obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]

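I think [0] should be changed to [-1] if you want the most recently added file. (2 upvotes)
@MattBunch Yes, if there are more than 1000 objects in the bucket, you need to paginate, fetch all the objects and then sort. (2 upvotes)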
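
The approach described in the last comment (collect every page, then order by modification time) can be sketched without any boto3-specific code. This assumes each entry is a dict with `Key` and `LastModified`, as in the `Contents` lists above. `newest_keys` is a hypothetical helper, and `heapq.nlargest` is swapped in for the full sort because only the top few entries are needed:

```python
import heapq
from datetime import datetime, timedelta, timezone
from itertools import chain


def newest_keys(pages, n=1):
    """Return the keys of the n most recently modified entries, newest first."""
    entries = chain.from_iterable(page.get("Contents", []) for page in pages)
    # LastModified values are datetimes, so they compare directly.
    top = heapq.nlargest(n, entries, key=lambda obj: obj["LastModified"])
    return [obj["Key"] for obj in top]


# Hypothetical listing split over two pages.
base = datetime(2020, 1, 1, tzinfo=timezone.utc)
pages = [
    {"Contents": [{"Key": "old.csv", "LastModified": base},
                  {"Key": "newest.csv", "LastModified": base + timedelta(days=3)}]},
    {"Contents": [{"Key": "middle.csv", "LastModified": base + timedelta(days=1)}]},
]
print(newest_keys(pages, n=2))  # ['newest.csv', 'middle.csv']
```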

Answer 3

Saa*_*adK 6

If you have a lot of files, you'll need to use pagination, as helloV mentioned. This is how I did it.

import boto3

# LastModified is a datetime, so it can serve as the sort key directly
get_last_modified = lambda obj: obj['LastModified']
s3 = boto3.client('s3')
paginator = s3.get_paginator( "list_objects" )
page_iterator = paginator.paginate( Bucket = "BucketName", Prefix = "Prefix")
objs = []
for page in page_iterator:
    if "Contents" in page:
        # gather every page first so the newest object is picked across all pages
        objs.extend(page["Contents"])
last_added = sorted(objs, key=get_last_modified)[-1]['Key']

Answer 4

小智 5

You can do:

import boto3

s3_client = boto3.client('s3')
# a single list_objects_v2 call returns at most 1000 objects
response = s3_client.list_objects_v2(Bucket='bucket_name', Prefix='prefix')
objects = response['Contents']
latest = max(objects, key=lambda x: x['LastModified'])

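It should be noted that this only returns the latest object among the first 1000 objects in the bucket. If your bucket contains more objects, you will need to use a paginator. (2 upvotes)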
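
For the case raised in the comment, a paginator is one option. The same loop can also be written by hand with the `ContinuationToken` request parameter and the `IsTruncated` / `NextContinuationToken` response fields of `list_objects_v2`. A minimal sketch, where `latest_object` is a hypothetical helper and the client is passed in so that a stub can stand in for S3 when trying it out:

```python
from datetime import datetime, timezone


def latest_object(s3_client, bucket, prefix=""):
    """Walk every list_objects_v2 page and return the newest entry, or None."""
    latest = None
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
        if not response.get("IsTruncated"):
            return latest
        # Ask for the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


class StubClient:
    """Stand-in for boto3.client('s3') that serves two hard-coded pages."""

    def list_objects_v2(self, **kwargs):
        if "ContinuationToken" not in kwargs:
            return {"Contents": [{"Key": "first.csv",
                                  "LastModified": datetime(2020, 1, 1, tzinfo=timezone.utc)}],
                    "IsTruncated": True, "NextContinuationToken": "page-2"}
        return {"Contents": [{"Key": "second.csv",
                              "LastModified": datetime(2020, 2, 1, tzinfo=timezone.utc)}],
                "IsTruncated": False}


print(latest_object(StubClient(), "bucket_name", "prefix")["Key"])  # second.csv
```

With a real client the call would be latest_object(boto3.client('s3'), 'bucket_name', 'prefix').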

Answer 5

Ash*_*han 0

You should be able to download the latest version of the file using the default download_file command:

import boto3
import botocore

BUCKET_NAME = 'mytestbucket'
KEY = 'fileinbucket.txt'

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloadname.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

Reference link

To get the last modified or uploaded file you can use the following

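import boto3

# ObjectSummary exposes the modification time as the last_modified attribute
get_last_modified = lambda obj: obj.last_modified

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
unsorted = []
for file in my_bucket.objects.filter():
   unsorted.append(file)

# keys of the nine most recently modified objects, newest first
files = [obj.key for obj in sorted(unsorted, key=get_last_modified,
    reverse=True)][0:9]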

As the answer in this reference link states, it's not optimal, but it works.
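
When only the single newest object is needed, the sort can be replaced by max(). A sketch under the assumption that `bucket` behaves like a boto3 `s3.Bucket` resource, whose `objects.filter()` collection pages through the listing on its own and whose items have `key` and `last_modified` attributes. `download_newest` is a hypothetical helper name, and the stub bucket only exists so the helper can be tried without AWS access:

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def download_newest(bucket, dest_path, prefix=""):
    """Download the most recently modified object in bucket; return its key or None."""
    # default=None covers an empty bucket or a prefix with no matches.
    newest = max(bucket.objects.filter(Prefix=prefix),
                 key=lambda obj: obj.last_modified, default=None)
    if newest is None:
        return None
    bucket.download_file(newest.key, dest_path)
    return newest.key


# Stub bucket with hypothetical keys and dates; it records downloads instead of doing them.
downloaded = []
stub_objects = [
    SimpleNamespace(key="a.csv", last_modified=datetime(2020, 1, 1, tzinfo=timezone.utc)),
    SimpleNamespace(key="b.csv", last_modified=datetime(2021, 1, 1, tzinfo=timezone.utc)),
]
stub_bucket = SimpleNamespace(
    objects=SimpleNamespace(filter=lambda Prefix="": stub_objects),
    download_file=lambda key, dest: downloaded.append((key, dest)),
)
print(download_newest(stub_bucket, "latest.csv"))  # b.csv
```

With a real bucket this would be called as download_newest(boto3.resource('s3').Bucket('mytestbucket'), 'latest.csv').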

Archived:	8 years, 10 months ago
Views:	13236
Last activity:	7 years, 6 months ago