jz2*_*z22 5 python amazon-s3 boto amazon-web-services boto3
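The other questions I could find all refer to older versions of Boto. I want to download the latest file in an S3 bucket. In the documentation I found a method, list_object_versions(), that returns a boolean IsLatest. Unfortunately, I have only managed to set up a connection and download a file. Can you tell me how to extend my code to get the latest file in the bucket? Thanks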
import boto3
from botocore.client import Config

conn = boto3.client('s3',
                    region_name="eu-west-1",
                    endpoint_url="customendpoint",
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
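From here, I don't know how to get the most recently added file from a bucket named mytestbucket. The bucket contains various CSV files, all with different names, of course.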
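A note on list_object_versions(): IsLatest marks the current version of each key, so in a bucket with many files many entries are "latest", and the flag alone cannot pick the newest file. A minimal sketch, assuming a boto3 S3 client is passed in as client; the helper name newest_current_version is hypothetical:

```python
def newest_current_version(client, bucket, prefix=""):
    """Return the most recently modified current version under prefix, or None.

    Assumes `client` is a boto3 S3 client (e.g. boto3.client('s3')).
    """
    newest = None
    paginator = client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for version in page.get("Versions", []):
            # IsLatest is True for the current version of EVERY key, so it only
            # filters out older versions; LastModified still decides the winner.
            # Keys whose current version is a delete marker have no IsLatest
            # entry here (delete markers are listed under "DeleteMarkers").
            if not version["IsLatest"]:
                continue
            if newest is None or version["LastModified"] > newest["LastModified"]:
                newest = version
    return newest
```

With a real client this would be called as newest_current_version(boto3.client('s3'), 'mytestbucket'), and the returned dict's 'Key' is the file to download.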
Update:
import boto3
from botocore.client import Config

s3 = boto3.resource('s3', region_name="eu-west-1", endpoint_url="custom endpoint",
                    aws_access_key_id='1234', aws_secret_access_key='1234',
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
my_bucket = s3.Bucket('mytestbucket22')
unsorted = []
for file in my_bucket.objects.filter():
    unsorted.append(file)
files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]
This gives me the following error:
NameError: name 'get_last_modified' is not defined
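The NameError occurs because get_last_modified is used as the sort key but is never defined. Objects yielded by Bucket.objects are ObjectSummary resources, so the timestamp is the last_modified attribute rather than a 'LastModified' dict key. A minimal sketch that defines the key function first; keys_newest_first is a hypothetical helper name:

```python
from operator import attrgetter

# The key function must exist before sorted() runs. ObjectSummary objects
# expose the modification time as the .last_modified attribute.
get_last_modified = attrgetter("last_modified")


def keys_newest_first(object_summaries, n=10):
    """Return up to n object keys, most recently modified first."""
    ordered = sorted(object_summaries, key=get_last_modified, reverse=True)
    return [obj.key for obj in ordered[:n]]
```

For example, keys_newest_first(my_bucket.objects.all(), n=9) would return the nine newest keys in mytestbucket22.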
mar*_*dev 23
This handles the case where the S3 bucket holds more than 1000 objects. It is basically @SaadK's answer without the for loop, using the newer list_objects_v2.
Edit: fixed the issue identified by @Timothée-Jeannin. It now makes sure the latest object across all pages is found.
import boto3

def get_most_recent_s3_object(bucket_name, prefix):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
    latest = None
    for page in page_iterator:
        if "Contents" in page:
            latest2 = max(page['Contents'], key=lambda x: x['LastModified'])
            if latest is None or latest2['LastModified'] > latest['LastModified']:
                latest = latest2
    return latest

latest = get_most_recent_s3_object(bucket_name, prefix)

latest['Key']  # --> 'prefix/objectname'
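The same search can be written as a single max() over all pages chained together. A minimal sketch; newest_object is a hypothetical name, and it takes the pages produced by the list_objects_v2 paginator. Like the function above, it returns None when no object matches, so check for that before reading ['Key']:

```python
from itertools import chain
from operator import itemgetter


def newest_object(pages):
    """Return the newest object dict across all list_objects_v2 pages, or None."""
    # Pages without a "Contents" key (nothing matched) contribute no objects.
    all_objects = chain.from_iterable(page.get("Contents", []) for page in pages)
    # default=None avoids a ValueError when no page contains any objects.
    return max(all_objects, key=itemgetter("LastModified"), default=None)
```

Usage would look like newest_object(boto3.client('s3').get_paginator('list_objects_v2').paginate(Bucket=bucket_name, Prefix=prefix)).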
A variation of the answer I provided here: Boto3 S3, sort bucket by last modified. You can modify the code to suit your needs.
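import boto3

# LastModified is a timezone-aware datetime; .timestamp() is portable, unlike strftime('%s')
get_last_modified = lambda obj: int(obj['LastModified'].timestamp())
s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
# sorted() is ascending (oldest first), so the newest object is the last element
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][-1]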
To sort in descending order (newest first) instead:
[obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
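Note that sorted() is ascending by default, so index [0] is the oldest object; the newest is at [-1], or at [0] with reverse=True. Also, strftime('%s') is not a format code documented by Python: it is platform-dependent and typically fails on Windows, while .timestamp() works everywhere. A sketch with a portable key function, assuming objs is a 'Contents' list from list_objects_v2; oldest_and_newest is a hypothetical helper:

```python
from datetime import datetime, timezone


def get_last_modified(obj):
    # LastModified is a timezone-aware datetime; .timestamp() converts it to
    # seconds since the epoch using its own UTC offset.
    return int(obj["LastModified"].timestamp())


def oldest_and_newest(objs):
    """Return (oldest_key, newest_key) from a list_objects_v2 'Contents' list."""
    ordered = sorted(objs, key=get_last_modified)  # ascending: oldest first
    return ordered[0]["Key"], ordered[-1]["Key"]
```

Since datetimes compare directly, key=lambda obj: obj['LastModified'] would also work; the integer conversion is only needed if a number is wanted.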
If you have many files, you need to use pagination, as helloV mentioned. This is how I did it:
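import boto3

get_last_modified = lambda obj: int(obj['LastModified'].timestamp())
s3 = boto3.client('s3')
paginator = s3.get_paginator("list_objects")
page_iterator = paginator.paginate(Bucket="BucketName", Prefix="Prefix")
newest = None
for page in page_iterator:
    if "Contents" in page:
        # compare with earlier pages; the newest object may not be on the last page
        candidate = max(page["Contents"], key=get_last_modified)
        if newest is None or get_last_modified(candidate) > get_last_modified(newest):
            newest = candidate
last_added = newest['Key'] if newest else None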
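The paginator hides the continuation-token handling. For illustration, a sketch of roughly what it does for list_objects_v2, following IsTruncated and NextContinuationToken by hand; iter_objects and find_newest_key are hypothetical names, and client is assumed to be a boto3 S3 client:

```python
def iter_objects(client, bucket, prefix=""):
    """Yield every object dict under prefix, following ContinuationToken by hand."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page holds no objects.
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def find_newest_key(client, bucket, prefix=""):
    """Return the key of the most recently modified object, or None."""
    newest = max(iter_objects(client, bucket, prefix),
                 key=lambda obj: obj["LastModified"], default=None)
    return newest["Key"] if newest else None
```

In practice the built-in paginator is the simpler choice; the manual loop just makes the paging visible.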
Anonymous 5
You can do:
import boto3

s3_client = boto3.client('s3')
# a single call returns at most 1000 objects; use a paginator for larger buckets
response = s3_client.list_objects_v2(Bucket='bucket_name', Prefix='prefix')
contents = response['Contents']
latest = max(contents, key=lambda x: x['LastModified'])
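A single list_objects_v2 call returns at most 1,000 objects, and the response has no 'Contents' key when nothing matches the prefix, in which case response['Contents'] raises KeyError. A defensive sketch; latest_from_response is a hypothetical helper that takes the response dict:

```python
import warnings


def latest_from_response(response):
    """Pick the newest object from ONE list_objects_v2 response, or None."""
    if response.get("IsTruncated"):
        # More objects exist beyond this page; a newer one may be among them.
        warnings.warn("listing is truncated; use a paginator to see every object")
    contents = response.get("Contents", [])  # key is absent when nothing matches
    return max(contents, key=lambda obj: obj["LastModified"], default=None)
```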
You should be able to download the latest version of a file with the standard download_file method:
import boto3
import botocore

BUCKET_NAME = 'mytestbucket'
KEY = 'fileinbucket.txt'

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloadname.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Reference link
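download_file fetches a key you already know. To download the newest file, find its key first and then download it. A minimal sketch, assuming client is a boto3 S3 client; download_latest is a hypothetical name, and keys ending in '/' are skipped on the assumption that they are folder placeholders:

```python
import os


def download_latest(client, bucket, prefix="", dest_dir="."):
    """Download the newest object under prefix; return the local path or None."""
    newest = None
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip "folder" placeholder keys
            if newest is None or obj["LastModified"] > newest["LastModified"]:
                newest = obj
    if newest is None:
        return None
    # Save under the key's base name, e.g. "data/report.csv" -> "report.csv".
    local_path = os.path.join(dest_dir, os.path.basename(newest["Key"]))
    client.download_file(bucket, newest["Key"], local_path)
    return local_path
```

For the question's bucket this would be download_latest(boto3.client('s3'), 'mytestbucket').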
To get the most recently modified or uploaded files, you can use the following:
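import boto3

# Bucket.objects yields ObjectSummary instances; the timestamp is .last_modified
get_last_modified = lambda obj: obj.last_modified

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
unsorted = []
for file in my_bucket.objects.filter():
    unsorted.append(file)
# newest first, keeping the 9 most recent keys
files = [obj.key for obj in sorted(unsorted, key=get_last_modified,
                                   reverse=True)][0:9]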
As the answer at the reference link states, it's not optimal, but it works.
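Part of why it is not optimal: S3 lists keys in key-name order and offers no server-side sort by date, so every object must be listed anyway, and sorting the full list just to keep nine entries is extra work. heapq.nlargest keeps only the top n while scanning. A sketch, assuming ObjectSummary objects as yielded by my_bucket.objects.all(); top_n_newest_keys is a hypothetical name:

```python
import heapq
from operator import attrgetter


def top_n_newest_keys(object_summaries, n=9):
    """Return the keys of the n most recently modified objects, newest first.

    heapq.nlargest holds at most n items while scanning, instead of building
    and sorting a list of every object.
    """
    newest = heapq.nlargest(n, object_summaries, key=attrgetter("last_modified"))
    return [obj.key for obj in newest]
```

The result matches sorted(..., reverse=True)[:n]; the saving is in memory and sort work, not in the number of S3 list requests.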
Archived: | Views: 13236 | Last recorded: