fpo*_*g01 7 python amazon-s3 amazon-web-services python-3.x boto3
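Is there a way to filter S3 objects by last modified date in boto3? I have built a large text file listing everything in a bucket. Some time has passed, and I only want to list the objects that were added since the last time I walked the whole bucket.
I know I can use the Marker attribute to start listing from a given object name, so I could feed it the last object I processed in the text file, but that does not guarantee that no new object was added before that object name. For example, if the last file in the text file is Oak.txt and a new file named apple.txt is added, it would not be picked up.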
import boto3

s3_resource = boto3.resource('s3')
client = boto3.client('s3')

def list_rasters(bucket):
    bucket = s3_resource.Bucket(bucket)
    for bucket_obj in bucket.objects.filter(Prefix="testing_folder/"):
        print(bucket_obj.key)
        print(bucket_obj.last_modified)
Ami*_*nes 15
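The following snippet gets all objects under a specific folder and checks whether each one was last modified after the time you specify.
Replace YEAR, MONTH, DAY with your values.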
import boto3
import datetime

# Bucket name
bucket_name = 'BUCKET NAME'
# Folder name
folder_name = 'FOLDER NAME'
# Bucket resource
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)

def lambda_handler(event, context):
    for file in bucket.objects.filter(Prefix=folder_name):
        # Compare dates
        if file.last_modified.replace(tzinfo=None) > datetime.datetime(YEAR, MONTH, DAY, tzinfo=None):
            # Print results
            print('File Name: %s ---- Date: %s' % (file.key, file.last_modified))
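If you would rather keep the comparison timezone-aware instead of stripping tzinfo, the same check can be pulled into a small helper. This is only a sketch: `modified_since` is a name I made up, it assumes each item exposes `.key` and `.last_modified` the way the objects from `bucket.objects.filter(...)` do, and it assumes a naive cutoff is meant as UTC. The stand-in objects are there so it runs without AWS credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def modified_since(objects, cutoff):
    """Yield objects whose last_modified is strictly later than cutoff.

    `objects` is any iterable of items exposing `.key` and `.last_modified`,
    for example the result of bucket.objects.filter(Prefix=...).
    """
    if cutoff.tzinfo is None:
        # Assumption: a naive cutoff is meant as UTC.
        cutoff = cutoff.replace(tzinfo=timezone.utc)
    for obj in objects:
        if obj.last_modified > cutoff:
            yield obj


# Stand-in objects (hypothetical keys and dates) instead of a real bucket.
sample = [
    SimpleNamespace(key="testing_folder/old.tif",
                    last_modified=datetime(2023, 1, 5, tzinfo=timezone.utc)),
    SimpleNamespace(key="testing_folder/new.tif",
                    last_modified=datetime(2023, 3, 9, tzinfo=timezone.utc)),
]
recent_keys = [o.key for o in modified_since(sample, datetime(2023, 3, 1))]
print(recent_keys)
```

With a real bucket you would pass `bucket.objects.filter(Prefix=folder_name)` in place of `sample`. Note that this still lists every object under the prefix; the filtering happens on the client.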
小智 5
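The snippet below uses the get() action of the S3 Object class, which returns content only for objects that satisfy the IfModifiedSince datetime parameter. The script prints the file names, which is what the original question asked for, but it also saves the files locally.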
import boto3
import io
from datetime import date, datetime, timedelta

# Defining AWS S3 resources
s3 = boto3.resource('s3')
bucket = s3.Bucket('<bucket_name>')
prefix = '<object_key_prefix, if any>'

# note: this is based on UTC time
yesterday = datetime.fromisoformat(str(date.today() - timedelta(days=1)))

# function to retrieve the streaming body from S3, using the cutoff as IfModifiedSince
def get_object(file_name):
    try:
        obj = file_name.get(IfModifiedSince=yesterday)
        return obj['Body']
    except Exception:
        # get() raises when the object was not modified since the cutoff
        return None

# obtain a list of S3 objects with prefix filter
files = list(bucket.objects.filter(Prefix=prefix))

# Iterating through the list of files
# Loading streaming body into a file with the same name
# Printing file name and saving file
# Note: skipping the first file since it's only the directory
for file in files[1:]:
    file_name = file.key.split(prefix)[1]  # getting the file name of the S3 object
    s3_file = get_object(file)  # streaming body needing to iterate through
    if s3_file:  # meets the modified-since date
        print(file_name)  # prints files modified since the cutoff
        try:
            with io.FileIO(file_name, 'w') as f:
                for i in s3_file:  # iterating through the streaming body
                    f.write(i)
        except TypeError as e:
            print(e, file)
Here is a more optimized solution for getting object keys filtered by the LastModified field.
import boto3

s3 = boto3.client("s3")
s3_paginator = s3.get_paginator('list_objects_v2')
s3_iterator = s3_paginator.paginate(Bucket="SampleBucket")
filtered_iterator = s3_iterator.search(
    "Contents[?to_string(LastModified)>='\"2023-03-01 00:00:00+00:00\"'].Key"
)
for key_data in filtered_iterator:
    print(key_data)
You can modify the iterator search string to get whichever fields you need.
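If you prefer not to rely on string comparison inside the search expression, the same filter can be written in plain Python over the pages, comparing datetimes directly. This is a sketch, not a drop-in replacement: `keys_modified_since` is a hypothetical helper name, and the hand-written pages below only mimic the `Contents` / `Key` / `LastModified` shape of a `list_objects_v2` response so the example runs without AWS access.

```python
from datetime import datetime, timezone


def keys_modified_since(pages, cutoff):
    """Return keys from list_objects_v2-style pages with LastModified >= cutoff."""
    keys = []
    for page in pages:
        # A page for an empty bucket or prefix has no "Contents" entry.
        for entry in page.get("Contents", []):
            if entry["LastModified"] >= cutoff:
                keys.append(entry["Key"])
    return keys


# Hand-written pages (hypothetical keys and dates) standing in for real responses.
fake_pages = [
    {"Contents": [
        {"Key": "a.txt", "LastModified": datetime(2023, 2, 27, tzinfo=timezone.utc)},
        {"Key": "b.txt", "LastModified": datetime(2023, 3, 1, tzinfo=timezone.utc)},
    ]},
    {},  # page without "Contents"
    {"Contents": [
        {"Key": "c.txt", "LastModified": datetime(2023, 4, 2, tzinfo=timezone.utc)},
    ]},
]
cutoff = datetime(2023, 3, 1, tzinfo=timezone.utc)
filtered_keys = keys_modified_since(fake_pages, cutoff)
print(filtered_keys)
```

Against a real bucket you would pass `s3_paginator.paginate(Bucket="SampleBucket")` as `pages`. Either way the listing itself still covers the whole bucket; only the filtering differs.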