How do I use boto to download a file from S3 only if the remote file is newer than the local copy?

tul*_*ler 12 python amazon-s3 boto

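I'm trying to download a file from S3 with boto, but only if the local copy of the file is older than the remote one.

I'm sending the 'If-Modified-Since' header, using the code below: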

#!/usr/bin/python
import os
import datetime
import boto
from boto.s3.key import Key

bucket_name = 'my-bucket'

conn = boto.connect_s3()
bucket = conn.get_bucket(bucket_name)

def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    if os.path.isfile(filename):
        print("File exists, adding If-Modified-Since header")
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        key.get_contents_to_filename(filename, headers)
    except boto.exception.S3ResponseError as e:
        return 304
    return 200

print(download(bucket, 'README'))

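The problem: when the local file doesn't exist, everything works and the file is downloaded. When I run the script a second time, the function returns 304 as expected, but the previously downloaded file gets deleted.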
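A side note on the header value itself: the code above formats it with strftime, whose %a and %b output follows the process locale, so a non-English locale could produce a date S3 does not recognize. A minimal Python 3 sketch using the standard library's email.utils.formatdate, which always emits the English RFC 1123 form; http_date and if_modified_since_headers are hypothetical helper names, not part of boto.

```python
import email.utils
import os


def http_date(timestamp):
    """Format a POSIX timestamp as an RFC 1123 HTTP-date in GMT."""
    # formatdate() always uses English day/month names, whereas
    # strftime("%a, %d %b ...") follows the current locale.
    return email.utils.formatdate(timestamp, usegmt=True)


def if_modified_since_headers(filename):
    """Hypothetical helper: the headers dict from the question, empty if there is no local copy."""
    headers = {}
    if os.path.isfile(filename):
        headers['If-Modified-Since'] = http_date(os.path.getmtime(filename))
    return headers


print(http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```

The resulting dict can be passed wherever the question passes headers; only the date formatting differs from the original code.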

fal*_*tru 8

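boto.s3.key.Key.get_contents_to_filename opens the file in 'wb' mode, which truncates it at the very start of the function (see boto/s3/key.py). On top of that, it deletes the file when an exception is raised. A 304 reply surfaces as an S3ResponseError, so your existing copy is first emptied and then removed.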
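The truncation comes from Python's file modes, not from S3. The sketch below does not call boto. It only opens an existing file and writes nothing, standing in for a download that fails before any byte arrives. size_after_open is a hypothetical helper.

```python
import os
import tempfile


def size_after_open(path, mode):
    """Open path in the given mode, write nothing, and return the file size afterwards."""
    with open(path, mode):
        pass  # stands in for a download that fails before writing anything
    return os.path.getsize(path)


fd, path = tempfile.mkstemp()
os.write(fd, b"previously downloaded contents")
os.close(fd)
print(size_after_open(path, "r+b"))  # 30: 'r+b' leaves the existing bytes alone
print(size_after_open(path, "wb"))   # 0: 'wb' truncates as soon as the file is opened
os.remove(path)
```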

Instead of get_contents_to_filename, you can use get_contents_to_file and open the file yourself with a different mode:

def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    mode = 'wb'        # no local copy yet: create the file
    updating = False
    if os.path.isfile(filename):
        mode = 'r+b'   # local copy exists: open it without truncating
        updating = True
        print("File exists, adding If-Modified-Since header")
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        with open(filename, mode) as f:
            key.get_contents_to_file(f, headers)
            f.truncate()
    except boto.exception.S3ResponseError as e:
        if not updating:
            # got an error and we are not updating an existing file
            # delete the file that was created due to mode = 'wb'
            os.remove(filename)
        return e.status
    return 200
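To check the control flow of the answer without touching S3, here is an offline sketch. The names are hypothetical: fetch(f) stands in for key.get_contents_to_file(f, headers), and FetchError stands in for boto.exception.S3ResponseError (only its status attribute is mimicked). The logic mirrors the function above.

```python
import os


class FetchError(Exception):
    """Hypothetical stand-in for boto.exception.S3ResponseError."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


def conditional_download(filename, fetch):
    """Mirror of the answer's download(); fetch(f) plays key.get_contents_to_file(f, headers)."""
    updating = os.path.isfile(filename)
    mode = "r+b" if updating else "wb"  # 'r+b' does not truncate an existing file
    try:
        with open(filename, mode) as f:
            fetch(f)
            f.truncate()  # drop leftover bytes if the new body is shorter
    except FetchError as e:
        if not updating:
            os.remove(filename)  # remove the empty file that 'wb' just created
        return e.status
    return 200
```

Usage: a fetch that raises FetchError(304) leaves an existing file byte-for-byte intact, which is the behavior the question was after.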

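NOTE: file.truncate handles the case where the new file is smaller than the previous one. 'r+b' overwrites from the beginning but does not shorten the file, so without truncate the tail of the old copy would remain.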
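A small standalone illustration of that note, using plain Python file I/O only; overwrite_in_place is a hypothetical helper, and "new" stands in for a freshly downloaded body that is shorter than the old copy.

```python
import os
import tempfile


def overwrite_in_place(path, data, truncate=True):
    """Rewrite path through an 'r+b' handle and return the resulting contents."""
    with open(path, "r+b") as f:
        f.write(data)     # overwrites only the first len(data) bytes
        if truncate:
            f.truncate()  # cut the file at the current position, i.e. len(data)
    with open(path, "rb") as f:
        return f.read()


fd, path = tempfile.mkstemp()
os.write(fd, b"old content is longer")
os.close(fd)
print(overwrite_in_place(path, b"new", truncate=False))  # b'new content is longer'
print(overwrite_in_place(path, b"new"))                   # b'new'
os.remove(path)
```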

  • As a side note: when S3 responds with 404, the local file is created anyway. (2 upvotes)
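One way to avoid creating or damaging the local file on any error (304, 404, or a dropped connection) is a different technique from the answer's: download into a temporary file in the same directory and rename it over the target only after the body has arrived. A minimal sketch, assuming Python 3.3+ for os.replace; download_atomically is a hypothetical name, and fetch(f) is a hypothetical stand-in for key.get_contents_to_file(f, headers), not boto's own behavior.

```python
import os
import tempfile


def download_atomically(filename, fetch):
    """Write into a temp file and move it over filename only if fetch succeeds.

    fetch(f) is a hypothetical stand-in for key.get_contents_to_file(f, headers).
    Any exception it raises propagates, and filename is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    # Same directory as the target, so os.replace() is a rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            fetch(f)
        os.replace(tmp_path, filename)  # atomic on POSIX filesystems
    finally:
        if os.path.exists(tmp_path):  # only still present if something failed
            os.remove(tmp_path)
```

The caller would still catch boto.exception.S3ResponseError and inspect e.status (304 meaning "already up to date"). One trade-off: mkstemp creates the file readable and writable only by its owner, so the downloaded file keeps those permissions unless you chmod it afterwards.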