Upload an image available at a public URL to S3 using boto

dgh*_*dgh 31 python django amazon-s3 boto

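I work in a Python web environment, and I can upload a file from the filesystem to S3 simply by using boto's key.set_contents_from_filename(path/to/file). However, I'd like to upload an image that is already on the web (say https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg:large).

Should I somehow download the image to the filesystem, upload it to S3 with boto as usual, and then delete the image?

Ideally there would be a way to get boto's key.set_contents_from_file, or some other command, to accept a URL and stream the image straight to S3 without my having to explicitly download a copy of the file to my server.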

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = "test"
        k.set_contents_from_file(url)
        k.make_public()
        return "Success?"
    except Exception, e:
        return e

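Using set_contents_from_file, as above, I get a "string object has no attribute 'tell'" error. Using set_contents_from_filename with the URL, I get a "No such file or directory" error. The boto storage documentation stops at uploading local files and does not mention uploading files stored remotely.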
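Both errors follow from what each method expects to receive. The sketch below is illustrative only: `looks_like_file` is a hypothetical helper, not part of boto, and it assumes (as the first error message suggests) that set_contents_from_file calls file methods such as tell() on its argument.

```python
import io
import os

url = "https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg:large"


def looks_like_file(obj):
    """Hypothetical check: does obj offer the file methods an upload would call?"""
    return all(hasattr(obj, name) for name in ("read", "seek", "tell"))


# A URL is a plain string, so it has no tell(): hence the AttributeError.
url_is_file_like = looks_like_file(url)

# An in-memory buffer holding the downloaded bytes does qualify.
buffer_is_file_like = looks_like_file(io.BytesIO(b"image bytes would go here"))

# The URL is not a path on the local disk either: hence "No such file or directory".
url_is_local_path = os.path.exists(url)
```

In other words, the URL has to be turned into bytes or a file-like object on the client side before either method can use it.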

dgh*_*dgh 21

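OK, going by @garnaat's answer, it doesn't sound like S3 currently allows uploading by URL. I managed to upload remote images to S3 by reading their contents into memory only. This works.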

import StringIO
import urllib2

import boto
from boto.s3.key import Key
from django.conf import settings  # assumed: Django settings, per the question's tags

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = url.split('/')[::-1][0]    # In my situation, ids at the end are unique
        file_object = urllib2.urlopen(url)           # 'Like' a file object
        fp = StringIO.StringIO(file_object.read())   # Buffer in memory so boto can call tell()
        k.set_contents_from_file(fp)
        return "Success"
    except Exception, e:
        return e

Thanks also to "How can I create a GzipFile instance from the 'file-like object' that urllib.urlopen() returns?"

  • I'm not 100% sure, but I believe `url.split('/')[::-1][0]` can simply be rewritten as `url.split('/')[-1]`. I mean, I can't think of any case where the result would differ. (4 upvotes)
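The comment's claim can be checked directly. Since `str.split` always returns at least one element, reversing the list and taking the first item is the same as taking the last item. A minimal sketch follows; the function names and the example.com URL are made up for illustration.

```python
def last_segment_reversed(url):
    # The answer's form: reverse the list of parts, then take the first one.
    return url.split('/')[::-1][0]


def last_segment(url):
    # The commenter's form: index the last part directly.
    return url.split('/')[-1]


sample_urls = [
    "https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg:large",
    "https://docs.python.org/3.7/_static/py.png",
    "https://example.com/trailing/slash/",  # last part is the empty string
    "no-slashes-at-all",                    # split returns a one-item list
    "",                                     # split returns ['']
]

# True when both forms agree on every sample, including the edge cases.
all_equal = all(last_segment_reversed(u) == last_segment(u) for u in sample_urls)
```

Note that either form yields an empty key for a URL ending in a slash, so the "ids at the end are unique" assumption in the answer still has to hold for the URLs being uploaded.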

小智 12

Here is a 2017-relevant answer to this question, which uses the official 'boto3' package (instead of the old 'boto' package from the original answer):

Python 3.5

If you are starting from a clean Python installation, pip install both packages first:

pip install boto3

pip install requests

import boto3
import requests

# Uses the creds in ~/.aws/credentials
s3 = boto3.resource('s3')
bucket_name_to_upload_image_to = 'photos'
s3_image_filename = 'test_s3_image.png'
internet_image_url = 'https://docs.python.org/3.7/_static/py.png'


# Do this as a quick and easy check to make sure your S3 access is OK
good_to_go = False  # initialise so the check below cannot raise NameError
for bucket in s3.buckets.all():
    if bucket.name == bucket_name_to_upload_image_to:
        print('Good to go. Found the bucket to upload the image into.')
        good_to_go = True

if not good_to_go:
    print('Not seeing your s3 bucket, might want to double check permissions in IAM')

# Given an Internet-accessible URL, download the image and upload it to S3,
# without needing to persist the image to disk locally
req_for_image = requests.get(internet_image_url, stream=True)
file_object_from_req = req_for_image.raw
req_data = file_object_from_req.read()

# Do the actual upload to s3
s3.Bucket(bucket_name_to_upload_image_to).put_object(Key=s3_image_filename, Body=req_data)


gar*_*aat 7

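Unfortunately, there really isn't any way to do this. At least not right now. We could add a set_contents_from_url method to boto, but that method would still have to download the file to the local machine and then upload it. It might still be a convenient method to have, but it wouldn't save you anything.

To do what you really want, we would need some capability in the S3 service itself that lets us pass it a URL and have it fetch and store the content for us. That sounds like a pretty useful feature. You might want to post it to the S3 forums.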
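To make the point concrete, here is a rough sketch of what such a convenience method would have to do. It is not a boto or boto3 method: `put_from_url` is a hypothetical helper, written against the boto3 client API used in the later answers, and it takes the client and the URL opener as parameters so those pieces are explicit assumptions.

```python
import urllib.request


def put_from_url(s3_client, url, bucket, key, opener=urllib.request.urlopen):
    """Hypothetical helper (not part of boto or boto3): download, then upload.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Step 1: the bytes still travel from the remote host to this machine.
    with opener(url) as response:
        body = response.read()

    # Step 2: and then from this machine on to S3.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)

    # Return the number of bytes that passed through this machine.
    return len(body)
```

The helper hides the two steps behind one call, but the data still crosses the caller's network connection twice, which is exactly why it would be a convenience rather than a saving.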


bla*_*bul 7

Here's how I did it with requests. The key is to set stream=True when initially making the request, and to upload to S3 using the upload_fileobj() method:

import requests
import boto3

url = "https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg"
r = requests.get(url, stream=True)

session = boto3.Session()
s3 = session.resource('s3')

bucket_name = 'your-bucket-name'
key = 'your-key-name' # key is the name of file on your bucket

bucket = s3.Bucket(bucket_name)
bucket.upload_fileobj(r.raw, key)
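The same streaming idea can be wrapped in a function that also carries the source's Content-Type over to S3. This is only a sketch under stated assumptions: `stream_url_to_s3` is a hypothetical name, `s3_client` is assumed to be a boto3 S3 client (which offers `upload_fileobj` like the bucket resource above), and `urllib.request` is swapped in for requests purely to keep the example dependency-free.

```python
import urllib.request


def stream_url_to_s3(s3_client, url, bucket, key, opener=urllib.request.urlopen):
    """Hypothetical helper: hand the open HTTP response to S3 as a file object.

    Assumes s3_client is a boto3 S3 client, e.g. boto3.client("s3").
    """
    with opener(url) as response:
        # Reuse the Content-Type reported by the source server, if there is one.
        content_type = response.headers.get("Content-Type")
        extra_args = {"ContentType": content_type} if content_type else None

        # upload_fileobj reads from the response itself, so nothing is saved to disk.
        s3_client.upload_fileobj(response, bucket, key, ExtraArgs=extra_args)

    # Return what was passed as ExtraArgs so the caller can log or inspect it.
    return extra_args
```

With real credentials it might be called as `stream_url_to_s3(boto3.client('s3'), url, 'your-bucket-name', 'your-key-name')`.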

  • @heartmo The discussion here gives a good overview of the differences between client, session, and resource. /sf/ask/2996636751/ (2 upvotes)

Fil*_*ale 5

A simple 3-line implementation that works out of the box on Lambda:

import boto3
import requests

s3_object = boto3.resource('s3').Object(bucket_name, object_key)

with requests.get(url, stream=True) as r:
    s3_object.put(Body=r.content)

As for the source, the .get part comes straight from the requests documentation.