use*_*143 8 python zip amazon-s3
I have zip files uploaded to S3. I want to download them for processing. I don't need to keep them permanently, but I do need to process them temporarily. How can I do this?
bri*_*ice 18
Because working software > comprehensive documentation:
import zipfile
import io

import boto
from boto.s3.key import Key

# Connect to S3.
# This will need your S3 credentials to be set up
# with `aws configure` using the AWS CLI.
#
# See: https://aws.amazon.com/cli/
conn = boto.connect_s3()

# Get hold of the bucket
bucket = conn.get_bucket("my_bucket_name")

# Get hold of a given file (key) in the bucket
key = Key(bucket)
key.key = "my_s3_object_key"

# Create an in-memory bytes buffer
with io.BytesIO() as b:
    # Download the object into the buffer
    key.get_file(b)

    # Reset the file pointer to the beginning
    b.seek(0)

    # Read the buffer as a zipfile and process the members
    with zipfile.ZipFile(b, mode='r') as zipf:
        for subfile in zipf.namelist():
            do_stuff_with_subfile()
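do_stuff_with_subfile() above is just a placeholder. As a rough, hypothetical illustration, the loop body could read each member straight out of the in-memory archive without touching disk, for example by printing each member's size:

with zipfile.ZipFile(b, mode='r') as zipf:
    for subfile in zipf.namelist():
        # Read this member's raw bytes directly from the buffer
        with zipf.open(subfile) as member:
            data = member.read()
            print(subfile, len(data), "bytes")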
The same thing, using boto3:
import zipfile
import io

import boto3

# This is just for the demo. Real use should rely on the config
# environment variables or a config file.
#
# See: http://boto3.readthedocs.org/en/latest/guide/configuration.html
session = boto3.session.Session(
    aws_access_key_id="ACCESSKEY",
    aws_secret_access_key="SECRETKEY"
)

s3 = session.resource("s3")
bucket = s3.Bucket('stackoverflow-brice-test')
obj = bucket.Object('smsspamcollection.zip')

with io.BytesIO(obj.get()["Body"].read()) as tf:
    # Rewind the buffer to the start
    tf.seek(0)

    # Read the buffer as a zipfile and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        for subfile in zipf.namelist():
            print(subfile)
Tested with Python 3 on Mac OS X.
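As a small variation (not from the answer above), boto3 can also stream the object straight into the buffer with download_fileobj instead of reading the whole response body first; the bucket and key names below are placeholders:

import io
import zipfile

import boto3

s3 = boto3.resource("s3")
obj = s3.Object("my-bucket-name", "my-archive.zip")  # placeholder names

with io.BytesIO() as buf:
    # Stream the S3 object directly into the in-memory buffer
    obj.download_fileobj(buf)
    buf.seek(0)
    with zipfile.ZipFile(buf, mode='r') as zipf:
        for subfile in zipf.namelist():
            print(subfile)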
If speed is a concern, a good approach is to pick an EC2 instance fairly close to your S3 bucket (in the same region) and use that instance to unzip and process your zipped files.
This reduces latency and lets you process them fairly efficiently. Once the work is done, you can delete each extracted file, as in the sketch below.
Note: this only works if you are OK with using EC2 instances.
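A minimal sketch of that clean-up pattern, assuming the archive has already been downloaded to a local path on the instance (the path is a placeholder); tempfile.TemporaryDirectory removes every extracted file once processing is finished:

import tempfile
import zipfile

archive_path = "/tmp/my-archive.zip"  # placeholder: archive already downloaded

with tempfile.TemporaryDirectory() as workdir:
    with zipfile.ZipFile(archive_path) as zipf:
        zipf.extractall(workdir)  # extract members for processing
    # ... process the extracted files under workdir here ...
# the temporary directory and all extracted files are removed on exit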